Build simple fuzzer - part 3

Welcome to part three of my mini series about fuzzing. I’m glad that so many of you find this content interesting enough to come back for more. This, more than anything encourages me to keep writing.

We’ve ended the last part with a promise that we are going to work on a more intelligent approach to fuzzing. To keep content easily digestible I’ve decided to split the task into two separate articles. In the one you are reading right now we are going to cover the implementation of coverage tracing. Upcoming one will cover the topic of how this information can help to guide the fuzzing process to obtain better results.

We must begin with a bit of theory: “Coverage guided fuzzing (also known as greybox fuzzing) uses program instrumentation to trace the code coverage reached by each input fed to a fuzz target. Fuzzing engines use this information to make informed decisions about which inputs to mutate to maximize coverage.” As explained further in the linked doc this technique is good for deterministic targets with suitable tolerance for unstructured data. A good example of such is a parser for jpeg format. This matches the target we’ve initially selected. Be aware however that random mutations of highly structured inputs like programming languages will most likely fail to produce valid data. Consequently the depth of the fuzz run will suffer in effect.

I must confess that I haven’t done much research how coverage tracing should actually look like and this is one big experiment. So, this is going to be a learning experience for all of us. Well, maybe excluding seasoned authors of fuzzing tools. Oh, one last thing - as always, latest iteration of the code can be found on github.

Measuring performance

Instrumentation inevitable comes with a cost. First victim is performance. I know that one worrying about performance shouldn’t have picked python as a language of choice in the first place. Nevertheless we will try to keep up an appearance as lessons learned here will pay off when we finally rewrite it all to Rust.

Knowing about the sacrifices we are making it is still good to measure how much of the performance we are losing at each step. With that in mind I’ve decided to implement simple status that will tell us roughly how many iterations per second our fuzzer is able to execute. Expecting this to be an easy task I’ve started reading and coding some something simple. Somewhere mid-way, having a custom Threaded class that was spawning other objects at fixed time intervals and accounting for a time drift I caught a glimpse of myself in a window and started pondering about the meaning of life. Well, it wasn’t that dramatic but surely I’ve left ‘simple’ and drifted somewhere towards ‘how do I pass values between threads’. I’ve deleted that code and instead just wrote this:

 start_time = time.time()
 x = counter / (time.time()-start_time)
 print('-> {:.0f} exec/sec'.format(x))

Our naive way to measure the performance is to simply divide the number of iterations we run by the elapsed time. This is not the most sophisticated way of doing it and in the next stage we will have to implement it differently. Especially when we get rid of a fixed number of rounds and switch to continuous fuzzing. For now it will do.

On the topic of performance - if you are really interested how to profile your programs better read the documentation for two amazing python modules - profile and memory_profiler.

Idea 1 - Single stepping