Fuzzing Basics: Random Testing, Compiler Bugs, and Coverage Guidance

Name: Fuzzing Programs to Find Bugs - Computerphile
Uploaded: 2026-06-04T13:30:15+00:00
Duration: 19 min 13 s
Channel: Computerphile

YouTube video ID: kaD54VXxGrI

Source: YouTube video by Computerphile.

Fuzzing finds bugs by feeding software inputs that programmers never anticipated. Unexpected inputs cause crashes, memory corruption, or security flaws, especially in open systems such as web browsers that must safely process arbitrary HTML and JavaScript. By exposing these hidden failure modes, fuzzing helps prevent remote‑execution attacks and other vulnerabilities.

Fuzzing for Compilers

Compilers can crash, but a more insidious problem is miscompilation, where the compiler runs without complaint yet the generated code does not preserve the meaning of the source program. Detecting miscompilation requires deterministic programs that avoid undefined behavior (UB), because a program with UB may legitimately behave differently under two correct compilers. The typical workflow generates a random, well‑formed program, compiles it with two different compilers (commonly GCC and Clang), and runs the resulting executables on the same input. If the outputs differ, at least one compiler has introduced a semantic error. Projects such as Csmith, originating from the University of Utah, automate the creation of UB‑free C programs for this purpose.
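The UB-avoidance requirement can be sketched with a tiny generator that emits C source. This is not Csmith's actual code: the helper names (`gen_expr`, `gen_program`, `safe_div`, `safe_mod`) are hypothetical, and the sketch assumes a platform where `int` is 32 bits, as on common GCC and Clang targets. Division and modulo go through wrappers that define the divide-by-zero case, unsigned arithmetic wraps instead of overflowing, every variable is initialized, and shifts are omitted, so each generated program should print one deterministic value.

```python
import random

# Hypothetical prelude in the spirit of Csmith's safe-math wrappers (not its
# actual code). Assumes `int` is 32 bits, so uint32_t operands are not promoted
# to signed int and +, -, * simply wrap modulo 2^32 (defined behavior).
PRELUDE = """#include <stdint.h>
#include <stdio.h>

static uint32_t safe_div(uint32_t a, uint32_t b) { return b == 0 ? a : a / b; }
static uint32_t safe_mod(uint32_t a, uint32_t b) { return b == 0 ? a : a % b; }
"""


def gen_expr(rng: random.Random, depth: int) -> str:
    """Return a random C expression over x and y that cannot trigger UB."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(["x", "y", f"{rng.randrange(1000)}u"])
    lhs, rhs = gen_expr(rng, depth - 1), gen_expr(rng, depth - 1)
    op = rng.choice(["+", "-", "*", "/", "%"])
    if op == "/":
        return f"safe_div({lhs}, {rhs})"  # division by zero is guarded
    if op == "%":
        return f"safe_mod({lhs}, {rhs})"  # so is modulo by zero
    # Shifts are never generated: shifting by >= the bit width is UB.
    return f"({lhs} {op} {rhs})"


def gen_program(seed: int, n_stmts: int = 5) -> str:
    """Generate a deterministic C program that prints a single checksum."""
    rng = random.Random(seed)
    lines = ["    uint32_t x = 12345u, y = 678u;"]  # no uninitialized reads
    for _ in range(n_stmts):
        lines.append(f"    {rng.choice('xy')} = {gen_expr(rng, 3)};")
    lines.append('    printf("%u\\n", (unsigned)(x ^ y));')
    lines.append("    return 0;")
    return PRELUDE + "\nint main(void) {\n" + "\n".join(lines) + "\n}\n"


if __name__ == "__main__":
    print(gen_program(seed=42))
```

Compiling one generated program with, say, `gcc -O2` and `clang -O2` and comparing the two printed values is then the differential check described above.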

Coverage‑Guided Fuzzing

Pure random testing often stalls at shallow code paths, missing logic hidden behind specific command‑line arguments or input formats. Coverage‑guided fuzzing treats the test corpus as an evolving population. An input from the corpus is mutated (through bit flips, byte replacements, splicing, or deletions) and executed against the target. When a mutation reaches previously unvisited code, the new input is added to the corpus, becoming a seed for further mutation. This evolutionary loop of population, mutation, and fitness (new coverage) ideally runs thousands of times per second, gradually exploring deeper program states. Tools such as AFL (American fuzzy lop) and libFuzzer embody this approach.
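The mutation operators named above work on raw bytes and can be sketched in a few lines. A minimal sketch: the function names are hypothetical, and real fuzzers such as AFL use a larger set of operators plus heuristics for choosing among them.

```python
import random


def flip_bit(rng: random.Random, data: bytes) -> bytes:
    """Flip one randomly chosen bit."""
    if not data:
        return data
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] ^= 1 << rng.randrange(8)
    return bytes(buf)


def delete_bytes(rng: random.Random, data: bytes) -> bytes:
    """Remove a random non-empty slice (inputs shorter than 2 bytes are kept)."""
    if len(data) < 2:
        return data
    start = rng.randrange(len(data))
    end = rng.randrange(start + 1, len(data) + 1)
    return data[:start] + data[end:]


def splice(rng: random.Random, a: bytes, b: bytes) -> bytes:
    """Join a random prefix of one input to a random suffix of another."""
    return a[: rng.randrange(len(a) + 1)] + b[rng.randrange(len(b) + 1):]


def mutate(rng: random.Random, data: bytes, corpus: list) -> bytes:
    """Apply one random operator; splicing draws its partner from the corpus."""
    op = rng.choice(("flip", "delete", "splice"))
    if op == "flip":
        return flip_bit(rng, data)
    if op == "delete":
        return delete_bytes(rng, data)
    return splice(rng, data, rng.choice(corpus))
```

Splicing is the operator that combines two parents, which is why the coverage-guided loop is often described as an evolutionary algorithm.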

Software Reliability

Testing demonstrates the presence of bugs; it does not prove their absence. High‑assurance, safety‑critical software relies on formal verification to claim bug‑free behavior, but most software remains vulnerable to undiscovered defects. Developers must prioritize bugs based on context and impact, recognizing that not all flaws pose equal risk.

Mechanisms in Practice

Miscompilation detection
1. Generate a random, deterministic program that is free of undefined behavior.
2. Compile with Compiler A and Compiler B.
3. Run both executables on identical input.
4. Compare outputs; any discrepancy signals a compiler bug.
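Steps 1 to 4 can be sketched end to end without real compilers. A minimal sketch, assuming two hypothetical "compilers" for a toy expression language in place of GCC and Clang: `compile_a` translates each node directly, while `compile_b` adds peephole rewrites, one of which (`x * 0` rewritten to `x`) is deliberately wrong so that it plays the role of a miscompilation. Arithmetic is modeled as 32-bit unsigned, so the toy language has no undefined behavior and any disagreement points at a real bug.

```python
import random

MASK = 0xFFFFFFFF  # model 32-bit unsigned arithmetic: it wraps, so no overflow UB
OPS = {
    "+": lambda a, b: (a + b) & MASK,
    "-": lambda a, b: (a - b) & MASK,
    "*": lambda a, b: (a * b) & MASK,
}


def random_program(rng: random.Random, depth: int = 3) -> tuple:
    """Step 1: a random expression tree over one input variable."""
    if depth == 0 or rng.random() < 0.3:
        return ("var",) if rng.random() < 0.5 else ("const", rng.randrange(4))
    op = rng.choice(list(OPS))
    return (op, random_program(rng, depth - 1), random_program(rng, depth - 1))


def compile_a(e: tuple):
    """Hypothetical 'compiler A': direct translation, no optimization."""
    if e[0] == "var":
        return lambda i: i & MASK
    if e[0] == "const":
        n = e[1]
        return lambda i: n
    f, g, op = compile_a(e[1]), compile_a(e[2]), OPS[e[0]]
    return lambda i: op(f(i), g(i))


def compile_b(e: tuple, buggy: bool = True):
    """Hypothetical 'compiler B': adds peephole rewrites, one optionally wrong."""
    if e[0] in ("var", "const"):
        return compile_a(e)
    name, lhs, rhs = e
    f = compile_b(lhs, buggy)
    if rhs == ("const", 0) and name in ("+", "-"):
        return f  # x + 0 and x - 0 are both x: correct
    if rhs == ("const", 1) and name == "*":
        return f  # x * 1 is x: correct
    if rhs == ("const", 0) and name == "*":
        return f if buggy else (lambda i: 0)  # injected bug: x * 0 becomes x
    g, op = compile_b(rhs, buggy), OPS[name]
    return lambda i: op(f(i), g(i))


def differential_test(n_programs: int, seed: int, i: int = 7, buggy: bool = True):
    """Steps 2-4: compile each program both ways, run on input i, compare."""
    rng = random.Random(seed)
    for _ in range(n_programs):
        prog = random_program(rng)
        o1, o2 = compile_a(prog)(i), compile_b(prog, buggy)(i)
        if o1 != o2:
            return prog, o1, o2  # discrepancy: one "compiler" is wrong
    return None
```

A non-`None` result holds the offending program tree and the two disagreeing outputs; with `buggy=False` both "compilers" are correct, so no discrepancy should ever be reported.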

Coverage‑guided evolution
1. Maintain a corpus of “interesting” inputs.
2. Select an input and apply random mutations.
3. Execute the mutated input against the system.
4. If new code paths are reached, add the input to the corpus.
5. Repeat continuously to explore deeper system states.
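Steps 1 to 5, together with the transcript's dash-dash-only example, can be sketched as one loop. A minimal sketch under loud assumptions: the target, its `--only` check, and all function names are hypothetical; coverage is recorded by hand with one point per matched byte, whereas AFL and libFuzzer obtain coverage from compiler instrumentation; and the only mutation is overwriting a single random byte. Guessing all six bytes blindly would succeed with probability about 1 in 256^6 (roughly 2.8 × 10^14) per try, while coverage feedback keeps each correct byte in the corpus, so the fuzzer only has to find one byte at a time.

```python
import random

MAGIC = b"--only"  # hypothetical option that guards the toy target's bug


def target(data: bytes, trace: set) -> None:
    """Toy system under test. Each matched byte of MAGIC adds one coverage
    point, standing in for the instrumentation real fuzzers compile in."""
    for k, expected in enumerate(MAGIC):
        if k >= len(data) or data[k] != expected:
            return
        trace.add(k)
    raise RuntimeError("crash: bug behind --only reached")


def run(data: bytes):
    """Execute one input; return (coverage, crashed)."""
    trace = set()
    try:
        target(data, trace)
    except RuntimeError:
        return trace, True
    return trace, False


def overwrite_byte(rng: random.Random, data: bytes) -> bytes:
    """The only mutation used here: overwrite one random byte (data non-empty)."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(seeds: list, max_iters: int, seed: int = 0):
    rng = random.Random(seed)
    corpus = list(seeds)                                 # 1. corpus of inputs
    seen = set().union(*(run(s)[0] for s in corpus))
    for it in range(max_iters):                          # 5. repeat
        child = overwrite_byte(rng, rng.choice(corpus))  # 2. select and mutate
        trace, crashed = run(child)                      # 3. execute
        if crashed:
            return child, it, corpus
        if not trace <= seen:                            # 4. new coverage: keep
            seen |= trace
            corpus.append(child)
    return None, max_iters, corpus                       # budget exhausted
```

Starting from the single seed `b"xxxxxx"`, the corpus grows by exactly one input per newly matched prefix byte, and the crash typically appears after some tens of thousands of iterations rather than an astronomical number.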

These mechanisms illustrate how fuzzing transforms blind random testing into a systematic, feedback‑driven discovery process.

Takeaways

Fuzzing uncovers hidden bugs by supplying software with unexpected inputs, a practice essential for securing open systems like web browsers.
Miscompilation bugs arise when compilers translate code incorrectly, and they can be detected by comparing the outputs of the same deterministic, UB‑free program built with multiple compilers.
Coverage‑guided fuzzing evolves a corpus of inputs through mutation and feedback, enabling exploration of deep code paths that random testing misses.
Testing proves the existence of bugs but cannot guarantee their absence; high‑assurance software therefore relies on formal verification.
Tools such as Csmith, AFL, and libFuzzer automate deterministic program generation and coverage‑guided evolution, making fuzzing scalable and effective.

Frequently Asked Questions

How does coverage‑guided fuzzing discover deeper code paths?

Coverage‑guided fuzzing maintains a corpus of inputs that have exercised unique program regions. It mutates these inputs and runs the program; when a mutation triggers previously unseen code, the input is added to the corpus. This feedback loop repeatedly expands coverage, allowing the fuzzer to reach deeper logic that plain random testing cannot.

Who is Computerphile on YouTube?

Computerphile is a YouTube channel that publishes videos about computers and computer science.

Does this page include the full transcript of the video?

Yes, the full transcript of this video is included below.

Full Transcript

I thought today we could have a chat
about randomized testing, also known as
fuzzing, which is a really effective
technique for finding bugs in software.
Where fuzzing began is actually slightly
disputed. It's one of those techniques
which is pretty obvious. You have some
software and when we design software as
programmers, we normally think about
what the software is supposed to do, how
people should use it, and maybe the kind
of mistakes they would naturally make
when using it. We don't often think
about absolutely crazy ways to use the
software. So, for example, if you had
a word processing application, you would
think about the layout. You think about
the features it should support. And you
might think about things like the user
trying to open a file that doesn't
exist, but you might not necessarily
consider cases where they make a
document where they paste in all sorts
of crazy content or you might not think
about them trying to open say a video
file with the word processor and what
would happen there. So often software
goes wrong because of unexpected inputs
that the programmer didn't think about
and randomized testing is a great way of
finding those kinds of unexpected
inputs. Fuzzing has become more and more
important in the internet age because if
you think about systems like web
browsers, Chrome, Safari, Firefox, used by
billions of users around the world.
These are completely open systems. With a
phishing attack, where you trick someone
into clicking on some web page, it takes them to,
you know, a random web page, a malicious
web page. That web page can have
arbitrary content. So the web browser
needs to be able to be robust against
completely arbitrary HTML. It doesn't
necessarily need to render anything
meaningful for the HTML, but it must not
be the case that random HTML or
JavaScript can expose a vulnerability in
the web browser. And with recent news
about things like Claude Mythos, you
know, the large language model that
there's quite some concern about when
people talk about vulnerabilities, it's
things like the kind of inputs that
could cause an open system like a web
browser to not just misbehave but
actually provide the entry point for say
a remote execution attack. That's the
kind of thing people are concerned
about. And fuzzing is a really good way
to defend against those kinds of
problems because by testing a system
with random inputs, you may be able to
accidentally stumble upon some of these
back doors. There are really two
extremes of what people want to do with
fuzzing. On one extreme, you want to
randomly test software using garbled
malicious inputs that should not really
be fed to that software with the aim of
trying to find vulnerabilities in the
software. Then at the other end of the
spectrum, you might want to test
software with very highly crafted,
well-formed inputs, but super
complicated inputs and actually try to
see whether the software did the right
thing on those kinds of inputs. And
researchers are interested in advanced
techniques at both ends of the spectrum.
So perhaps I'll start by talking about
the ways you might use fuzzing, which is
also known as randomized testing for
testing complicated systems where you
want to check they really did the right
thing. So a really nice example here is
compilers. Compilers like GCC or Clang
for C, or javac for Java. And there are
many many programming languages with
multiple compilers often per language.
And a compiler like any program can
crash if you give it some kind of
expected or unexpected input. There may
be a bug that simply causes the compiler
to crash. But with a compiler, there's a
worse kind of bug which is known as a
miscompilation, which is where you
give the compiler some source code and
the compiler uncomplainingly turns that
into binary. It turns it into executable
object code and it doesn't indicate that
anything has gone wrong in the process.
However, the translation from the source
code into the executable binary code has
not represented the meaning of the
program. The semantics of the source
program haven't been preserved. So as an
unrealistically simple example, if the
program said x plus y and the compiler
translated that into a multiply
instruction that would multiply x by y,
that would be a miscompilation. Now
we'd hope that a compiler never made
such a drastic error as that, but
compilers do really sophisticated
optimizations when translating code from
source code into assembly. And these
optimizations can have edge cases the
compiler developer didn't think about.
And they may not work in all settings.
They should, but they may not. So, a
really interesting line of work using
fuzzing to test compilers
involves creating a fuzzer, a random
generator that randomly generates
programs in a given programming language
that should actually be well-formed
programs. They wouldn't do anything
meaningful. They would just use the
features of the language in all kinds of
weird and wonderful ways.
They're not meant to produce any desired
effect, but so that when you would run
one of these randomly generated
programs, it would produce some output.
Perhaps it would just print an integer.
What the integer would be would entirely
depend on the random structure of the
program. But the challenge is to
generate programs that when you would
run them, you would get a deterministic
integer result. Then what you can do is
you can say if you've got two compilers
under test, for argument's sake I'm going
to say GCC, the GNU C compiler, and
Clang, another very well-known open-source
C and C++ compiler. If you have
got your fuzzer which is a piece of
software that's going to generate
programs then the fuzzer is going to
generate some program so what's in this
program is going to be randomly
determined by the fuzzer However, the
program is going to be an entirely
deterministic program.
We can give this program to GCC. We can
give it to Clang
and they will compile this program into
executables. Let me call them 1.exe
and 2.exe.
Now if we run these executables on the
same input. So let's say we give them
some input I and we give them both the
same input I
and this program gives us an output O1
and this program gives us an output O2.
Then these outputs should be equal. The
fuzzer has generated one program. We've
given that program to both compilers.
We've compiled the program to get two
executables. These executables might be
really different from one another
because the compilers might have made
different decisions on how to optimize
the code. So they will have different
ways of generating a binary
representation of the source code. But
those binary representations should
encode the same thing. They should both
encode the semantics of this program. If
we then run them on the same input, we
should expect to get the same output.
And if we don't, then that indicates
that something's gone wrong in the
compilation process. There must be a bug
in one or other of these compilers. Now,
there are some important caveats here
for your viewers who know about C and
C++. These languages are full of
undefined behavior. And if this program
was subject to undefined behavior, it
might be that actually these executables
could be expected to produce different
results because of the undefined
behavior. So the real challenge is that
we must make sure there is no
undefined behavior, no UB in this
program.
>> What's an example of undefined behavior
then in C?
>> An example of undefined behavior
is dividing by zero.
So often that will cause the program to
crash, but actually that's not
guaranteed. The language doesn't say
that a division by zero causes the
program to crash. It's actually
undefined. Another example is reading
from an array out of bounds. So an array
overflow or an array underflow and there
are actually a huge number of examples
of undefined behavior in the language.
That's an example of randomized testing
applied in the domain of compilers and
there is a tool that originated from the
University of Utah called Csmith, because
it's a smith of C programs, and Csmith
is the fuzzer that creates these
randomized C programs, and the key
innovation in that Csmith project is
exactly producing programs that don't
have this undefined behavior. Okay,
>> so this is an example of fuzzing to try
to find deep functional bugs in a
complicated piece of software, a
compiler. It's not trying to find
security vulnerabilities where when you
feed the compiler gibberish, the
compiler crashes with an exploitable
vulnerability. That's not the purpose of
this fuzzing. It's really to try to find
deep problems where the compiler is
actually wrongly optimizing a program.
There's a genuine bug in the program.
>> Okay. And do you get to the point
where you can categorically prove a
problem with it, or is it a bit like any
science where, you know, you might
test it a thousand times and they all
seem fine, and so therefore you
still can't prove that it's fine because
the thousand and first might be a...
>> that's right. So generally software
testing of any kind is about showing the
presence of bugs not proving the absence
of bugs.
>> So you can't use testing to show that
software is correct. You can only use it
to show the software is not correct. But
the process of repeatedly showing that
the software is not correct, fixing it,
and then it becoming harder and harder
to find bugs in the software gives
evidence the software is becoming more
and more reliable. Now, a very difficult
problem that some academics are
interested in.
I think it's interesting, but it's not
something I worry about is how do you
know that you found all the bugs? I
guess my practical perspective from my
time working in companies is that you
never do. You never find all the bugs
unless you're working in high assurance
safety critical software where you're
really using formal verification to
prove that the software is correct. If you
are not working in that domain, then all
software has bugs.
>> Yeah.
>> And then a very relevant question is
which bugs do you care about? So do you
care about all these bugs fuzzers can
find? So fuzzers sometimes find problems
that are bugs but whether anyone would
ever fall foul of one of these bugs is
up for question and that in itself is
quite an interesting issue.
>> The thing about any of this is that
obviously with the sort of systems we
use now, I mean, I'm going to call to
mind that xkcd, the Nebraska
>> the Nebraska one with a tiny leg you
know you can look at what you're looking
at but it could be sitting on top of so
many other possible kind of frameworks
and platforms.
>> That's right. So these techniques need
to be deployed with a context in mind.
So I think, you know, this work on
these Unix utilities the idea there is
really these utilities although they're
kind of some of them are very simple
tools huge numbers of people are using
them and you don't really know what
they're using them for all kinds of
things. So there's a really good case
for trying to make those super reliable.
Compilers: we know that many, many
companies depend on compilers like GCC
and clang they're some of the best
examples of open source projects we
have. So there's a very strong argument
for making them be as bug-free as
possible. That said, does it really
matter if GCC falls over when you give
it a file full of non-printable
characters? If it doesn't compile that
if it just crashes with a fatal error as
opposed to saying, you know, syntax
error, does that really matter? Is that
a bug that actually matters? I would
argue no. So, you know, even something
as critical as GCC, not every bug in GCC
is important. And I want to talk about
coverage guided fuzzing which was
popularized by a tool called AFL
American fuzzy lop which was from a
developer at Google some time ago that
has really then paved the way for a huge
amount of practical work and also
research into fuzzing. Let's say we've
got a system that we're trying to test
SUT, system under test. If we think about
the functionality of the system being a
bit like a maze, there are all sorts of,
you know, places that you could get into
in this system. Then if you're firing
some kind of random input at this
system, some random bytes, then it may
be that this gets
into
some parts of the software and then you
fire another input and it gets into
some other part of the system and then
another input gets into
a slightly different part of the system.
But it might be that actually for an
input to get any further into the
system, it might have to have certain
features. For example, it might be that
you've got to be giving some valid
command line arguments to a Linux
utility to unlock a lot of its behavior.
So many utilities when you run them, you
provide a load of arguments on the
command line and if you provide a
certain argument, then the input will be
treated in a particular way. So there
may be whole portions of the system
you're never going to reach unless you
provide that correct argument. So um
this approach while it might find some
pathological cases where a system
crashes, it's unlikely to find inputs
that drive the system into really deep
and interesting states. A technique that
overcomes that problem is known as
coverage-guided fuzzing and it was
popularized by a tool called AFL
American fuzzy lop which was developed
by a Google engineer some time ago and has
led to lots of follow-on practical work
in industry and also lots of academic
research work and various other fuzzers
like Google's libFuzzer, for example,
which is an AFL style fuzzer. Once
again, if we think about our system
under test being maze-like, the idea of
coverage guided fuzzing is that we have
got a corpus of inputs that are being
used during the testing process.
Initially, this corpus might have some
example real world inputs for the
system. So, for example, if the system
was a PDF engine, we might have a
selection of example PDFs that already
get some coverage of this system. And
the idea is to try to find inputs that
go deeper into this system, get further
into the maze with the hope of finding
bugs. So let's suppose that there is a
bug lurking somewhere in this maze. The
idea is we don't know where this bug is,
but by exploring the maze, we're hoping
we will find one or more of these bugs
that can then be fixed to make the
system more reliable. So the idea of
coverage guided fuzzing is let's say
this corpus contains a number of inputs
but maybe initially
it contains some input a and here I
don't mean literally the character a I'm
just calling a the input a might be a
PDF, for example. Then what the fuzzer is
going to do is it's going to pull out
this input a from the corpus and then it
is going to randomly mutate a to give a
mutated input, let's call it a prime.
And the mutations that are done to go
from an input A to A prime are randomly
applied. And they can include things
like flipping a bit or replacing a
sequence of bytes with another sequence
of bytes or deleting some bytes from the
input or taking two inputs and splicing
them together for example. So now this a
prime is going to be fed to the system
under test and a prime will get some
coverage of the system which we can
think about as being making some
progress through this maze. So if this
progress through the maze that a prime
made means that we get to bits of the
maze i.e. bits of the software that have
never been reached before during fuzzing
then this input a prime is deemed to be
an interesting input that might be
useful for further consideration. So, it
is put back into the corpus. Let's call
it a prime. I'm being a bit inconsistent
with my colors here.
>> That's okay. We got it. We got it.
>> Okay. So, uh now let's suppose that
there was another input in this corpus.
Let's call it B and the fuzzer pulled B
out and applied a mutation to it to get
B prime.
And then B prime is fed to the SUT, the
system under test. Let's say that B
prime actually just retreads some of the
path that A prime took. So it achieves
some of the coverage, maybe all of the
coverage that A prime achieved, but no
more coverage. Then B prime is not
deemed to be interesting. It didn't get
any further. We didn't see anything new
from B prime. So it's not going to be
put back into the corpus for further
mutation. But then let's say that a
prime is picked out again by the fuzzer
and mutated into a form a prime prime
say and when we give this to the fuzzer
it actually gets a bit further into the
system under test then that's great. A
prime prime is getting places that we've
never seen before. It's interesting and
it gets put back into the corpus to be
considered in the future. And the idea
is to do this process at speed ideally
thousands or tens of thousands of times
per second and mutate inputs guided by
coverage. So new coverage is interesting
um to make them more and more
sophisticated to get deeper into the
system. And it might be if we're lucky
perhaps the third time we mutate this
input. So we go from a prime prime let's
call it a prime prime prime. When we
give this to the system, it might be
that now this gets even more coverage
and bang hits this bug. The system
crashes. That's excellent. We found an
input a prime prime that causes the
system to crash. And now a developer can
go and investigate what's wrong and fix
the bug.
>> So that's coverage guided fuzzing. And
we can see this as an example of an
evolutionary algorithm. We've got a
population, the corpus, and we've got
the notion of randomly mutating inputs.
So random mutations in the population. I
mentioned also that one of the mutations
involves taking multiple inputs and
splicing them together, mating inputs to
form new inputs. And we've got a fitness
function. So these inputs only survive.
They only get put back in the corpus if
they are fit, which means they get new
coverage. So this fuzzing approach
can be seen as evolving a corpus of
inputs into richer and richer inputs
that get further into the system under
test to trigger bugs. Let's say that
we've got some utility that takes
command line options. For example, dash
dash only. The probability of having an
input that's got a dash at the beginning
is quite high. There aren't that many
characters. And after many iterations,
we're probably going to find a dash. And
there'll be code that processes command
line inputs. And that code will get a
bit deeper if we do provide a dash. But
then if the next character isn't a dash,
then we won't have dash dash, which is
the beginning of an input. But because
we found one dash, that will go back
into the corpus. And in the future, it's
fairly likely that a mutation will add a
second dash to this. And then in the
future, it's quite likely we'll start to
get the character O, for example, and
then N, etc. So, after many, many, many
iterations, but not an astronomical
number of iterations of this process,
the fuzzer will
implicitly start to discover the
expected format of inputs and get deeper
into the system.
