| ================================= |
| Tips for performance optimization |
| ================================= |
| |
| This file provides tips for troubleshooting slow or wasteful fuzzing jobs. |
| See README for the general instruction manual. |
| |
| 1) Keep your test cases small |
| ----------------------------- |
| |
| This is probably the single most important step to take! Large test cases do |
| not merely take more time and memory to be parsed by the tested binary, but |
| also make the fuzzing process dramatically less efficient in several other |
| ways. |
| |
| To illustrate, let's say that you're randomly flipping bits in a file, one bit |
| at a time. Let's assume that if you flip bit #47, you will hit a security bug; |
| flipping any other bit just results in an invalid document. |
| |
| Now, if your starting test case is 100 bytes long, you will have a 71% chance of |
| triggering the bug within the first 1,000 execs - not bad! But if the test case |
| is 1 kB long, the probability that we will randomly hit the right pattern in |
| the same timeframe goes down to 11%. And if it has 10 kB of non-essential |
| cruft, the odds plunge to 1%. |
| |
| On top of that, with larger inputs, the binary may be now running 5-10x times |
| slower than before - so the overall drop in fuzzing efficiency may be easily |
| as high as 500x or so. |
| |
| In practice, this means that you shouldn't fuzz image parsers with your |
| vacation photos. Generate a tiny 16x16 picture instead, and run it through |
| jpegtran or pngcrunch for good measure. The same goes for most other types |
| of documents. |
| |
| There's plenty of small starting test cases in ../testcases/* - try them out |
| or submit new ones! |
| |
| If you want to start with a larger, third-party corpus, run afl-cmin with an |
| aggressive timeout on that data set first. |
| |
| 2) Use a simpler target |
| ----------------------- |
| |
| Consider using a simpler target binary in your fuzzing work. For example, for |
| image formats, bundled utilities such as djpeg, readpng, or gifhisto are |
| considerably (10-20x) faster than the convert tool from ImageMagick - all while |
| exercising roughly the same library-level image parsing code. |
| |
| Even if you don't have a lightweight harness for a particular target, remember |
| that you can always use another, related library to generate a corpus that will |
| be then manually fed to a more resource-hungry program later on. |
| |
| 3) Use LLVM instrumentation |
| --------------------------- |
| |
| When fuzzing slow targets, you can gain 2x performance improvement by using |
| the LLVM-based instrumentation mode described in llvm_mode/README.llvm. Note |
| that this mode requires the use of clang and will not work with GCC. |
| |
| The LLVM mode also offers a "persistent", in-process fuzzing mode that can |
| work well for certain types of self-contained libraries, and for fast targets, |
| can offer performance gains up to 5-10x; and a "deferred fork server" mode |
| that can offer huge benefits for programs with high startup overhead. Both |
| modes require you to edit the source code of the fuzzed program, but the |
| changes often amount to just strategically placing a single line or two. |
| |
| If there are important data comparisons performed (e.g. strcmp(ptr, MAGIC_HDR) |
| then using laf-intel (see llvm_mode/README.laf-intel) will help afl-fuzz a lot |
| to get to the important parts in the code. |
| |
| If you are only intested in specific parts of the code being fuzzed, you can |
| whitelist the files that are actually relevant. This improves the speed and |
| accuracy of afl. See llvm_mode/README.whitelist |
| |
| 4) Profile and optimize the binary |
| ---------------------------------- |
| |
| Check for any parameters or settings that obviously improve performance. For |
| example, the djpeg utility that comes with IJG jpeg and libjpeg-turbo can be |
| called with: |
| |
| -dct fast -nosmooth -onepass -dither none -scale 1/4 |
| |
| ...and that will speed things up. There is a corresponding drop in the quality |
| of decoded images, but it's probably not something you care about. |
| |
| In some programs, it is possible to disable output altogether, or at least use |
| an output format that is computationally inexpensive. For example, with image |
| transcoding tools, converting to a BMP file will be a lot faster than to PNG. |
| |
| With some laid-back parsers, enabling "strict" mode (i.e., bailing out after |
| first error) may result in smaller files and improved run time without |
| sacrificing coverage; for example, for sqlite, you may want to specify -bail. |
| |
| If the program is still too slow, you can use strace -tt or an equivalent |
| profiling tool to see if the targeted binary is doing anything silly. |
| Sometimes, you can speed things up simply by specifying /dev/null as the |
| config file, or disabling some compile-time features that aren't really needed |
| for the job (try ./configure --help). One of the notoriously resource-consuming |
| things would be calling other utilities via exec*(), popen(), system(), or |
| equivalent calls; for example, tar can invoke external decompression tools |
| when it decides that the input file is a compressed archive. |
| |
| Some programs may also intentionally call sleep(), usleep(), or nanosleep(); |
| vim is a good example of that. Other programs may attempt fsync() and so on. |
| There are third-party libraries that make it easy to get rid of such code, |
| e.g.: |
| |
| https://launchpad.net/libeatmydata |
| |
| In programs that are slow due to unavoidable initialization overhead, you may |
| want to try the LLVM deferred forkserver mode (see llvm_mode/README.llvm), |
| which can give you speed gains up to 10x, as mentioned above. |
| |
| Last but not least, if you are using ASAN and the performance is unacceptable, |
| consider turning it off for now, and manually examining the generated corpus |
| with an ASAN-enabled binary later on. |
| |
| 5) Instrument just what you need |
| -------------------------------- |
| |
| Instrument just the libraries you actually want to stress-test right now, one |
| at a time. Let the program use system-wide, non-instrumented libraries for |
| any functionality you don't actually want to fuzz. For example, in most |
| cases, it doesn't make to instrument libgmp just because you're testing a |
| crypto app that relies on it for bignum math. |
| |
| Beware of programs that come with oddball third-party libraries bundled with |
| their source code (Spidermonkey is a good example of this). Check ./configure |
| options to use non-instrumented system-wide copies instead. |
| |
| 6) Parallelize your fuzzers |
| --------------------------- |
| |
| The fuzzer is designed to need ~1 core per job. This means that on a, say, |
| 4-core system, you can easily run four parallel fuzzing jobs with relatively |
| little performance hit. For tips on how to do that, see parallel_fuzzing.txt. |
| |
| The afl-gotcpu utility can help you understand if you still have idle CPU |
| capacity on your system. (It won't tell you about memory bandwidth, cache |
| misses, or similar factors, but they are less likely to be a concern.) |
| |
| 7) Keep memory use and timeouts in check |
| ---------------------------------------- |
| |
| If you have increased the -m or -t limits more than truly necessary, consider |
| dialing them back down. |
| |
| For programs that are nominally very fast, but get sluggish for some inputs, |
| you can also try setting -t values that are more punishing than what afl-fuzz |
| dares to use on its own. On fast and idle machines, going down to -t 5 may be |
| a viable plan. |
| |
| The -m parameter is worth looking at, too. Some programs can end up spending |
| a fair amount of time allocating and initializing megabytes of memory when |
| presented with pathological inputs. Low -m values can make them give up sooner |
| and not waste CPU time. |
| |
| 8) Check OS configuration |
| ------------------------- |
| |
| There are several OS-level factors that may affect fuzzing speed: |
| |
| - High system load. Use idle machines where possible. Kill any non-essential |
| CPU hogs (idle browser windows, media players, complex screensavers, etc). |
| |
| - Network filesystems, either used for fuzzer input / output, or accessed by |
| the fuzzed binary to read configuration files (pay special attention to the |
| home directory - many programs search it for dot-files). |
| |
| - On-demand CPU scaling. The Linux 'ondemand' governor performs its analysis |
| on a particular schedule and is known to underestimate the needs of |
| short-lived processes spawned by afl-fuzz (or any other fuzzer). On Linux, |
| this can be fixed with: |
| |
| cd /sys/devices/system/cpu |
| echo performance | tee cpu*/cpufreq/scaling_governor |
| |
| On other systems, the impact of CPU scaling will be different; when fuzzing, |
| use OS-specific tools to find out if all cores are running at full speed. |
| |
| - Transparent huge pages. Some allocators, such as jemalloc, can incur a |
| heavy fuzzing penalty when transparent huge pages (THP) are enabled in the |
| kernel. You can disable this via: |
| |
| echo never > /sys/kernel/mm/transparent_hugepage/enabled |
| |
| - Suboptimal scheduling strategies. The significance of this will vary from |
| one target to another, but on Linux, you may want to make sure that the |
| following options are set: |
| |
| echo 1 >/proc/sys/kernel/sched_child_runs_first |
| echo 1 >/proc/sys/kernel/sched_autogroup_enabled |
| |
| Setting a different scheduling policy for the fuzzer process - say |
| SCHED_RR - can usually speed things up, too, but needs to be done with |
| care. |
| |
| - Use the afl-system-config script to set all proc/sys settings above |
| |
| - Disable all the spectre, meltdown etc. security countermeasures in the |
| kernel if your machine is properly separated: |
| "ibpb=off ibrs=off kpti=off l1tf=off mds=off mitigations=off |
| no_stf_barrier noibpb noibrs nopcid nopti nospec_store_bypass_disable |
| nospectre_v1 nospectre_v2 pcid=off pti=off spec_store_bypass_disable=off |
| spectre_v2=off stf_barrier=off" |
| In most Linux distributions you can put this into a /etc/default/grub |
| variable. |
| |
| 9) If all other options fail, use -d |
| ------------------------------------ |
| |
| For programs that are genuinely slow, in cases where you really can't escape |
| using huge input files, or when you simply want to get quick and dirty results |
| early on, you can always resort to the -d mode. |
| |
| The mode causes afl-fuzz to skip all the deterministic fuzzing steps, which |
| makes output a lot less neat and can ultimately make the testing a bit less |
| in-depth, but it will give you an experience more familiar from other fuzzing |
| tools. |