 # High-performance binary-only instrumentation for afl-fuzz

   (See ../docs/README for the general instruction manual.)

 ## 1) Introduction

 The code in this directory allows you to build a standalone feature that
 leverages the QEMU "user emulation" mode and allows callers to obtain
 instrumentation output for black-box, closed-source binaries. This mechanism
 can then be used by afl-fuzz to stress-test targets that couldn't be built
 with afl-gcc.

 The usual performance cost is 2-5x, which is considerably better than what has
 been seen so far in experiments with tools such as DynamoRIO and PIN.

 The idea and much of the initial implementation come from Andrew Griffiths.
 The actual implementation on QEMU 3 (shipped with afl++) is from
 Andrea Fioraldi. Special thanks to abiondo, who re-enabled TCG chaining.

 ## 2) How to use

 The feature is implemented with a patch to QEMU 3.1.1. The simplest way
 to build it is to run ./build_qemu_support.sh. The script will download,
 configure, and compile the QEMU binary for you.

 QEMU is a big project, so this will take a while, and you may have to
 resolve a couple of dependencies (most notably, you will definitely need
 libtool and glib2-devel).

 Once the binaries are compiled, you can leverage the QEMU tool by calling
 afl-fuzz and all the related utilities with -Q in the command line.

 Note that QEMU requires a generous memory limit to run; somewhere around
 200 MB is a good starting point, but considerably more may be needed for
 more complex programs. The default -m limit will be automatically bumped up
 to 200 MB when specifying -Q to afl-fuzz; be careful when overriding this.
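
As a rough sketch of how the pieces fit together, the helper below only prints an afl-fuzz command line for QEMU mode with an explicit memory limit; it does not start the fuzzer. The function name `afl_qemu_cmd` and the directory and target names in the usage note are hypothetical.

```shell
# Print (rather than run) an afl-fuzz command line for QEMU mode.
# Arguments: <memory limit in MB> <input dir> <output dir> <target> [target args...]
afl_qemu_cmd() {
  mem_mb=$1 in_dir=$2 out_dir=$3
  shift 3
  # -Q selects QEMU mode; -m overrides the 200 MB default that -Q implies.
  echo "afl-fuzz -Q -m $mem_mb -i $in_dir -o $out_dir -- $*"
}
```

For instance, `afl_qemu_cmd 512 testcases findings ./target @@` prints a command that raises the limit to 512 MB for a more complex target.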

 In principle, if you set CPU_TARGET before calling ./build_qemu_support.sh,
 you should get a build capable of running non-native binaries (say, you
 can try CPU_TARGET=arm). This is also necessary for running 32-bit binaries
 on a 64-bit system (CPU_TARGET=i386). If you're trying to run QEMU on a
 different architecture you can also set HOST to the cross-compiler prefix
 to use (for example HOST=arm-linux-gnueabi to use arm-linux-gnueabi-gcc).

 You can also compile statically-linked binaries by setting STATIC=1. This
 can be useful when compiling QEMU on a different system than the one you're
 planning to run the fuzzer on and is most often used with the HOST variable.
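
The build variables above can be combined as in the following sketch. The function name `qemu_build` and the `DRY_RUN` switch are hypothetical, and pairing every HOST build with STATIC=1 is an assumption of this example, not something the build script requires.

```shell
# Run, or with DRY_RUN set merely print, a build of the QEMU helper.
# Arguments: <CPU_TARGET> [HOST cross-compiler prefix]
qemu_build() {
  vars="CPU_TARGET=$1"
  if [ -n "${2:-}" ]; then
    # Assumption of this sketch: cross builds are also linked statically.
    vars="$vars HOST=$2 STATIC=1"
  fi
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$vars ./build_qemu_support.sh"
  else
    # Word splitting of $vars is intentional: one VAR=value per word.
    env $vars ./build_qemu_support.sh
  fi
}
```

Run from this directory, `qemu_build i386` would build a helper for 32-bit x86 targets, while `DRY_RUN=1 qemu_build arm arm-linux-gnueabi` just shows the command for a static cross build.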

 Note: if you want the QEMU helper to be installed on your system for all
 users, you need to build it before issuing 'make install' in the parent
 directory.

 ## 3) Bonus feature #1: deferred initialization

 As with LLVM mode (refer to its README for more details), QEMU mode supports
 deferred initialization.

 This can be enabled by setting the environment variable AFL_ENTRYPOINT, which
 moves the forkserver to a different part of the program, e.g. just before the
 input file is opened (well after command line parsing, config file loading,
 etc.). This can be a huge speed improvement. Note that the specified address
 must be the address of a basic block.
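
One way to obtain a candidate address, sketched here under the assumption that the target is a non-PIE binary that still has its symbol table, is to take a function's entry address from `nm` output, since a function entry is the start of a basic block. `sym_addr` and `parse_input` are hypothetical names used only for illustration; stripped binaries will need a disassembler instead.

```shell
# Read `nm` output on stdin and print the address of symbol $1 as 0x-prefixed hex.
# Prints nothing if the symbol is not defined in the binary.
sym_addr() {
  awk -v sym="$1" '$3 == sym { print "0x" $1; exit }'
}
```

A possible use would be `export AFL_ENTRYPOINT=$(nm ./target | sym_addr parse_input)` before starting afl-fuzz with -Q.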

 ## 4) Bonus feature #2: persistent mode

 QEMU mode also supports persistent mode for x86 and x86_64 targets.
 The environment variable to enable it is AFL_QEMU_PERSISTENT_ADDR=`start addr`.
 In this variable you must specify the address of the function that
 forms the body of the persistent loop.
 The code in this function must be stateless, as in the LLVM persistent mode.
 The return address on the stack is patched, as in WinAFL, in order to repeat
 the execution of that function.
 Another way to run the persistent loop is to also specify the
 AFL_QEMU_PERSISTENT_RET=`end addr` env variable.
 With this variable set, instead of patching the return address, the
 instruction at `end addr` is transformed into a jump to `start addr`.
 Note that the addresses in these variables must be given in hex.

 Note that the base address of PIE binaries in QEMU user mode is 0x4000000000.
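
For a PIE binary, the address to pass is therefore the base plus the address shown by tools such as nm or objdump. A minimal sketch of that arithmetic follows; the names `QEMU_PIE_BASE` and `pie_addr` are hypothetical, and it assumes a shell with 64-bit arithmetic.

```shell
# Base address QEMU user mode uses for PIE binaries, as stated in the note above.
QEMU_PIE_BASE=0x4000000000

# Print the run-time address for a link-time address (offset from the load base).
pie_addr() {
  printf '0x%x\n' $(( $QEMU_PIE_BASE + $1 ))
}
```

With this, `pie_addr 0x1234` prints `0x4000001234`.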

 With the env variable AFL_QEMU_PERSISTENT_GPR you can tell QEMU to save the
 original value of general purpose registers and restore them in each cycle.
 This makes it possible to use functions that take arguments as the persistent
 loop on x86_64.

 With AFL_QEMU_PERSISTENT_RETADDR_OFFSET you can specify the offset from the
 stack pointer at which QEMU can find the return address when `start addr` is
 hit.

 Use this mode with caution; it will probably not work on the first try.
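
The persistent mode variables might be set up along these lines. This is only a sketch: `persistent_setup` is a hypothetical helper, the addresses in the usage note are placeholders, and using the value 1 for AFL_QEMU_PERSISTENT_GPR is an assumption.

```shell
# Export the persistent mode variables for afl-fuzz -Q.
# Arguments: <start addr in hex> [end addr in hex]
persistent_setup() {
  export AFL_QEMU_PERSISTENT_ADDR="$1"
  # Save and restore general purpose registers so function arguments survive.
  export AFL_QEMU_PERSISTENT_GPR=1
  if [ -n "${2:-}" ]; then
    # Loop by jumping from the end address back to the start address.
    export AFL_QEMU_PERSISTENT_RET="$2"
  else
    # Fall back to patching the return address on the stack.
    unset AFL_QEMU_PERSISTENT_RET
  fi
}
```

Calling `persistent_setup 0x400b00` selects the return address patching variant, while `persistent_setup 0x400b00 0x400b7f` selects the jump-back variant.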

 ## 5) Bonus feature #3: CompareCoverage

 CompareCoverage is a sub-instrumentation with effects similar to laf-intel.

 The option that enables QEMU CompareCoverage is AFL_COMPCOV_LEVEL.
 There is also ./libcompcov/, which implements CompareCoverage for *cmp
 functions (splitting memcmp, strncmp, etc. to make these conditions easier for
 afl-fuzz to solve).
 AFL_COMPCOV_LEVEL=1 instruments only comparisons with immediate
 values or read-only memory. AFL_COMPCOV_LEVEL=2 instruments all
 comparison instructions, and also memory comparison functions when libcompcov
 is preloaded. Comparison instructions are currently instrumented only
 on the x86 and x86_64 targets.
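
A possible environment for the most thorough setting is sketched below. AFL_PRELOAD is afl-fuzz's variable for preloading a library into the target and is not described in this README; the library path is a placeholder that depends on where libcompcov.so ends up in your build.

```shell
# Placeholder path: point this at the libcompcov.so built from ./libcompcov/.
export AFL_PRELOAD=./libcompcov.so
# Level 2: all comparison instructions, plus *cmp functions via libcompcov.
export AFL_COMPCOV_LEVEL=2
```

afl-fuzz is then started with -Q as usual.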

 Highly recommended.

 ## 6) Bonus feature #4: Wine mode

 AFL++ QEMU can use Wine to fuzz Win32 PE binaries. Use the -W flag of afl-fuzz.

 Note that some binaries require user interaction with the GUI and must be patched.

 For examples look [here](https://github.com/andreafioraldi/WineAFLplusplusDEMO).

 ## 7) Notes on linking

 The feature is supported only on Linux. Supporting BSD may amount to porting
 the changes made to linux-user/elfload.c and applying them to
 bsd-user/elfload.c, but I have not looked into this yet.

 The instrumentation follows only the .text section of the first ELF binary
 encountered in the linking process. It does not trace shared libraries. In
 practice, this means two things:

   - Any libraries you want to analyze *must* be linked statically into the
     executed ELF file (this will usually be the case for closed-source
     apps).

   - Standard C libraries and other stuff that is wasteful to instrument
     should be linked dynamically - otherwise, AFL will have no way to avoid
     peeking into them.

 Setting AFL_INST_LIBS=1 can be used to circumvent the .text detection logic
 and instrument every basic block encountered.

 ## 8) Benchmarking

 If you want to compare the performance of the QEMU instrumentation with that of
 afl-gcc compiled code against the same target, you need to build the
 non-instrumented binary with the same optimization flags that are normally
 injected by afl-gcc, and make sure that the bits to be tested are statically
 linked into the binary. A common way to do this would be:

    $ CFLAGS="-O3 -funroll-loops" ./configure --disable-shared
    $ make clean all

 Comparative measurements of execution speed or instrumentation coverage will be
 fairly meaningless if the optimization levels or instrumentation scopes don't
 match.

 ## 9) Gotchas, feedback, bugs

 If you need to fix up checksums or do other cleanup on mutated test cases, see
 experimental/post_library/ for a viable solution.

 Do not mix QEMU mode with ASAN, MSAN, or the like; QEMU doesn't appreciate
 the "shadow VM" trick employed by the sanitizers and will probably just
 run out of memory.

 Compared to fully-fledged virtualization, the user emulation mode is *NOT* a
 security boundary. The binaries can freely interact with the host OS. If you
 somehow need to fuzz an untrusted binary, put everything in a sandbox first.

 QEMU does not necessarily support all CPU or hardware features that your
 target program may be utilizing. In particular, it does not appear to have
 full support for AVX2 / FMA3. Using binaries for older CPUs, or recompiling them
 with -march=core2, can help.

 Beyond that, this is an early-stage mechanism, so field reports are welcome.
 You can send them to <afl-users@googlegroups.com>.

 ## 10) Alternatives: static rewriting

 Statically rewriting binaries just once, instead of attempting to translate
 them at run time, can be a faster alternative. That said, static rewriting is
 fraught with peril, because it depends on being able to properly and fully model
 program control flow without actually executing each and every code path.

 The best implementation is afl-dyninst:

   https://github.com/vanhauser-thc/afl-dyninst

 The issue, however, is that Dyninst does not always rewrite binaries so that
 they run stably. Crashes are common, especially in C++ programs that
 use throw/catch. Try it first; if it works for you, it is
 2-3x as fast as qemu_mode.