docs/verifier.html - platform/dalvik - Git at Google

 <html>
 <head>
 <title>Dalvik Bytecode Verifier Notes</title>
 </head>

 <body>
 <h1>Dalvik Bytecode Verifier Notes</h1>

 <p>
 The bytecode verifier in the Dalvik VM attempts to provide the same sorts
 of checks and guarantees that other popular virtual machines do.  We
 perform generally the same set of checks as are described in _The Java
 Virtual Machine Specification, Second Edition_, including the updates
 planned for the Third Edition.

 <p>
 Verification can be enabled for all classes, disabled for all, or enabled
 only for "remote" (non-bootstrap) classes.  It should be performed for any
 class that will be processed with the DEX optimizer, and in fact the
 default VM behavior is to only optimize verified classes.


 <h2>Why Verify?</h2>

 <p>
 The verification process adds additional time to the build and to
 the installation of new applications.  It's fairly quick for app-sized
 DEX files, but rather slow for the big "core" and "framework" files.
 Why do it all, when our system relies on UNIX processes for security?
 <p>
 <ol>
     <li>Optimizations.  The interpreter can ignore a lot of potential
     error cases because the verifier guarantees that they are impossible.
     Also, we can optimize the DEX file more aggressively if we start
     with a stronger set of assumptions about the bytecode.
     <li>"Precise" GC.  The work peformed during verification has significant
     overlap with the work required to compute register use maps for
     type-precise GC.
     <li>Intra-application security.  If an app wants to download bits
     of interpreted code over the network and execute them, it can safely
     do so using well-established security mechanisms.
     <li>3rd party app failure analysis.  We have no way to control the
     tools and post-processing utilities that external developers employ,
     so when we get bug reports with a weird exception or native crash
     it's very helpful to start with the assumption that the bytecode
     is valid.
 </ol>
 <p>
 It's also a convenient framework to deal with certain situations, notably
 replacement of instructions that access volatile 64-bit fields with
 more rigorous versions that guarantee atomicity.


 <h2>Verifier Differences</h2>

 <p>
 There are a few checks that the Dalvik bytecode verifier does not perform,
 because they're not relevant.  For example:
 <ul>
     <li>Type restrictions on constant pool references are not enforced,
     because Dalvik does not have a pool of typed constants.  (Dalvik
     uses a simple index into type-specific pools.)
     <li>Verification of the operand stack size is not performed, because
     Dalvik does not have an operand stack.
     <li>Limitations on <code>jsr</code> and <code>ret</code> do not apply,
     because Dalvik doesn't support subroutines.
 </ul>

 In some cases they are implemented differently, e.g.:
 <ul>
     <li>In a conventional VM, backward branches and exceptions are
     forbidden when a local variable holds an uninitialized reference.  The
     restriction was changed to mark registers as invalid when they hold
     references to the uninitialized result of a previous invocation of the
     same <code>new-instance</code> instruction.
     This solves the same problem -- trickery potentially allowing
     uninitialized objects to slip past the verifier -- without unduly
     limiting branches.
 </ul>

 There are also some new ones, such as:
 <ul>
     <li>The <code>move-exception</code> instruction can only appear as
     the first instruction in an exception handler.
     <li>The <code>move-result*</code> instructions can only appear
     immediately after an appropriate <code>invoke-*</code>
     or <code>filled-new-array</code> instruction.
 </ul>

 <p>
 The VM is permitted but not required to enforce "structured locking"
 constraints, which are designed to ensure that, when a method returns, all
 monitors locked by the method have been unlocked an equal number of times.
 This is not currently implemented.

 <p>
 The Dalvik verifier is more restrictive than other VMs in one area:
 type safety on sub-32-bit integer widths.  These additional restrictions
 should make it impossible to, say, pass a value outside the range
 [-128, 127] to a function that takes a <code>byte</code> as an argument.


 <h2>Monitor Verification</h2>

 <p>
 If a method locks an object with a <code>synchronized</code> statement, the
 object must be unlocked before the method returns.  At the bytecode level,
 this means the method must execute a matching <code>monitor-exit</code>
 for every <code>monitor-enter</code> instruction, whether the function
 completes normally or abnormally.  The bytecode verifier optionally
 enforces this.

 <p>
 The verifier uses a fairly simple-minded model.  If you enter a monitor
 held in register N, you can exit the monitor using register N or any
 subsequently-made copies of register N.  The verifier does not attempt
 to identify previously-made copies, track loads and stores through
 fields, or recognize identical constant values (for example, the result
 values from two <code>const-class</code> instructions on the same class
 will be the same reference, but the verifier doesn't recognize this).

 <p>
 Further, you may only exit the monitor most recently entered.  "Hand
 over hand" locking techniques, e.g. "lock A; lock B; unlock A; unlock B",
 are not allowed.

 <p>
 This means that there are a number of situations in which the verifier
 will throw an exception on code that would execute correctly at run time.
 This is not expected to be an issue for compiler-generated bytecode.

 <p>
 For implementation convenience, the maximum nesting depth of
 <code>synchronized</code> statements has been set to 32.  This is not
 a limitation on the recursion count.  The only way to trip this would be
 to have a single method with more than 32 nested <code>synchronized</code>
 statements, something that is unlikely to occur.


 <h2>Verification Failures</h2>

 <p>
 The verifier may reject a class immediately, or it may defer throwing
 an exception until the code is actually used.  For example, if a class
 attempts to perform an illegal access on a field, the VM should throw
 an IllegalAccessError the first time the instruction is encountered.
 On the other hand, if a class contains an invalid bytecode, it should be
 rejected immediately with a VerifyError.

 <p>
 Immediate VerifyErrors are accompanied by detailed, if somewhat cryptic,
 information in the log file.  From this it's possible to determine the
 exact instruction that failed, and the reason for the failure.

 <p>
 It's a bit tricky to implement deferred verification errors in Dalvik.
 A few approaches were considered:

 <ol>
 <li>We could replace the invalid field access instruction with a special
 instruction that generates an illegal access error, and allow class
 verification to complete successfully.  This type of verification must
 be deferred to first class load, rather than be performed ahead of time
 during DEX optimization, because some failures will depend on the current
 execution environment (e.g. not all classes are available at dexopt time).
 At that point the bytecode instructions are mapped read-only during
 verification, so rewriting them isn't possible.
 </li>

 <li>We can perform the access checks when the field/method/class is
 resolved.  In a typical VM implementation we would do the check when the
 entry is resolved in the context of the current classfile, but our DEX
 files combine multiple classfiles together, merging the field/method/class
 resolution results into a single large table.  Once one class successfully
 resolves the field, every other class in the same DEX file would be able
 to access the field.  This is incorrect.
 </li>

 <li>Perform the access checks on every field/method/class access.
 This adds significant overhead.  This is mitigated somewhat by the DEX
 optimizer, which will convert many field/method/class accesses into a
 simpler form after performing the access check.  However, not all accesses
 can be optimized (e.g. accesses to classes unknown at dexopt time),
 and we don't currently have an optimized form of certain instructions
 (notably static field operations).
 </li>
 </ol>

 <p>
 In early versions of Dalvik (as found in Android 1.6 and earlier), the verifier
 simply regarded all problems as immediately fatal.  This generally worked,
 but in some cases the VM was rejecting classes because of bits of code
 that were never used.  The VerifyError itself was sometimes difficult to
 decipher, because it was thrown during verification rather than at the
 point where the problem was first noticed during execution.
 <p>
 The current version uses a variation of approach #1.  The dexopt
 command works the way it did before, leaving the code untouched and
 flagging fully-correct classes as "pre-verified".  When the VM loads a
 class that didn't pass pre-verification, the verifier is invoked.  If a
 "deferrable" problem is detected, a modifiable copy of the instructions
 in the problematic method is made.  In that copy, the troubled instruction
 is replaced with an "always throw" opcode, and verification continues.

 <p>
 In the example used earlier, an attempt to read from an inaccessible
 field would result in the "field get" instruction being replaced by
 "always throw IllegalAccessError on field X".  Creating copies of method
 bodies requires additional heap space, but since this affects very few
 methods overall the memory impact should be minor.

 <p>
 <address>Copyright &copy; 2008 The Android Open Source Project</address>

 </body>
 </html>
	<html>
	<head>
	<title>Dalvik Bytecode Verifier Notes</title>
	</head>

	<body>
	<h1>Dalvik Bytecode Verifier Notes</h1>

	<p>
	The bytecode verifier in the Dalvik VM attempts to provide the same sorts
	of checks and guarantees that other popular virtual machines do. We
	perform generally the same set of checks as are described in _The Java
	Virtual Machine Specification, Second Edition_, including the updates
	planned for the Third Edition.

	<p>
	Verification can be enabled for all classes, disabled for all, or enabled
	only for "remote" (non-bootstrap) classes. It should be performed for any
	class that will be processed with the DEX optimizer, and in fact the
	default VM behavior is to only optimize verified classes.


	<h2>Why Verify?</h2>

	<p>
	The verification process adds additional time to the build and to
	the installation of new applications. It's fairly quick for app-sized
	DEX files, but rather slow for the big "core" and "framework" files.
	Why do it all, when our system relies on UNIX processes for security?
	<p>
	<ol>
	<li>Optimizations. The interpreter can ignore a lot of potential
	error cases because the verifier guarantees that they are impossible.
	Also, we can optimize the DEX file more aggressively if we start
	with a stronger set of assumptions about the bytecode.
	<li>"Precise" GC. The work peformed during verification has significant
	overlap with the work required to compute register use maps for
	type-precise GC.
	<li>Intra-application security. If an app wants to download bits
	of interpreted code over the network and execute them, it can safely
	do so using well-established security mechanisms.
	<li>3rd party app failure analysis. We have no way to control the
	tools and post-processing utilities that external developers employ,
	so when we get bug reports with a weird exception or native crash
	it's very helpful to start with the assumption that the bytecode
	is valid.
	</ol>
	<p>
	It's also a convenient framework to deal with certain situations, notably
	replacement of instructions that access volatile 64-bit fields with
	more rigorous versions that guarantee atomicity.


	<h2>Verifier Differences</h2>

	<p>
	There are a few checks that the Dalvik bytecode verifier does not perform,
	because they're not relevant. For example:
	<ul>
	<li>Type restrictions on constant pool references are not enforced,
	because Dalvik does not have a pool of typed constants. (Dalvik
	uses a simple index into type-specific pools.)
	<li>Verification of the operand stack size is not performed, because
	Dalvik does not have an operand stack.
	<li>Limitations on <code>jsr</code> and <code>ret</code> do not apply,
	because Dalvik doesn't support subroutines.
	</ul>

	In some cases they are implemented differently, e.g.:
	<ul>
	<li>In a conventional VM, backward branches and exceptions are
	forbidden when a local variable holds an uninitialized reference. The
	restriction was changed to mark registers as invalid when they hold
	references to the uninitialized result of a previous invocation of the
	same <code>new-instance</code> instruction.
	This solves the same problem -- trickery potentially allowing
	uninitialized objects to slip past the verifier -- without unduly
	limiting branches.
	</ul>

	There are also some new ones, such as:
	<ul>
	<li>The <code>move-exception</code> instruction can only appear as
	the first instruction in an exception handler.
	<li>The <code>move-result*</code> instructions can only appear
	immediately after an appropriate <code>invoke-*</code>
	or <code>filled-new-array</code> instruction.
	</ul>

	<p>
	The VM is permitted but not required to enforce "structured locking"
	constraints, which are designed to ensure that, when a method returns, all
	monitors locked by the method have been unlocked an equal number of times.
	This is not currently implemented.

	<p>
	The Dalvik verifier is more restrictive than other VMs in one area:
	type safety on sub-32-bit integer widths. These additional restrictions
	should make it impossible to, say, pass a value outside the range
	[-128, 127] to a function that takes a <code>byte</code> as an argument.


	<h2>Monitor Verification</h2>

	<p>
	If a method locks an object with a <code>synchronized</code> statement, the
	object must be unlocked before the method returns. At the bytecode level,
	this means the method must execute a matching <code>monitor-exit</code>
	for every <code>monitor-enter</code> instruction, whether the function
	completes normally or abnormally. The bytecode verifier optionally
	enforces this.

	<p>
	The verifier uses a fairly simple-minded model. If you enter a monitor
	held in register N, you can exit the monitor using register N or any
	subsequently-made copies of register N. The verifier does not attempt
	to identify previously-made copies, track loads and stores through
	fields, or recognize identical constant values (for example, the result
	values from two <code>const-class</code> instructions on the same class
	will be the same reference, but the verifier doesn't recognize this).

	<p>
	Further, you may only exit the monitor most recently entered. "Hand
	over hand" locking techniques, e.g. "lock A; lock B; unlock A; unlock B",
	are not allowed.

	<p>
	This means that there are a number of situations in which the verifier
	will throw an exception on code that would execute correctly at run time.
	This is not expected to be an issue for compiler-generated bytecode.

	<p>
	For implementation convenience, the maximum nesting depth of
	<code>synchronized</code> statements has been set to 32. This is not
	a limitation on the recursion count. The only way to trip this would be
	to have a single method with more than 32 nested <code>synchronized</code>
	statements, something that is unlikely to occur.


	<h2>Verification Failures</h2>

	<p>
	The verifier may reject a class immediately, or it may defer throwing
	an exception until the code is actually used. For example, if a class
	attempts to perform an illegal access on a field, the VM should throw
	an IllegalAccessError the first time the instruction is encountered.
	On the other hand, if a class contains an invalid bytecode, it should be
	rejected immediately with a VerifyError.

	<p>
	Immediate VerifyErrors are accompanied by detailed, if somewhat cryptic,
	information in the log file. From this it's possible to determine the
	exact instruction that failed, and the reason for the failure.

	<p>
	It's a bit tricky to implement deferred verification errors in Dalvik.
	A few approaches were considered:

	<ol>
	<li>We could replace the invalid field access instruction with a special
	instruction that generates an illegal access error, and allow class
	verification to complete successfully. This type of verification must
	be deferred to first class load, rather than be performed ahead of time
	during DEX optimization, because some failures will depend on the current
	execution environment (e.g. not all classes are available at dexopt time).
	At that point the bytecode instructions are mapped read-only during
	verification, so rewriting them isn't possible.
	</li>

	<li>We can perform the access checks when the field/method/class is
	resolved. In a typical VM implementation we would do the check when the
	entry is resolved in the context of the current classfile, but our DEX
	files combine multiple classfiles together, merging the field/method/class
	resolution results into a single large table. Once one class successfully
	resolves the field, every other class in the same DEX file would be able
	to access the field. This is incorrect.
	</li>

	<li>Perform the access checks on every field/method/class access.
	This adds significant overhead. This is mitigated somewhat by the DEX
	optimizer, which will convert many field/method/class accesses into a
	simpler form after performing the access check. However, not all accesses
	can be optimized (e.g. accesses to classes unknown at dexopt time),
	and we don't currently have an optimized form of certain instructions
	(notably static field operations).
	</li>
	</ol>

	<p>
	In early versions of Dalvik (as found in Android 1.6 and earlier), the verifier
	simply regarded all problems as immediately fatal. This generally worked,
	but in some cases the VM was rejecting classes because of bits of code
	that were never used. The VerifyError itself was sometimes difficult to
	decipher, because it was thrown during verification rather than at the
	point where the problem was first noticed during execution.
	<p>
	The current version uses a variation of approach #1. The dexopt
	command works the way it did before, leaving the code untouched and
	flagging fully-correct classes as "pre-verified". When the VM loads a
	class that didn't pass pre-verification, the verifier is invoked. If a
	"deferrable" problem is detected, a modifiable copy of the instructions
	in the problematic method is made. In that copy, the troubled instruction
	is replaced with an "always throw" opcode, and verification continues.

	<p>
	In the example used earlier, an attempt to read from an inaccessible
	field would result in the "field get" instruction being replaced by
	"always throw IllegalAccessError on field X". Creating copies of method
	bodies requires additional heap space, but since this affects very few
	methods overall the memory impact should be minor.

	<p>
	<address>Copyright © 2008 The Android Open Source Project</address>

	</body>
	</html>