blob: 7f0b4bcd225e47b2cc7abb2a366112d3aeb45936 [file] [log] [blame]
  1. <html>
  2. <head>
  3. <title>Dalvik Optimization and Verification</title>
  4. </head>
  5. <body>
  6. <h1>Dalvik Optimization and Verification With <i>dexopt</i></h1>
  7. <p>
  8. The Dalvik virtual machine was designed specifically for the Android
  9. mobile platform. The target systems have little RAM, store data on slow
  10. internal flash memory, and generally have the performance characteristics
  11. of decade-old desktop systems. They also run Linux, which provides
  12. virtual memory, processes and threads, and UID-based security mechanisms.
  13. <p>
  14. The features and limitations caused us to focus on certain goals:
  15. <ul>
  16. <li>Class data, notably bytecode, must be shared between multiple
  17. processes to minimize total system memory usage.
  18. <li>The overhead in launching a new app must be minimized to keep
  19. the device responsive.
  20. <li>Storing class data in individual files results in a lot of
  21. redundancy, especially with respect to strings. To conserve disk
  22. space we need to factor this out.
  23. <li>Parsing class data fields adds unnecessary overhead during
  24. class loading. Accessing data values (e.g. integers and strings)
  25. directly as C types is better.
  26. <li>Bytecode verification is necessary, but slow, so we want to verify
  27. as much as possible outside app execution.
  28. <li>Bytecode optimization (quickened instructions, method pruning) is
  29. important for speed and battery life.
  30. <li>For security reasons, processes may not edit shared code.
  31. </ul>
  32. <p>
  33. The typical VM implementation uncompresses individual classes from a
  34. compressed archive and stores them on the heap. This implies a separate
  35. copy of each class in every process, and slows application startup because
  36. the code must be uncompressed (or at least read off disk in many small
  37. pieces). On the other hand, having the bytecode on the local heap makes
  38. it easy to rewrite instructions on first use, facilitating a number of
  39. different optimizations.
  40. <p>
  41. The goals led us to make some fundamental decisions:
  42. <ul>
  43. <li>Multiple classes are aggregated into a single "DEX" file.
  44. <li>DEX files are mapped read-only and shared between processes.
  45. <li>Byte ordering and word alignment are adjusted to suit the local
  46. system.
  47. <li>Bytecode verification is mandatory for all classes, but we want
  48. to "pre-verify" whatever we can.
  49. <li>Optimizations that require rewriting bytecode must be done ahead
  50. of time.
  51. </ul>
  52. <p>
  53. The consequences of these decisions are explained in the following sections.
  54. <h2>VM Operation</h2>
  55. <p>
  56. Application code is delivered to the system in a <code>.jar</code>
  57. or <code>.apk</code> file. These are really just <code>.zip</code>
  58. archives with some meta-data files added. The Dalvik DEX data file
  59. is always called <code>classes.dex</code>.
  60. <p>
  61. The bytecode cannot be memory-mapped and executed directly from the zip
  62. file, because the data is compressed and the start of the file is not
  63. guaranteed to be word-aligned. These problems could be addressed by
  64. storing <code>classes.dex</code> without compression and padding out the zip
  65. file, but that would increase the size of the package sent across the
  66. data network.
  67. <p>
  68. We need to extract <code>classes.dex</code> from the zip archive before
  69. we can use it. While we have the file available, we might as well perform
  70. some of the other actions (realignment, optimization, verification) described
  71. earlier. This raises a new question however: who is responsible for doing
  72. this, and where do we keep the output?
  73. <h3>Preparation</h3>
  74. <p>
  75. There are at least three different ways to create a "prepared" DEX file,
  76. sometimes known as "ODEX" (for Optimized DEX):
  77. <ol>
  78. <li>The VM does it "just in time". The output goes into a special
  79. <code>dalvik-cache</code> directory. This works on the desktop and
  80. engineering-only device builds where the permissions on the
  81. <code>dalvik-cache</code> directory are not restricted. On production
  82. devices, this is not allowed.
  83. <li>The system installer does it when an application is first added.
  84. It has the privileges required to write to <code>dalvik-cache</code>.
  85. <li>The build system does it ahead of time. The relevant <code>jar</code>
  86. / <code>apk</code> files are present, but the <code>classes.dex</code>
  87. is stripped out. The optimized DEX is stored next to the original
  88. zip archive, not in <code>dalvik-cache</code>, and is part of the
  89. system image.
  90. </ol>
  91. <p>
  92. The <code>dalvik-cache</code> directory is more accurately
  93. <code>$ANDROID_DATA/data/dalvik-cache</code>. The files inside it have
  94. names derived from the full path of the source DEX. On the device the
  95. directory is owned by <code>system</code> / <code>system</code>
  96. and has 0771 permissions, and the optimized DEX files stored there are
  97. owned by <code>system</code> and the
  98. application's group, with 0644 permissions. DRM-locked applications will
  99. use 640 permissions to prevent other user applications from examining them.
  100. The bottom line is that you can read your own DEX file and those of most
  101. other applications, but you cannot create, modify, or remove them.
  102. <p>
  103. Preparation of the DEX file for the "just in time" and "system installer"
  104. approaches proceeds in three steps:
  105. <p>
  106. First, the dalvik-cache file is created. This must be done in a process
  107. with appropriate privileges, so for the "system installer" case this is
  108. done within <code>installd</code>, which runs as root.
  109. <p>
  110. Second, the <code>classes.dex</code> entry is extracted from the the zip
  111. archive. A small amount of space is left at the start of the file for
  112. the ODEX header.
  113. <p>
  114. Third, the file is memory-mapped for easy access and tweaked for use on
  115. the current system. This includes byte-swapping and structure realigning,
  116. but no meaningful changes to the DEX file. We also do some basic
  117. structure checks, such as ensuring that file offsets and data indices
  118. fall within valid ranges.
  119. <p>
  120. The build system uses a hairy process that involves starting the
  121. emulator, forcing just-in-time optimization of all relevant DEX files,
  122. and then extracting the results from <code>dalvik-cache</code>. The
  123. reasons for doing this, rather than using a tool that runs on the desktop,
  124. will become more apparent when the optimizations are explained.
  125. <p>
  126. Once the code is byte-swapped and aligned, we're ready to go. We append
  127. some pre-computed data, fill in the ODEX header at the start of the file,
  128. and start executing. (The header is filled in last, so that we don't
  129. try to use a partial file.) If we're interested in verification and
  130. optimization, however, we need to insert a step after the initial prep.
  131. <h3>dexopt</h3>
  132. <p>
  133. We want to verify and optimize all of the classes in the DEX file. The
  134. easiest and safest way to do this is to load all of the classes into
  135. the VM and run through them. Anything that fails to load is simply not
  136. verified or optimized. Unfortunately, this can cause allocation of some
  137. resources that are difficult to release (e.g. loading of native shared
  138. libraries), so we don't want to do it in the same virtual machine that
  139. we're running applications in.
  140. <p>
  141. The solution is to invoke a program called <code>dexopt</code>, which
  142. is really just a back door into the VM. It performs an abbreviated VM
  143. initialization, loads zero or more DEX files from the bootstrap class
  144. path, and then sets about verifying and optimizing whatever it can from
  145. the target DEX. On completion, the process exits, freeing all resources.
  146. <p>
  147. It is possible for multiple VMs to want the same DEX file at the same
  148. time. File locking is used to ensure that dexopt is only run once.
  149. <h2>Verification</h2>
  150. <p>
  151. The bytecode verification process involves scanning through the instructions
  152. in every method in every class in a DEX file. The goal is to identify
  153. illegal instruction sequences so that we don't have to check for them at
  154. run time. Many of the computations involved are also necessary for "exact"
  155. garbage collection. See
  156. <a href="verifier.html">Dalvik Bytecode Verifier Notes</a> for more
  157. information.
  158. <p>
  159. For performance reasons, the optimizer (described in the next section)
  160. assumes that the verifier has run successfully, and makes some potentially
  161. unsafe assumptions. By default, Dalvik insists upon verifying all classes,
  162. and only optimizes classes that have been verified. If you want to
  163. disable the verifier, you can use command-line flags to do so. See also
  164. <a href="embedded-vm-control.html"> Controlling the Embedded VM</a>
  165. for instructions on controlling these
  166. features within the Android application framework.
  167. <p>
  168. Reporting of verification failures is a tricky issue. For example,
  169. calling a package-scope method on a class in a different package is
  170. illegal and will be caught by the verifier. We don't necessarily want
  171. to report it during verification though -- we actually want to throw
  172. an exception when the method call is attempted. Checking the access
  173. flags on every method call is expensive though. The
  174. <a href="verifier.html">Dalvik Bytecode Verifier Notes</a> document
  175. addresses this issue.
  176. <p>
  177. Classes that have been verified successfully have a flag set in the ODEX.
  178. They will not be re-verified when loaded. The Linux access permissions
  179. are expected to prevent tampering; if you can get around those, installing
  180. faulty bytecode is far from the easiest line of attack. The ODEX file has
  181. a 32-bit checksum, but that's chiefly present as a quick check for
  182. corrupted data.
  183. <h2>Optimization</h2>
  184. <p>
  185. Virtual machine interpreters typically perform certain optimizations the
  186. first time a piece of code is used. Constant pool references are replaced
  187. with pointers to internal data structures, operations that always succeed
  188. or always work a certain way are replaced with simpler forms. Some of
  189. these require information only available at runtime, others can be inferred
  190. statically when certain assumptions are made.
  191. <p>
  192. The Dalvik optimizer does the following:
  193. <ul>
  194. <li>For virtual method calls, replace the method index with a
  195. vtable index.
  196. <li>For instance field get/put, replace the field index with
  197. a byte offset. Also, merge the boolean / byte / char / short
  198. variants into a single 32-bit form (less code in the interpreter
  199. means more room in the CPU I-cache).
  200. <li>Replace a handful of high-volume calls, like String.length(),
  201. with "inline" replacements. This skips the usual method call
  202. overhead, directly switching from the interpreter to a native
  203. implementation.
  204. <li>Prune empty methods. The simplest example is
  205. <code>Object.&lt;init&gt;</code>, which does nothing, but must be
  206. called whenever any object is allocated. The instruction is
  207. replaced with a new version that acts as a no-op unless a debugger
  208. is attached.
  209. <li>Append pre-computed data. For example, the VM wants to have a
  210. hash table for lookups on class name. Instead of computing this
  211. when the DEX file is loaded, we can compute it now, saving heap
  212. space and computation time in every VM where the DEX is loaded.
  213. </ul>
  214. <p>
  215. All of the instruction modifications involve replacing the opcode with
  216. one not defined by the Dalvik specification. This allows us to freely
  217. mix optimized and unoptimized instructions. The set of optimized
  218. instructions, and their exact representation, is tied closely to the VM
  219. version.
  220. <p>
  221. Most of the optimizations are obvious "wins". The use of raw indices
  222. and offsets not only allows us to execute more quickly, we can also
  223. skip the initial symbolic resolution. Pre-computation eats up
  224. disk space, and so must be done in moderation.
  225. <p>
  226. There are a couple of potential sources of trouble with these
  227. optimizations. First, vtable indices and byte offsets are subject to
  228. change if the VM is updated. Second, if a superclass is in a different
  229. DEX, and that other DEX is updated, we need to ensure that our optimized
  230. indices and offsets are updated as well. A similar but more subtle
  231. problem emerges when user-defined class loaders are employed: the class
  232. we actually call may not be the one we expected to call.
  233. <p>These problems are addressed with dependency lists and some limitations
  234. on what can be optimized.
  235. <h2>Dependencies and Limitations</h2>
  236. <p>
  237. The optimized DEX file includes a list of dependencies on other DEX files,
  238. plus the CRC-32 and modification date from the originating
  239. <code>classes.dex</code> zip file entry. The dependency list includes the
  240. full path to the <code>dalvik-cache</code> file, and the file's SHA-1
  241. signature. The timestamps of files on the device are unreliable and
  242. not used. The dependency area also includes the VM version number.
  243. <p>
  244. An optimized DEX is dependent upon all of the DEX files in the bootstrap
  245. class path. DEX files that are part of the bootstrap class path depend
  246. upon the DEX files that appeared earlier. To ensure that nothing outside
  247. the dependent DEX files is available, <code>dexopt</code> only loads the
  248. bootstrap classes. References to classes in other DEX files fail, which
  249. causes class loading and/or verification to fail, and classes with
  250. external dependencies are simply not optimized.
  251. <p>
  252. This means that splitting code out into many separate DEX files has a
  253. disadvantage: virtual method calls and instance field lookups between
  254. non-boot DEX files can't be optimized. Because verification is pass/fail
  255. with class granularity, no method in a class that has any reliance on
  256. classes in external DEX files can be optimized. This may be a bit
  257. heavy-handed, but it's the only way to guarantee that nothing breaks
  258. when individual pieces are updated.
  259. <p>
  260. Another negative consequence: any change to a bootstrap DEX will result
  261. in rejection of all optimized DEX files. This makes it hard to keep
  262. system updates small.
  263. <p>
  264. Despite our caution, there is still a possibility that a class in a DEX
  265. file loaded by a user-defined class loader could ask for a bootstrap class
  266. (say, String) and be given a different class with the same name. If a
  267. class in the DEX file being processed has the same name as a class in the
  268. bootstrap DEX files, the class will be flagged as ambiguous and references
  269. to it will not be resolved during verification / optimization. The class
  270. linking code in the VM does additional checks to plug another hole;
  271. see the verbose description in the VM sources for details (vm/oo/Class.c).
  272. <p>
  273. If one of the dependencies is updated, we need to re-verify and
  274. re-optimize the DEX file. If we can do a just-in-time <code>dexopt</code>
  275. invocation, this is easy. If we have to rely on the installer daemon, or
  276. the DEX was shipped only in ODEX, then the VM has to reject the DEX.
  277. <p>
  278. The output of <code>dexopt</code> is byte-swapped and struct-aligned
  279. for the host, and contains indices and offsets that are highly VM-specific
  280. (both version-wise and platform-wise). For this reason it's tricky to
  281. write a version of <code>dexopt</code> that runs on the desktop but
  282. generates output suitable for a particular device. The safest way to
  283. invoke it is on the target device, or on an emulator for that device.
  284. <h2>Generated DEX</h2>
  285. <p>
  286. Some languages and frameworks rely on the ability to generate bytecode
  287. and execute it. The rather heavy <code>dexopt</code> verification and
  288. optimization model doesn't work well with that.
  289. <p>
  290. We intend to support this in a future release, but the exact method is
  291. to be determined. We may allow individual classes to be added or whole
  292. DEX files; may allow Java bytecode or Dalvik bytecode in instructions;
  293. may perform the usual set of optimizations, or use a separate interpreter
  294. that performs on-first-use optimizations directly on the bytecode (which
  295. won't be mapped read-only, since it's locally defined).
  296. <address>Copyright &copy; 2008 The Android Open Source Project</address>
  297. </body>
  298. </html>