| <html> |
| <head> |
| <title>Dalvik Porting Guide</title> |
| </head> |
| |
| <body> |
| <h1>Dalvik Porting Guide</h1> |
| |
| <p> |
| The Dalvik virtual machine is intended to run on a variety of platforms. |
| The baseline system is expected to be a variant of UNIX (Linux, BSD, Mac |
| OS X) running the GNU C compiler. Little-endian CPUs have been exercised |
| the most heavily, but big-endian systems are explicitly supported. |
| </p><p> |
| There are two general categories of work: porting to a Linux system |
| with a previously unseen CPU architecture, and porting to a different |
| operating system. This document covers the former. |
| </p><p> |
| Basic familiarity with the Android platform, source code structure, and |
| build system is assumed. |
| </p> |
| |
| |
| <h2>Core Libraries</h2> |
| |
| <p> |
| The native code in the core libraries (chiefly <code>libcore</code>, |
| but also <code>dalvik/vm/native</code>) is written in C/C++ and is expected |
| to work without modification in a Linux environment. |
| </p><p> |
| The core libraries pull in code from many other projects, including |
| OpenSSL, zlib, and ICU. These will also need to be ported before the VM |
| can be used. |
| </p> |
| |
| |
| <h2>JNI Call Bridge</h2> |
| |
| <p> |
| Most of the Dalvik VM runtime is written in portable C. The one |
| non-portable component of the runtime is the JNI call bridge. Simply put, |
| this converts an array of integers into function arguments of various |
| types, and calls a function. This must be done according to the C calling |
| conventions for the platform. The task could be as simple as pushing all |
| of the arguments onto the stack, or involve complex rules for register |
| assignment and stack alignment. |
| </p><p> |
| To ease porting to new platforms, the <a href="http://sourceware.org/libffi/"> |
| open-source FFI library</a> (Foreign Function Interface) is used when a |
| custom bridge is unavailable. FFI is not as fast as a native implementation, |
| and the optional performance improvements it does offer are not used, so |
| writing a replacement is a good first step. |
| </p><p> |
| The code lives in <code>dalvik/vm/arch/*</code>, with the FFI-based version |
| in the "generic" directory. There are two source files for each architecture. |
| One defines the call bridge itself: |
| </p><p><blockquote> |
| <code>void dvmPlatformInvoke(void* pEnv, ClassObject* clazz, int argInfo, |
| int argc, const u4* argv, const char* signature, void* func, |
| JValue* pReturn)</code> |
| </blockquote></p><p> |
| This will invoke a C/C++ function declared: |
| </p><p><blockquote> |
| <code>return_type func(JNIEnv* pEnv, Object* this [, <i>args</i>])</code> |
| </blockquote>or (for a "static" method):<blockquote> |
| <code>return_type func(JNIEnv* pEnv, ClassObject* clazz [, <i>args</i>])</code> |
| </blockquote></p><p> |
| The role of <code>dvmPlatformInvoke</code> is to convert the values in |
| <code>argv</code> into C-style calling conventions, call the method, and |
| then place the return value into <code>pReturn</code> (a union that holds |
| all of the basic JNI types). The code may use the method signature |
| (a DEX "shorty" signature, with one character for the return type and one |
| per argument) to determine how to handle the values. |
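</p><p>
To make the conversion concrete, here is a deliberately tiny bridge in C. It is a sketch, not Dalvik's code: <code>invokeSketch</code>, <code>JValueSketch</code>, <code>GenericFunc</code>, and the two example functions are hypothetical names, the <code>JNIEnv</code> and object/class arguments that <code>dvmPlatformInvoke</code> also passes are left out, and only two shorty signatures are recognized. It assumes that a 64-bit argument occupies two consecutive <code>argv</code> slots in native byte order. A real bridge must handle every signature, which is why it depends on the platform's calling conventions (or on FFI).

```c
#include <stdint.h>
#include <string.h>

typedef uint32_t u4;

/* Hypothetical stand-in for Dalvik's JValue return union. */
typedef union {
    int32_t i;
    int64_t j;
    float   f;
    double  d;
} JValueSketch;

/* Generic function pointer type. C allows converting between function
 * pointer types, provided the call is made through the correct type. */
typedef void (*GenericFunc)(void);

/* Reassemble a 64-bit argument from two consecutive u4 slots, assuming it
 * was stored there in native byte order. */
static int64_t readWide(const u4* slots) {
    int64_t v;
    memcpy(&v, slots, sizeof v);
    return v;
}

/* Example native functions for the bridge to call. */
int32_t exampleAdd(int32_t a, int32_t b) { return a + b; }
int64_t exampleScale(int64_t a, int32_t b) { return a * b; }

/* Toy bridge: matches the shorty against two supported signatures
 * ("III" = int f(int, int), "JJI" = long f(long, int)), converts argv
 * slots to typed values, calls func through a correctly typed pointer,
 * and stores the result in *pReturn. Returns 1 on success, 0 for an
 * unsupported signature. */
int invokeSketch(const char* shorty, const u4* argv, GenericFunc func,
                 JValueSketch* pReturn) {
    if (strcmp(shorty, "III") == 0) {
        int32_t (*f)(int32_t, int32_t) = (int32_t (*)(int32_t, int32_t)) func;
        pReturn->i = f((int32_t) argv[0], (int32_t) argv[1]);
        return 1;
    }
    if (strcmp(shorty, "JJI") == 0) {
        int64_t (*f)(int64_t, int32_t) = (int64_t (*)(int64_t, int32_t)) func;
        int64_t a = readWide(&argv[0]);   /* 'J' uses slots 0 and 1 */
        int32_t b = (int32_t) argv[2];    /* 'I' uses slot 2 */
        pReturn->j = f(a, b);
        return 1;
    }
    return 0;
}
```

With <code>argv</code> holding {40, 2}, <code>invokeSketch("III", argv, (GenericFunc) exampleAdd, &amp;result)</code> returns 1 and leaves 42 in <code>result.i</code>.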
| </p><p> |
| The other source file involved here defines a 32-bit "hint". The hint |
| is computed when the method's class is loaded, and passed in as the |
| "argInfo" argument. The hint can be used to avoid scanning the ASCII |
| method signature for things like the return value, total argument size, |
| or inter-argument 64-bit alignment restrictions. |
| </p> |
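<p>
As an example of what a hint can cache, the sketch below packs the argument size and the return type into one word so the bridge does not have to scan the shorty on every call. The bit layout and the names (<code>computeHintSketch</code>, <code>hintArgWords</code>, <code>hintReturnType</code>) are invented for illustration and are not the encoding Dalvik uses; each architecture's hint file defines its own. Only the arguments listed in the shorty are counted.

```c
#include <stdint.h>

/* Hypothetical hint layout (not Dalvik's actual encoding):
 *   bits 0-7   argument size in 32-bit words (masked to 8 bits here)
 *   bits 8-15  return type character from the shorty
 */
#define HINT_ARGWORDS_MASK  0xffu
#define HINT_RETTYPE_SHIFT  8

/* Computed once, when the method's class is loaded. */
uint32_t computeHintSketch(const char* shorty) {
    uint32_t words = 0;
    /* shorty[0] is the return type; the remaining characters are arguments */
    for (const char* p = shorty + 1; *p != '\0'; p++)
        words += (*p == 'J' || *p == 'D') ? 2 : 1;   /* long/double take 2 words */
    return (words & HINT_ARGWORDS_MASK)
         | ((uint32_t) (unsigned char) shorty[0] << HINT_RETTYPE_SHIFT);
}

/* Cheap accessors used at call time instead of scanning the signature. */
uint32_t hintArgWords(uint32_t hint) { return hint & HINT_ARGWORDS_MASK; }
char hintReturnType(uint32_t hint) { return (char) ((hint >> HINT_RETTYPE_SHIFT) & 0xff); }
```
</p>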
| |
| |
| <h2>Interpreter</h2> |
| |
| <p> |
| The Dalvik runtime includes two interpreters, labeled "portable" and "fast". |
| The portable interpreter is largely contained within a single C function, |
| and should compile on any system that supports gcc. (If you don't have gcc, |
| you may need to disable the "threaded" execution model, which relies on |
| gcc's "goto table" implementation; look for the <code>THREADED_INTERP</code> define.) |
| </p><p> |
| The fast interpreter uses hand-coded assembly fragments. If none are |
| available for the current architecture, the build system will create an |
| interpreter out of C "stubs". The resulting "all stubs" interpreter is |
| quite a bit slower than the portable interpreter, making "fast" something |
| of a misnomer. |
| </p><p> |
| The fast interpreter is enabled by default. On platforms without native |
| support, you may want to switch to the portable interpreter. This can |
| be controlled with the <code>dalvik.vm.execution-mode</code> system |
| property. For example, if you: |
| </p><p><blockquote> |
| <code>adb shell "echo dalvik.vm.execution-mode = int:portable >> /data/local.prop"</code> |
| </blockquote></p><p> |
| and reboot, the Android app framework will start the VM with the portable |
| interpreter enabled. |
| </p> |
| |
| |
| <h3>Mterp Interpreter Structure</h3> |
| |
| <p> |
| There may be significant performance advantages to rewriting the |
| interpreter core in assembly language, using architecture-specific |
| optimizations. In Dalvik this can be done one instruction at a time. |
| </p><p> |
| The simplest way to implement an interpreter is to have a large "switch" |
| statement. After each instruction is handled, the interpreter returns to |
| the top of the loop, fetches the next instruction, and jumps to the |
| appropriate label. |
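</p><p>
As an illustration, here is a switch-based interpreter for a made-up accumulator machine. The <code>TOY_*</code> opcodes and <code>runSwitch</code> are hypothetical and unrelated to real Dalvik bytecode; the point is the shape of the loop, where every handler falls back to a single fetch-and-dispatch at the top.

```c
#include <stdint.h>

/* Made-up opcodes for a toy accumulator machine (not Dalvik bytecode). */
enum { TOY_NOP = 0, TOY_ADD = 1, TOY_DOUBLE = 2, TOY_HALT = 3 };

/* Switch-based interpreter: every instruction returns to the top of the
 * loop, which fetches the next opcode and dispatches through the switch. */
int32_t runSwitch(const uint8_t* code) {
    const uint8_t* pc = code;
    int32_t acc = 0;

    for (;;) {
        switch (*pc) {
        case TOY_NOP:
            pc += 1;
            break;
        case TOY_ADD:                       /* operand: signed 8-bit immediate */
            acc += (int8_t) pc[1];
            pc += 2;
            break;
        case TOY_DOUBLE:
            acc *= 2;
            pc += 1;
            break;
        case TOY_HALT:
        default:                            /* unknown opcodes also stop */
            return acc;
        }
    }
}
```

For the program <code>{TOY_NOP, TOY_ADD, 20, TOY_DOUBLE, TOY_ADD, 2, TOY_HALT}</code> it returns 42.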
| </p><p> |
| An improvement on this is called "threaded" execution. The instruction |
| fetch and dispatch are included at the end of every instruction handler. |
| This makes the interpreter a little larger overall, but you get to avoid |
| the (potentially expensive) branch back to the top of the switch statement. |
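</p><p>
A threaded version of a made-up accumulator machine could look like the following. The <code>TOY_*</code> opcodes and <code>runThreaded</code> are hypothetical, not Dalvik code. It relies on gcc's "labels as values" extension (<code>&amp;&amp;label</code> and <code>goto *ptr</code>, also accepted by clang) to build the goto table, and every handler ends with its own <code>DISPATCH()</code> rather than breaking back to a shared loop.

```c
#include <stdint.h>

/* Made-up opcodes for a toy accumulator machine (not Dalvik bytecode). */
enum { TOY_NOP = 0, TOY_ADD = 1, TOY_DOUBLE = 2, TOY_HALT = 3 };

/* Threaded interpreter using gcc's "labels as values" extension. The goto
 * table maps each opcode to the address of its handler label; opcodes are
 * assumed to be valid (0-3). */
int32_t runThreaded(const uint8_t* code) {
    static void* const gotoTable[] = { &&op_nop, &&op_add, &&op_double, &&op_halt };
    const uint8_t* pc = code;
    int32_t acc = 0;

/* Fetch the next opcode, look up its handler, and jump straight to it. */
#define DISPATCH() goto *gotoTable[*pc]

    DISPATCH();

op_nop:
    pc += 1;
    DISPATCH();
op_add:                                     /* operand: signed 8-bit immediate */
    acc += (int8_t) pc[1];
    pc += 2;
    DISPATCH();
op_double:
    acc *= 2;
    pc += 1;
    DISPATCH();
op_halt:
    return acc;

#undef DISPATCH
}
```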
| </p><p> |
| Dalvik mterp goes one step further, using a computed goto instead of a goto |
| table. Instead of looking up the address in a table, which requires an |
| extra memory fetch on every instruction, mterp multiplies the opcode number |
| by a fixed value. By default, each handler is allowed 64 bytes of space. |
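</p><p>
In C terms, the mterp dispatch computation is just a shift and an add. The helpers below are a model with invented names (<code>handlerAddress</code>, <code>tableAddress</code>, <code>HANDLER_SIZE_LOG2</code>); in mterp the equivalent is a couple of assembly instructions, and <code>base</code> corresponds to the start of the handler block. With 64-byte handlers, opcode 0x12 lands 0x12 * 64 = 1152 bytes past the base.

```c
#include <stdint.h>

#define HANDLER_SIZE_LOG2 6    /* 64-byte handlers: 1 << 6 == 64 */

/* mterp-style dispatch target: base + opcode * 64, computed with a shift
 * instead of a load from a table of handler addresses. */
uintptr_t handlerAddress(uintptr_t base, uint8_t opcode) {
    return base + ((uintptr_t) opcode << HANDLER_SIZE_LOG2);
}

/* Goto-table dispatch for comparison: one extra memory load per instruction. */
uintptr_t tableAddress(const uintptr_t* table, uint8_t opcode) {
    return table[opcode];
}
```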
| </p><p> |
| Not all handlers fit in 64 bytes. Those that don't can have subroutines |
| or simply continue on to additional code outside the basic space. Some of |
| this is handled automatically by Dalvik, but there's no portable way to detect |
| overflow of a 64-byte handler until the VM starts executing. |
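</p><p>
One inexpensive way to catch an oversized handler at start-up is to compare the size of the whole handler block against the expected total. Because each handler begins on a 64-byte boundary, a handler that spills past its slot pushes every later handler out by another 64 bytes, so the block grows beyond 256 * 64 bytes. The function below is a hypothetical sketch of such a check, not a Dalvik API; <code>start</code> and <code>end</code> stand for labels placed at the beginning and end of the handler block.

```c
#include <stddef.h>

/* Returns nonzero if the handler block [start, end) is exactly
 * numHandlers * handlerSize bytes, i.e. no handler overflowed its slot. */
int handlerBlockSizeOk(const char* start, const char* end,
                       size_t numHandlers, size_t handlerSize) {
    if (end < start)
        return 0;
    return (size_t) (end - start) == numHandlers * handlerSize;
}
```

A VM using a check like this would typically abort early with a message naming the expected handler size, rather than misbehaving later in the middle of a run.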
| </p><p> |
| The choice of 64 bytes is somewhat arbitrary, but has worked out well for |
| ARM and x86. |
| </p><p> |
| In the course of development it's useful to have C and assembly |
| implementations of each handler, and be able to flip back and forth |
| between them when hunting problems down. In mterp this is relatively |
| straightforward. You can always see the files being fed to the compiler |
| and assembler for your platform by looking in the |
| <code>dalvik/vm/mterp/out</code> directory. |
| </p><p> |
| The interpreter sources live in <code>dalvik/vm/mterp</code>. If you |
| haven't yet, you should read <code>dalvik/vm/mterp/README.txt</code> now. |
| </p> |
| |
| |
| <h3>Getting Started With Mterp</h3> |
| |
| <p> |
| To set up the framework for a new architecture: |
| <ol> |
| <li>Decide on the name of your architecture. For the sake of discussion, |
| let's call it <code>myarch</code>. |
| <li>Make a copy of <code>dalvik/vm/mterp/config-allstubs</code> to |
| <code>dalvik/vm/mterp/config-myarch</code>. |
| <li>Create a <code>dalvik/vm/mterp/myarch</code> directory to hold your |
| source files. |
| <li>Add <code>myarch</code> to the list in |
| <code>dalvik/vm/mterp/rebuild.sh</code>. |
| <li>Make sure <code>dalvik/vm/Android.mk</code> will find the files for |
| your architecture. If <code>$(TARGET_ARCH)</code> is configured this |
| will happen automatically. |
| <li>Disable the Dalvik JIT. You can do this in the general device |
| configuration, or by editing the initialization of <code>WITH_JIT</code> in |
| <code>dalvik/vm/Dvm.mk</code> to always be <code>false</code>. |
| </ol> |
| </p><p> |
| You now have the basic framework in place. Whenever you make a change, you |
| need to perform two steps: regenerate the mterp output, and build the |
| core VM library. (It's two steps because we didn't want the build system |
| to require Python 2.5, which, incidentally, you do need to have installed.) |
| <ol> |
| <li>In the <code>dalvik/vm/mterp</code> directory, regenerate the contents |
| of the files in <code>dalvik/vm/mterp/out</code> by executing |
| <code>./rebuild.sh</code>. Note there are two files, one in C and one |
| in assembly. |
| <li>In the <code>dalvik</code> directory, regenerate the |
| <code>libdvm.so</code> library with <code>mm</code>. You can also use |
| <code>mmm dalvik/vm</code> from the top of the tree. |
| </ol> |
| </p><p> |
| This will leave you with an updated libdvm.so, which can be pushed out to |
| a device with <code>adb sync</code> or <code>adb push</code>. If you're |
| using the emulator, you also need to run <code>make snod</code> (System image, |
| NO Dependency check) to rebuild the system image file. You should not |
| need to do a top-level "make" and rebuild the dependent binaries. |
| </p><p> |
| At this point you have an "all stubs" interpreter. You can see how it |
| works by examining <code>dalvik/vm/mterp/cstubs/entry.c</code>. The |
| code runs in a loop, pulling out the next opcode, and invoking the |
| handler through a function pointer. Each handler takes a "glue" argument |
| that contains all of the useful state. |
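</p><p>
A stripped-down C model of that structure might look like the following. The glue structure, the stubs, and the made-up <code>TOY_*</code> opcodes are hypothetical (the real glue carries far more state, and the real stubs implement Dalvik instructions); what it shows is the loop that fetches an opcode and calls its handler through a function pointer, with each stub returning to the loop.

```c
#include <stdbool.h>
#include <stdint.h>

/* Made-up opcodes for a toy accumulator machine (not Dalvik bytecode). */
enum { TOY_NOP = 0, TOY_ADD = 1, TOY_DOUBLE = 2, TOY_HALT = 3 };

/* Hypothetical "glue": all of the state a handler needs, in one place. */
typedef struct {
    const uint8_t* pc;      /* next instruction */
    int32_t acc;            /* toy machine's only register */
    bool done;              /* set by the halt stub to leave the loop */
} ToyGlue;

typedef void (*ToyHandler)(ToyGlue* glue);

/* C stubs: each does its work, advances pc, and returns to the loop. */
static void stubNop(ToyGlue* g)    { g->pc += 1; }
static void stubAdd(ToyGlue* g)    { g->acc += (int8_t) g->pc[1]; g->pc += 2; }
static void stubDouble(ToyGlue* g) { g->acc *= 2; g->pc += 1; }
static void stubHalt(ToyGlue* g)   { g->done = true; }

/* Indexed by opcode; the order must match the enum above. */
static const ToyHandler toyHandlers[] = { stubNop, stubAdd, stubDouble, stubHalt };

/* "All stubs" driver: fetch the opcode, call its stub through a function
 * pointer, repeat. Opcodes are assumed to be valid (0-3). */
int32_t runAllStubs(const uint8_t* code) {
    ToyGlue glue = { code, 0, false };
    while (!glue.done) {
        ToyHandler handler = toyHandlers[*glue.pc];
        handler(&glue);
    }
    return glue.acc;
}
```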
| </p><p> |
| Your goal is to replace the entry method, exit method, and each individual |
| instruction with custom implementations. The first thing you need to do |
| is create an entry function that calls the handler for the first instruction. |
| After that, the instructions chain together, so you don't need a loop. |
| (Look at the ARM or x86 implementation to see how they work.) |
| </p><p> |
| Once you have that, you need something to jump to. You can't branch |
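| directly to the C stub because it expects to be called with a "glue" |
| argument and then return. Instead, each C stub needs a "wrapper" that |
| sets up the call (for example, saving the current PC and frame pointer |
| into the glue and passing the glue as the argument), calls the stub, |
| reloads any state the stub may have changed, and then jumps directly to |
| the next handler. The wrapper is written in assembly and added to the |
| config file definition. |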
| </p><p> |
| To see how this works, create a file called |
| <code>dalvik/vm/mterp/myarch/stub.S</code> that contains one line: |
| <pre> |
| /* stub for ${opcode} */ |
| </pre> |
| Then, in <code>dalvik/vm/mterp/config-myarch</code>, add this below the |
| <code>handler-size</code> directive: |
| <pre> |
| # source for the instruction table stub |
| asm-stub myarch/stub.S |
| </pre> |
| </p><p> |
| Regenerate the sources with <code>./rebuild.sh</code>, and take a look |
| inside <code>dalvik/vm/mterp/out/InterpAsm-myarch.S</code>. You should |
| see 256 copies of the stub function in a single large block after the |
| <code>dvmAsmInstructionStart</code> label. The <code>stub.S</code> |
| code will be used anywhere you don't provide an assembly implementation. |
| </p><p> |
| Note that each block begins with a <code>.balign 64</code> directive. |
| This is what pads each handler out to 64 bytes. Note also that the |
| <code>${opcode}</code> text changed into an opcode name, which should |
| be used to call the C implementation (<code>dvmMterp_${opcode}</code>). |
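</p><p>
The effect of <code>.balign 64</code> on where each handler starts can be worked out in C. The helpers <code>alignUp</code> and <code>layoutHandlers</code> are hypothetical, not part of the mterp tools, and the sketch assumes the end of the block is also aligned to 64 bytes. With code sizes of 40, 64, 70, and 10 bytes, the handlers start at offsets 0, 64, 128, and 256: the 70-byte handler overflows its slot, so every later handler moves out by an extra 64 bytes and the block ends at 320 instead of 256.

```c
#include <stddef.h>

/* Round offset up to a multiple of align (a power of two), which is what
 * .balign does to the assembler's location counter. */
size_t alignUp(size_t offset, size_t align) {
    return (offset + align - 1) & ~(align - 1);
}

/* Lay out n handlers with the given code sizes, each preceded by .balign
 * `align`. Writes each handler's start offset to startOffsets[] and returns
 * the end of the block, assuming the end is also aligned. */
size_t layoutHandlers(const size_t* codeSizes, size_t n, size_t align,
                      size_t* startOffsets) {
    size_t offset = 0;
    for (size_t i = 0; i < n; i++) {
        offset = alignUp(offset, align);
        startOffsets[i] = offset;
        offset += codeSizes[i];
    }
    return alignUp(offset, align);
}
```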
| </p><p> |
| The actual contents of <code>stub.S</code> are up to you to define. |
| See <code>entry.S</code> and <code>stub.S</code> in the <code>armv5te</code> |
| or <code>x86</code> directories for working examples. |
| </p><p> |
| If you're working on a variation of an existing architecture, you may be |
| able to use most of the existing code and just provide replacements for |
| a few instructions. Look at the <code>vm/mterp/config-*</code> files |
| for examples. |
| </p> |
| |
| |
| <h3>Replacing Stubs</h3> |
| |
| <p> |
| There are roughly 250 Dalvik opcodes, including some that are inserted by |
| <a href="dexopt.html">dexopt</a> and aren't described in the |
| <a href="dalvik-bytecode.html">Dalvik bytecode</a> documentation. Each |
| one must perform the appropriate actions, fetch the next opcode, and |
| branch to the next handler. The actions performed by the assembly version |
| must exactly match those performed by the C version (in |
| <code>dalvik/vm/mterp/c/OP_*</code>). |
| </p><p> |
| It is possible to customize the set of "optimized" instructions for your |
| platform; this works because optimized DEX files are not expected |
| to work on multiple devices. Adding, removing, or redefining instructions |
| is beyond the scope of this document, and for simplicity it's best to stick |
| with the basic set defined by the portable interpreter. |
| </p><p> |
| Once you have written a handler that looks like it should work, add |
| it to the config file. For example, suppose we have a working version |
| of <code>OP_NOP</code>. For demonstration purposes, fake it for now by |
| putting this into <code>dalvik/vm/mterp/myarch/OP_NOP.S</code>: |
| <pre> |
| /* This is my NOP handler */ |
| </pre> |
| </p><p> |
| Then, in the <code>op-start</code> section of <code>config-myarch</code>, add: |
| <pre> |
| op OP_NOP myarch |
| </pre> |
| </p><p> |
| This tells the generation script to use the assembly version from the |
| <code>myarch</code> directory instead of the C version from the <code>c</code> |
| directory. |
| </p><p> |
| Execute <code>./rebuild.sh</code>. Look at <code>InterpAsm-myarch.S</code> |
| and <code>InterpC-myarch.c</code> in the <code>out</code> directory. You |
| will see that the <code>OP_NOP</code> stub wrapper has been replaced with our |
| new code in the assembly file, and the C stub implementation is no longer |
| included. |
| </p><p> |
| As you implement instructions, the C version and corresponding stub wrapper |
| will disappear from the output files. Eventually you will have a 100% |
| assembly interpreter. You may find it saves a little time to examine |
| the output of your compiler for some of the operations. The |
| <a href="porting-proto.c.txt">porting-proto.c</a> sample code can be |
| helpful here. |
| </p> |
| |
| |
| <h3>Interpreter Switching</h3> |
| |
| <p> |
| The Dalvik VM actually includes a third interpreter implementation: the debug |
| interpreter. This is a variation of the portable interpreter that includes |
| support for debugging and profiling. |
| </p><p> |
| When a debugger attaches, or a profiling feature is enabled, the VM |
| will switch interpreters at a convenient point. This is done at the |
| same time as the GC safe point check: on a backward branch, a method |
| return, or an exception throw. Similarly, when the debugger detaches |
| or profiling is discontinued, execution transfers back to the "fast" or |
| "portable" interpreter. |
| </p><p> |
| Your entry function needs to test the "entryPoint" value in the "glue" |
| pointer to determine where execution should begin. Your exit function |
| will need to return a boolean that indicates whether the interpreter is |
| exiting (because we reached the "bottom" of a thread stack) or wants to |
| switch to the other implementation. |
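</p><p>
The control flow can be modeled in C as a driver loop that keeps alternating between two interpreters until one of them reports a real exit. Everything here is hypothetical: <code>SwitchGlue</code>, the <code>ENTRY_*</code> values, and the step counters that simulate a debugger attaching and detaching stand in for the VM's real state, and only resuming at an instruction is modeled.

```c
#include <stdbool.h>

/* Hypothetical entry points, modeled on the glue's "entryPoint" value.
 * Only ENTRY_INSTR (resume at an instruction) is exercised below. */
typedef enum { ENTRY_INSTR, ENTRY_RETURN, ENTRY_THROW } EntryPoint;

/* Hypothetical glue: a step counter stands in for executing instructions,
 * and attachAt/detachAt simulate a debugger coming and going. */
typedef struct {
    EntryPoint entryPoint;
    int step;          /* next "instruction" to execute */
    int lastStep;      /* reaching this means the thread stack is done */
    int attachAt;      /* step at which a debugger attaches */
    int detachAt;      /* step at which it detaches */
    int switches;      /* how many times the driver changed interpreters */
} SwitchGlue;

static bool debuggerAttached(const SwitchGlue* g) {
    return g->step >= g->attachAt && g->step < g->detachAt;
}

/* Each interpreter returns true to request a switch, false when it is really
 * exiting. A real entry function would first branch on g->entryPoint. */
static bool runFast(SwitchGlue* g) {
    while (g->step < g->lastStep) {
        if (debuggerAttached(g)) {          /* checked at a safe point */
            g->entryPoint = ENTRY_INSTR;    /* resume here in the other one */
            return true;
        }
        g->step++;
    }
    return false;
}

static bool runDebug(SwitchGlue* g) {
    while (g->step < g->lastStep) {
        if (!debuggerAttached(g)) {
            g->entryPoint = ENTRY_INSTR;
            return true;
        }
        g->step++;
    }
    return false;
}

/* Driver loop: run whichever interpreter is current until one reports exit. */
void interpretSketch(SwitchGlue* g) {
    bool useDebug = false;
    g->entryPoint = ENTRY_INSTR;
    for (;;) {
        bool change = useDebug ? runDebug(g) : runFast(g);
        if (!change)
            break;
        useDebug = !useDebug;
        g->switches++;
    }
}
```

With a debugger present for steps 3 through 5 of a 10-step run, execution starts in the fast interpreter, moves to the debug interpreter at step 3, moves back at step 6, and finishes after two switches.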
| </p><p> |
| See the <code>entry.S</code> file in <code>x86</code> or <code>armv5te</code> |
| for examples. |
| </p> |
| |
| |
| <h3>Testing</h3> |
| |
| <p> |
| A number of VM tests can be found in <code>dalvik/tests</code>. The most |
| useful during interpreter development is <code>003-omnibus-opcodes</code>, |
| which tests many different instructions. |
| </p><p> |
| The basic invocation is: |
| <pre> |
| $ cd dalvik/tests |
| $ ./run-test 003 |
| </pre> |
| </p><p> |
| This will run test 003 on an attached device or emulator. You can run |
| the test against your desktop VM by specifying <code>--reference</code> |
| if you suspect the test may be faulty. You can also use |
| <code>--portable</code> and <code>--fast</code> to explicitly specify |
| one Dalvik interpreter or the other. |
| </p><p> |
| Some instructions are replaced by <code>dexopt</code>, notably when |
| "quickening" field accesses and method invocations. To ensure |
| that you are testing the basic form of the instruction, add the |
| <code>--no-optimize</code> option. |
| </p><p> |
| There is no built-in instruction tracing mechanism. If you want |
| to know for sure that your implementation of an opcode handler |
| is being used, the easiest approach is to insert a "printf" |
| call. For an example, look at <code>common_squeak</code> in |
| <code>dalvik/vm/mterp/armv5te/footer.S</code>. |
| </p><p> |
| At some point you need to ensure that debuggers and profiling work with |
| your interpreter. The easiest way to do this is to simply connect a |
| debugger or toggle profiling. (A future test suite may include some |
| tests for this.) |
| </p> |
| |
| |
| <h2>Other Performance Issues</h2> |
| |
| <p> |
| The <code>System.arraycopy()</code> function is heavily used. The |
| implementation relies on the bionic C library to provide a fast, |
| platform-optimized data copy function for arrays with elements wider |
| than one byte. If you're not using bionic, or your platform does not |
| have an implementation of this method, Dalvik will use correct but |
| sub-optimal algorithms instead. For best performance you will want |
| to provide your own version. |
| </p><p> |
| See the comments in <code>dalvik/vm/native/java_lang_System.c</code> |
| for details. |
| </p> |
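<p>
For elements wider than one byte, a portable fallback has to copy whole elements and pick the copy direction so that overlapping ranges (which occur when the source and destination are the same array) come out right. The function below is a minimal, hypothetical sketch for 32-bit elements; it is not the bionic routine or Dalvik's code, and it makes no attempt at the platform-specific tuning recommended above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fallback for System.arraycopy() on 32-bit elements. Copies
 * whole elements and chooses the direction so that overlapping ranges in
 * the same array are handled correctly. */
void move32Sketch(uint32_t* dst, const uint32_t* src, size_t count) {
    if ((uintptr_t) dst <= (uintptr_t) src) {
        /* destination below (or equal to) source: copy front to back */
        for (size_t i = 0; i < count; i++)
            dst[i] = src[i];
    } else {
        /* destination above source: copy back to front so no source
         * element is overwritten before it has been read */
        for (size_t i = count; i > 0; i--)
            dst[i - 1] = src[i - 1];
    }
}
```
</p>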
| |
| <p> |
| <address>Copyright © 2009 The Android Open Source Project</address> |
| |
| </body> |
| </html> |