| <?xml version="1.0" encoding="UTF-8" ?> |
| <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> |
| <html xmlns="http://www.w3.org/1999/xhtml" lang="en"> |
| <head> |
| <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> |
| <link rel="stylesheet" href=".resources/doc.css" charset="UTF-8" type="text/css" /> |
| <link rel="stylesheet" href="../coverage/.resources/prettify.css" charset="UTF-8" type="text/css" /> |
| <link rel="shortcut icon" href=".resources/report.gif" type="image/gif" /> |
| <script type="text/javascript" src="../coverage/.resources/prettify.js"></script> |
| <title>JaCoCo - Control Flow Analysis</title> |
| </head> |
| <body onload="prettyPrint()"> |
| |
| <div class="breadcrumb"> |
| <a href="../index.html" class="el_report">JaCoCo</a> > |
| <a href="index.html" class="el_group">Documentation</a> > |
| <span class="el_source">Control Flow Analysis</span> |
| </div> |
| <div id="content"> |
| |
| <h1>Control Flow Analysis for Java Methods</h1> |
| |
| <p class="hint"> |
| Implementing a coverage tool that supports statement (C0) as well as branch |
| coverage coverage (C1) requires detailed analysis of the internal control flow |
| of Java methods. Due to the architecture of JaCoCo this analysis happens on |
| the bytecode of compiled class files. This document describes JaCoCo's |
| strategies for inserting probes into the control flow at runtime and analyzing |
| the actual code coverage. Marc R. Hoffmann, November 2011 |
| </p> |
| |
| <h2>Control Flow Graphs for Java Bytecode</h2> |
| |
| <p> |
| As an starting point we take the following example method that contains a |
| single branching point: |
| </p> |
| |
| <pre class="source lang-java linenums"> |
| public static void example() { |
| a(); |
| if (cond()) { |
| b(); |
| } else { |
| c(); |
| } |
| d(); |
| } |
| </pre> |
| |
| <p> |
| A Java compiler will create the following bytecode from this example method. |
| Java bytecode is a linear sequence of instructions. Control flow is |
| implemented with <i>jump</i> instructions like the conditional |
| <code>IFEQ</code> or the unconditional <code>GOTO</code> opcode. The jump |
| targets are technically relative offsets to the target instruction. For better |
| readability we use symbolic labels (<code>L1</code>, <code>L2</code>) instead |
| (also the ASM API uses such symbolic labels): |
| </p> |
| |
| <pre class="source linenums"> |
| public static example()V |
| INVOKESTATIC a()V |
| INVOKESTATIC cond()Z |
| IFEQ L1 |
| INVOKESTATIC b()V |
| GOTO L2 |
| L1: INVOKESTATIC c()V |
| L2: INVOKESTATIC d()V |
| RETURN |
| </pre> |
| |
| <p> |
| The possible control flow in the bytecode above can be represented by a graph. |
| The nodes are byte code instruction, the edged of the graph represent the |
| possible control flow between the instructions. The control flow of the |
| example is shown in the left box of this diagram: |
| </p> |
| |
| <img src=".resources/flow-example.png" alt="Bytecode Control Flow"/> |
| |
| |
| <h3>Flow Edges</h3> |
| |
| <p> |
| The control flow graph of a Java method defined by Java byte code may have |
| the following Edges. Each edge connects a source instruction with a target |
| instruction. In some cases the source instruction or the target instruction |
| does not exist (virtual edges for method entry and exit) or cannot be |
| exactly specified (exception handlers). |
| </p> |
| |
| <table class="coverage"> |
| <thead> |
| <tr> |
| <td>Type</td> |
| <td>Source</td> |
| <td>Target</td> |
| <td>Remarks</td> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td>ENTRY</td> |
| <td>-</td> |
| <td>First instruction in method</td> |
| <td></td> |
| </tr> |
| <tr> |
| <td>SEQUENCE</td> |
| <td>Instruction, except <code>GOTO</code>, <code>xRETURN</code>, |
| <code>THROW</code>, <code>TABLESWITCH</code> and <code>LOOKUPSWITCH</code></td> |
| <td>Subsequent instruction</td> |
| <td></td> |
| </tr> |
| <tr> |
| <td>JUMP</td> |
| <td><code>GOTO</code>, <code>IFx</code>, <code>TABLESWITCH</code> or |
| <code>LOOKUPSWITCH</code> instruction</td> |
| <td>Target instruction</td> |
| <td><code>TABLESWITCH</code> and <code>LOOKUPSWITCH</code> will define |
| multiple edges.</td> |
| </tr> |
| <tr> |
| <td>EXHANDLER</td> |
| <td>Any instruction in handler scope</td> |
| <td>Target instruction</td> |
| <td></td> |
| </tr> |
| <tr> |
| <td>EXIT</td> |
| <td><code>xRETURN</code> or <code>THROW</code> instruction</td> |
| <td>-</td> |
| <td></td> |
| </tr> |
| <tr> |
| <td>EXEXIT</td> |
| <td>Any instruction</td> |
| <td>-</td> |
| <td>Unhandled exception.</td> |
| </tr> |
| </tbody> |
| </table> |
| |
| <p> |
| The current JaCoCo implementation ignores edges caused by implicit exceptions |
| and the the method entry. This means we consider SEQUENCE, JUMP, EXIT. |
| </p> |
| |
| |
| <h2>Probe Insertion Strategy</h2> |
| |
| <p> |
| Probes are additional instructions that can be inserted between existing |
| instructions. They do not change the behavior of the method but record the |
| fact that they have been executed. One can think probes are placed on edges of |
| the control flow graph. Theoretically we could insert a probe at every edge of |
| the control flow graph. As a probe implementation itself requires multiple |
| bytecode instructions this would increase the size of the class files several |
| times and significantly slow down execution speed of the instrumented classes. |
| Fortunately this is not required, in fact we only need a few probes per method |
| depending on the control flow of the method. For example a method without any |
| branches requires a single probe only. The reason for this is that starting |
| from a certain probe we can back-trace the execution path and typically get |
| coverage information for multiple instructions. |
| </p> |
| |
| <p> |
| If a probe has been executed we know that the corresponding edge has been |
| visited. From this edge we can conclude to other preceding nodes and edges: |
| </p> |
| |
| <ul> |
| <li>If a edge has been visited, we know that the source node of the this edge |
| has been executed.</li> |
| <li>If a node has been executed and the node is the target of only one edge |
| we know that this edge has been visited.</li> |
| </ul> |
| |
| <p> |
| Recursively applying these rules allows to determine the execution status of |
| all instructions of a method – given that we have probes at the right |
| positions. Therefore JaCoCo inserts probes |
| </p> |
| |
| <ul> |
| <li>at every method exit (return or throws) and</li> |
| <li>at every edge where the target instruction is the target of more than one |
| edge.</li> |
| </ul> |
| |
| <p> |
| We recall that a probe is simply a small sequence of additional instructions |
| that needs to be inserted at a control flow edge. The following table |
| illustrates how this extra instructions are added in case of different edge |
| types. |
| </p> |
| |
| <table class="coverage"> |
| <thead> |
| <tr> |
| <td>Type</td> |
| <td>Before</td> |
| <td>After</td> |
| <td>Remarks</td> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td>SEQUENCE</td> |
| <td><img src=".resources/flow-sequence.png" alt="Sequence"/></td> |
| <td><img src=".resources/flow-sequence-probe.png" alt="Sequence with Probe"/></td> |
| <td> |
| In case of a simple sequence the probe is simply inserted between the |
| two instructions. |
| </td> |
| </tr> |
| <tr> |
| <td>JUMP (unconditional)</td> |
| <td><img src=".resources/flow-goto.png" alt="Unconditional Jump"/></td> |
| <td><img src=".resources/flow-goto-probe.png" alt="Unconditional Jump with Probe"/></td> |
| <td> |
| As an unconditional jump is executed in any case, we can also insert the |
| probe just before the GOTO instruction. |
| </td> |
| </tr> |
| <tr> |
| <td>JUMP (conditional)</td> |
| <td><img src=".resources/flow-cond.png" alt="Conditional Jump"/></td> |
| <td><img src=".resources/flow-cond-probe.png" alt="Conditional Jump with Probe"/></td> |
| <td> |
| Adding a probe to an conditional jump is little bit more tricky. We |
| invert the semantic of the opcode and add the probe right after the |
| conditional jump. With a subsequent <code>GOTO</code> instruction we |
| jump to the original target. Note that this approach will not introduce |
| a backward jump, which would cause trouble with the Java verifier if we |
| have an uninitialized object on the stack. |
| </td> |
| </tr> |
| <tr> |
| <td>EXIT</td> |
| <td><img src=".resources/flow-exit.png" alt="Exit"/></td> |
| <td><img src=".resources/flow-exit-probe.png" alt="Exit with Probe"/></td> |
| <td> |
| As is is the nature of RETURN and THROW statements to actually leave the |
| method we add the probe right before these statements. |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| |
| <p> |
| Now let's see how this rules apply to the example snippet above. We see that |
| <code>INVOKE d()</code> instruction is the only node with more than one |
| incoming edge. So we need to place probes on those edges and another probe on |
| the only exit node. The result is shown the the right box of the diagram |
| above. |
| </p> |
| |
| <h2>Additional Probes Between Lines</h2> |
| |
| <p> |
| The probe insertion strategy described so far does not consider implicit |
| exceptions thrown for example from invoked methods. If the control flow |
| between two probes is interrupted by a exception not explicitly created with |
| a <code>throw</code> statement all instruction in between are considered as |
| not covered. This leads to unexpected results especially when the the block of |
| instructions spans multiple lines of source code. |
| </p> |
| |
| <p> |
| Therefore JaCoCo adds an additional probe between the instructions of two |
| lines whenever the subsequent line contains at least one method invocation. |
| This limits the effect of implicit exceptions from method invocations to |
| single lines of source. The approach only works for class files compiled with |
| debug information (line numbers) and does not consider implicit exceptions |
| from other instructions than method invocations (e.g. |
| <code>NullPointerException</code> or <code>ArrayIndexOutOfBoundsException</code>). |
| </p> |
| |
| <h2>Probe Implementation</h2> |
| |
| <p> |
| Code coverage analysis is a runtime metric that provides execution details |
| of the software under test. This requires detailed recording about the |
| instructions (instruction coverage) that have been executed. For branch |
| coverage also the outcome of decisions has to be recorded. In any case |
| execution data is collected by so called probes: |
| </p> |
| |
| <p class="hint"> |
| A <b>probe</b> is a sequence of bytecode instructions that can be inserted |
| into a Java method. When the probe is executed, this fact is recorded and can |
| be reported by the coverage runtime. The probe must not change the behavior |
| of the original code. |
| </p> |
| |
| <p> |
| The only purpose of the probe is to record that it has been executed at least |
| once. The probe does not record the number of times it has been called or |
| collect any timing information. The latter is out of scope for code coverage |
| analysis and more in the objective of a performance analysis tool. Typically |
| multiple probes needs to be inserted into each method, therefore probes needs |
| to be identified. Also the probe implementation and the storage mechanism it |
| depends on needs to be thread safe as multi-threaded execution is a common |
| scenario for java applications (albeit not for plain unit tests). Probes must |
| not have any side effects on the original code of the method. Also they should |
| add minimal overhead. |
| </p> |
| |
| <p> |
| So to summarize the requirements for execution probes: |
| </p> |
| |
| <ul> |
| <li>Record execution</li> |
| <li>Identification for different probes</li> |
| <li>Thread safe</li> |
| <li>No side effects on application code</li> |
| <li>Minimal runtime overhead</li> |
| </ul> |
| |
| <p> |
| JaCoCo implements probes with a <code>boolean[]</code> array instance per |
| class. Each probe corresponds to a entry in this array. Whenever the probe is |
| executed the entry is set to <code>true</code> with the following four |
| bytecode instructions: |
| </p> |
| |
| <pre class="source"> |
| ALOAD probearray |
| xPUSH probeid |
| ICONST_1 |
| BASTORE |
| </pre> |
| |
| <p> |
| Note that this probe code is thread safe, does not modify the operand stack |
| or modify local variables and is also thread safe. It does also not leave the |
| method though an external call. The only prerequisite is that the probe array |
| is available as a local variable. For this at the beginning of each method |
| additional instrumentation code needs to be added to obtain the array instance |
| associated with the belonging class. To avoid code duplication the |
| initialization is delegated to a static private method |
| <code>$jacocoinit()</code> which is added to every non-interface class. |
| </p> |
| |
| <p> |
| The size of the probe code above depends on the position of the probe array |
| variable and the value of the probe identifier as different opcodes can be |
| used. As calculated in the table below the overhead per probe ranges between 4 |
| and 7 bytes of additional bytecode: |
| </p> |
| |
| <table class="coverage"> |
| <thead> |
| <tr> |
| <td>Possible Opcodes</td> |
| <td>Min. Size [bytes]</td> |
| <td>Max. Size [bytes]</td> |
| </tr> |
| </thead> |
| <tfoot> |
| <tr> |
| <td>Total:</td> |
| <td>4</td> |
| <td>7</td> |
| </tr> |
| </tfoot> |
| <tbody> |
| <tr> |
| <td><code>ALOAD_x</code>, <code>ALOAD</code> <sup>1</sup></td> |
| <td>1</td> |
| <td>2</td> |
| </tr> |
| <tr> |
| <td><code>ICONST_x</code>, <code>BIPUSH</code>, <code>SIPUSH</code>, <code>LDC</code>, <code>LDC_W</code> <sup>2</sup></td> |
| <td>1</td> |
| <td>3</td> |
| </tr> |
| <tr> |
| <td><code>ICONST_1</code></td> |
| <td>1</td> |
| <td>1</td> |
| </tr> |
| <tr> |
| <td><code>BASTORE</code></td> |
| <td>1</td> |
| <td>1</td> |
| </tr> |
| </tbody> |
| </table> |
| |
| <p> |
| <sup>1</sup> The probe array is the first variable after the arguments. |
| If the method arguments do not consume more that 3 slots the 1-byte opcode can |
| be used.<br/> |
| <sup>2</sup> 1-byte opcodes for ids 0 to 5, 2-byte opcode for ids up to 127, |
| 3-byte opcode for ids up to 32767. Ids values of 32768 or more require an |
| additional constant pool entry. For normal class files it is very unlikely to |
| require more than 32,000 probes. |
| </p> |
| |
| <h2>Performance</h2> |
| |
| <p> |
| The control flow analysis and probe insertion strategy described in this |
| document allows to efficiently record instruction and branch coverage. In |
| total classes instrumented with JaCoCo increase their size by about 30%. Due |
| to the fact that probe execution does not require any method calls, only local |
| instructions, the observed execution time overhead for instrumented |
| applications typically is less than 10%. |
| </p> |
| |
| <h2>References</h2> |
| |
| <ul> |
| <li><a href="http://asm.objectweb.org/">ASM byte code library</a> by Eric Bruneton at al.</li> |
| <li><a href="http://andrei.gmxhome.de/bytecode/index.html">Bytecode Outline Plug-In</a> by Andrei Loskutov</li> |
| <li><a href="http://en.wikipedia.org/wiki/Glossary_of_graph_theory">Wikipedia: Glossary of Graph Theory</a></li> |
| </ul> |
| |
| </div> |
| <div class="footer"> |
| <div class="right"><a href="@jacoco.home.url@">JaCoCo</a> @qualified.bundle.version@</div> |
| <a href="license.html">Copyright</a> © @copyright.years@ Mountainminds GmbH & Co. KG and Contributors |
| </div> |
| |
| </body> |
| </html> |