annotation-file-utilities/design.tex - platform/external/annotation-tools - Git at Google

 \documentclass{article}
 \title{AFU v4 Design}
 \author{Dan Brown}

 \usepackage{graphics}
 \usepackage{verbatim}
 \newenvironment{code}{\footnotesize\verbatim}{\endverbatim\normalsize}
 \newenvironment{jcode}{\footnotesize\verbatim}{\endverbatim\normalsize}

 \begin{document}
 \maketitle

 \section{Design Considerations}

 The Annotation File Utilities (AFU) include programs for extracting
 annotations from Java class files and inserting annotations into Java
 source or class files, in accordance with the Java 8 language and virtual
 machine specification.  This
 document uses AFU in the singular, in reference to the source
 insertion program except where otherwise noted.

 The AFU suffers from multiple design flaws that impede performance and
 maintainability, including:

 \begin{description}
 \item[Unstructured text modification.]  The AFU inserts
 annotations into source text instead of modifying syntax trees.
 The use of source positions rather than tree paths for indicating
 insertion placement has led to code that is fragile and hence difficult
 to maintain.
 \item[Non-uniform logic.]  There are two distinct ways to
 identify where to insert annotations into the source, each having its
 own distinct data structures and logic.  Consequently, it is possible to
 have inconsistent results for different representations of the same
 location.
 \item[Inefficient traversal.]  The search strategy is
 generate-and-test: it chooses an AST node in a preorder traversal and
 checks every insertion specification relevant to the class (or package)
 to determine whether it applies to the current node.  Furthermore, since
 the implementations of some tests contain recursive calls going up the
 path to the root, these tests can be repeated many times over for some
 nodes.
 \end{description}

 The proposed design is aimed at mitigating these problems.  It makes
 the traversal a single pass over the annotations, with limited
 re-visitation of AST nodes.  It reduces the chance for inconsistency
 between formats by finding the AST node that corresponds to a code
 location given in the scene before determining the insertion position.
 It modifies an AST data structure rather than the source itself,
 preserving formatting and commentary by incorporating non-leaf text
 (i.e.\ whitespace and comments) into the representation of the AST.

 \section{Current AFU}

 The current AFU source code (3.6.16 as of 11 June 2015) is organized
 into four packages (not counting the \texttt{asmx} and
 \texttt{scene\_lib} sub-projects):
 \begin{itemize}
 \item  \texttt{annotator}, containing the main class and a class
 representing source text;
 \item  \texttt{annotator.find},\ containing the classes that implement
 the core insertion logic;
 \item  \texttt{annotator.scanner},\ containing support classes for
 \texttt{annotator.find};\ and
 \item  \texttt{annotator.specification},\ containing classes that
 manage intermediate representations of the insertions specified in a
 Java Annotation Index File (JAIF).
 \end{itemize}
 Should the proposed redesign be implemented, it is anticipated that the
 first will be simplified somewhat and modified to reflect changes in the
 other packages, the second will be radically simplified, the third will
 remain the same, and the fourth will be eliminated.

 The details of \texttt{annotator.scanner}\ are not discussed here, since
 the proposed redesign will neither change them nor be influenced by
 them.  Of the classes in \texttt{annotator.specification},\ only
 \texttt{IndexFileSpecification},\ which extracts \texttt{Insertion}s
 from \texttt{scene-lib}'s JAIF representation
 (\texttt{annotations.el.AScene}), is relevant to the redesign.

 By far the largest and most complex package is \texttt{annotator.find}.
 The insertion logic is distributed among \texttt{TreeFinder} and
 numerous implementations of the \texttt{Criterion} and
 \texttt{Insertion} interfaces.  (A \texttt{Criterion} is a reified
 condition that provides a method
 \texttt{boolean isSatisfiedBy(TreePath path)},
 and an \texttt{Insertion} includes matching criteria, output parameters,
 and methods for generating the source text to be inserted.)

 A simplified outline of the basic flow of the program is as follows:

 \begin{enumerate}
 \item  Interpret each JAIF as a ``scene'' (\texttt{AScene}), then invoke
 the \texttt{parse} method in \texttt{IndexFileSpecification} to extract
 the indicated \texttt{Insertion}s from the scene.
 \item  Parse each Java source file to obtain a (type-checked) AST.
 \item  Create an empty map from (Integer) source positions to
 insertions.
 \item  For each node in the AST in pre-order:
 \begin{itemize}
 \item  For each extracted \texttt{Insertion} that is relevant to the
 immediately enclosing class (or package for compilation unit nodes):
 \begin{itemize}
 \item  If the path from the root to the current node satisfies every
 \texttt{Criterion} for the \texttt{Insertion}:
 \begin{enumerate}
 \item  Find the appropriate source position for the annotation.
 \item  Add a mapping from the position to the \texttt{Insertion} to the
 map.
 \item  Remove the insertion from the extracted set.
 \end{enumerate}
 \end{itemize}
 \end{itemize}
 \item  For each position-to-insertion mapping in descending order by
 position:
 \begin{itemize}
 \item  Generate insertion text and insert at indicated position.
 \end{itemize}
 \item  Emit modified source code.
 \end{enumerate}

 \section{Proposed Revisions}

 \begin{figure}[ht]
 \begin{center}
 \resizebox{\columnwidth}{!}{\includegraphics{dataflow.png}}
 \end{center}
 \caption{
 Data flow between components.  Boxes represent sources and sinks;
 ellipses represent Java objects.  Bold lines mark new classes and data
 paths, and dashed lines indicate current data flows to be bypassed.
 }
 \label{fig:dataflow}
 \end{figure}

 Figure~\ref{fig:dataflow} gives an overview of proposed changes to the
 flow of data between classes.  First, the current re-interpretation of
 the scene as a ``specification'' and subsequent extraction of
 \texttt{Criteria} will be eliminated, as the necessary information can
 be determined directly from the \texttt{Scene}.  Second, the source text
 will be incorporated into the augmented AST in a more convenient form,
 so there will no need to refer back to the source itself.

 The division of responsibilities among classes will remain largely the
 same, with the major differences corresponding to the changes in the
 data flow.  First, the annotation locations will be read from the
 \texttt{Scene} and mapped to the corresponding path in the (augmented)
 AST (if any).  Second, since \texttt{AAST} will incorporate the salient
 capabilities of \texttt{JCTree} and \texttt{Source}, it will cover the
 responsibilities of both classes.

 Many of the current supporting classes will be retained as they are.  In
 particular, the new AFU will continue to rely on javac's AST classes and
 parser, and \texttt{scene-lib} probably will need little or no
 modification.  The \texttt{annotator.find} package will be radically
 simplified with the elimination of the \texttt{Criterion} and
 (possibly) \texttt{Insertion} hierarchies, as the insertion location
 within the AST can be determined directly from the \texttt{Scene}.

 The redesign can be broken down into two independent parts:
 \begin{description}
 \item{Path-based positioning.} Eliminate the \texttt{Criterion}
 hierarchy; determine the insertion location and the text to be inserted
 during (a single) scene traversal; let the scene traversal rather than
 an AST traversal be the outer loop.  Advantages: since insertion is
 based on the node rather than a numeric position, there is only one
 positioning logic (even if there are still two matching objects); an
 entire pass and a large amount of intermediate data are eliminated, thus
 improving time and space efficiency.
 \item{Tree transformation.}  Eliminate the \texttt{Insertion} hierarchy;
 instead of calculating numeric positions, apply a transformation to the
 appropriate subtree.  Advantage: the logic of tree transformation is
 both closer to the specification and easier to comprehend (and maintain)
 than the logic of finding numeric positions for inserting text by
 reverse order.
 \end{description}

 While these have no obvious technical disadvantages with respect to the
 existing AFU, the payoff must be weighed against the costs.  The current
 codebase is mostly stable at this point, so any increase in reliability
 will be small (though for verification software, a jump from, say,
 99.99\% to 99.9999\% correct makes an enormous difference).
 Fortunately, the AFU's extensive test suite will make it clear how
 quickly a revised implementation achieves the same level of reliability.

 \subsection{Path-based Positioning}

 The current AFU creates transforms the scene into a set of
 \texttt{Insertions}, each with its own \texttt{Criteria} (a collection
 of objects of various \texttt{Criterion} subclasses).  In doing so, it
 throws away information about hierarchical relationships that could be
 used to guide the search for matching locations in the code.

 \begin{figure}[ht]
 \begin{center}
 \resizebox{156pt}{!}{\includegraphics{corresp.png}}
 \end{center}
 \caption{
 Correspondence between scene and AST.
 }
 \label{fig:corresp}
 \end{figure}

 It makes more sense to traverse an AST in parallel (conceptually, not
 necessarily by different processors) with a scene that refers to it.
 For example, if a class specification (\texttt{AClass}) in a scene
 corresponds to a class node in an AST, each time one of the method
 specifications (\texttt{AMethod}) is visited, the corresponding method
 node (see Figure~\ref{fig:corresp}) should be visited as well to see
 whether any insertions under the \texttt{AMethod} apply to it.  This
 strategy not only obviates an additional pass but avoids much useless
 test repetition, as there is no need to test \emph{every} potential
 insertion against \emph{every} AST node encountered before the correct
 insertion site.  With this change, it becomes possible to generate
 \texttt{Insertion}s on the fly---or perhaps even to go ahead and insert
 text, though managing order dependencies would take some additional
 work.

 The main loop of the program thus will become a pre-order traversal of
 \texttt{AScene} rather than of \texttt{JCTree}, although the scene
 traversal will explore corresponding parts of the AST.  In other words:
 the process will be guided by the annotations to be inserted rather than
 by the shape of the AST.  Hence, each specified annotation will be
 visited exactly once.  The information currently used to create
 \texttt{Criteria} for the insertions can instead be used to directly
 determine the path (if any) in the AST to which the current scene
 element corresponds.

 \subsection{Tree Transformation}

 The current AFU finds numeric source positions in trees and inserts
 annotations directly into the source code.  The position-finding logic
 is therefore applied at a time other than insertion, introducing
 insertion order dependencies and making it harder to find the source of
 a bug.

 A new class hierarchy, the ``Augmented AST'' (\texttt{AAST}), will
 directly manage non-leaf text--that is, inter-leaf text (whitespace and
 comments) and intra-branch text (implicit or contained in branch node
 data members).  It will be an implementation of
 \texttt{com.sun.source.Tree} that provides needed bits from
 \texttt{com.sun.tools.javac.tree.JCTree} and defines a parallel subclass
 hierarchy.  AAST and subclasses provide methods to retrieve the
 inter-leaf text segments that precede and follow an AST node and
 printing methods that preserve the formatting of the source code.

 Non-leaf text will be stored in three types of locations in an AAST:
 \begin{enumerate}
 \item  Each leaf node will store any non-code text that immediately
 follows.
 \item  For branch nodes, for every implicit text segment or data member
 that is represented in the source code but not in a descendant node, the
 node will store any whitespace and comments that follow, unless the
 content is at the end of the node's text.
 \item  The computation unit node will store any text that precedes the
 beginning of the Java code.
 \end{enumerate}
 In this way, all text segment references occur exactly once in the AST,
 and annotations can be inserted at the front with no need to take
 preceding text into account.

 An annotation insertion will consist of a transformation of a tree node.
 If no additional code is to be generated, the transformation is simple:
 \begin{itemize}
 \item  For type annotations, the parent AST node of the element to be
 annotated adds the optional annotation node if it is absent, then adds
 the annotation to the annotation node.
 \item  For declaration annotations, the annotation is added onto the
 front of a declaration's list of modifiers.
 \end{itemize}
 Cases that involve code generation are more complex.
 Figures~\ref{fig:nocast} and \ref{fig:typecast} give an example (pre-
 and post-insertion, respectively) of a transformation that requires a
 typecast to be generated and inserted.  Note that previously existing
 non-leaf text segments remain unaltered.

 \begin{figure}[ht]
 \begin{center}
 \resizebox{261pt}{!}{\includegraphics{nocast.png}}
 \end{center}
 \caption{
 Augmented subgraph for \texttt{int i = 0}.  Text of leaf nodes and text
 representing elements of branch nodes are connected with solid arrows,
 and inter-leaf and inter-element text segments are connected with dashed
 arrows.  The text of the subgraph can be read from the lower frontier of
 the graph.  (Underscores represent spaces here.)
 }
 \label{fig:nocast}
 \end{figure}

 \begin{figure}[ht]
 \begin{center}
 \resizebox{\columnwidth}{!}{\includegraphics{typecast.png}}
 \end{center}
 \caption{
 Augmented subgraph for \texttt{int i = (@A int) 0}, the result of adding
 an annotated cast \texttt{(@A int)} to the previous figure's code.  With
 respect to the graph in the previous figure, \texttt{TypeCast} has taken
 \texttt{Literal}'s place in the tree, and \texttt{Literal} has become a
 child of \texttt{TypeCast} along with the newly inserted material.
 }
 \label{fig:typecast}
 \end{figure}

 The other major change will be in the output generation process, which
 will depend on the \texttt{AAST} rather than directly on the source.
 The \texttt{AAST}'s pretty-printer will perform an in-order traversal
 and emit the text of each leaf interleaved with the non-leaf text.

 \subsection{Algorithm Summary}

 \begin{enumerate}
 \item  Interpret each JAIF as a ``scene'' (\texttt{AScene}).
 \item  Parse each Java source file to obtain a (type-checked) AST.
 \item  For each AST:
 \begin{enumerate}
 \item  Transform the AST into an AAST.
 \item  For each annotated element in the scene in pre-order:
 \begin{itemize}
 \item  Try to find the AAST node to which the element corresponds.
 \item  If the node exists (or can be generated):
 \begin{itemize}
 \item  Transform the subtree rooted at the node to incorporate the
 annotation.
 \end{itemize}
 \end{itemize}
 \item  Pretty-print AAST.
 \end{enumerate}
 \end{enumerate}

 \section{Planning}

 There will be some time required for integrating the two parts if they
 are implemented separately.  In particular, doing tree transformation
 separately from path-based positioning means reifying transformations
 so they can be done in a second pass.  Hence it makes more sense to
 do path-based positioning on the way to doing tree transformation.

 Estimated times are as follows:
 \begin{itemize}
 \item  Path-based positioning: 2.5 weeks.
 \item  Tree transformation: 3 to 3.5 weeks if done after path-based
 positioning, 4 weeks otherwise.
 \end{itemize}
 Thus, doing the full job should take about six weeks.

 \appendix

 \section{API extensions (non-exhaustive)}

 \begin{jcode}
 public abstract class AAST extends com.sun.source.Tree {
   /** Interleaves non-leaf text with text serialization of AAST. */
   public void prettyPrint(java.io.Writer w) throws java.io.IOException;

   /** Insert single annotation at given path. */
   public insertAnnotation(TreePath path, Annotation anno);
 }

 // main loop
 AAST insertAnnotations(annotations.el.AScene scene, AAST aast);
 \end{jcode}

 \newpage
 \section{Excerpts from original API}\label{sec:exc}

 \begin{jcode}
 public interface Tree {
     public enum Kind {  // MUCH larger in actual interface
         /**
          * Used for instances of {@link BinaryTree} representing
          * addition or string concatenation {@code +}.
          */
         PLUS(BinaryTree.class),
         /**
          * Used for instances of {@link LiteralTree} representing
          * an integral literal expression of type {@code int}.
          */
         INT_LITERAL(LiteralTree.class);
     }
     Kind getKind();
     /**
      * Accept method used to implement the visitor pattern.  The
      * visitor pattern is used to implement operations on trees.
      *
      * @param <R> result type of this operation.
      * @param <D> type of additional data.
      */
     <R,D> R accept(TreeVisitor<R,D> visitor, D data);
 }
 public interface ExpressionTree extends Tree {}
 public interface BinaryTree extends ExpressionTree {
     ExpressionTree getLeftOperand();
     ExpressionTree getRightOperand();
 }
 public interface LiteralTree extends ExpressionTree {
     Object getValue();
 }

 public abstract class JCTree implements Tree {
     public int getStartPosition() {
         return TreeInfo.getStartPos(this);
     }
     public static abstract class JCExpression
             extends JCTree implements ExpressionTree {}
     public static class JCBinary extends JCExpression implements BinaryTree {
         private Tag opcode;
         public JCExpression lhs;
         public JCExpression rhs;
         public Symbol operator;
         protected JCBinary(Tag opcode,
                          JCExpression lhs,
                          JCExpression rhs,
                          Symbol operator) {
             this.opcode = opcode;
             this.lhs = lhs;
             this.rhs = rhs;
             this.operator = operator;
         }
         public Kind getKind() { return TreeInfo.tagToKind(getTag()); }
         public JCExpression getLeftOperand() { return lhs; }
         public JCExpression getRightOperand() { return rhs; }
         public Symbol getOperator() { return operator; }
         @Override
         public <R,D> R accept(TreeVisitor<R,D> v, D d) {
             return v.visitBinary(this, d);
         }
     }
     public static class JCLiteral extends JCExpression implements LiteralTree {
         public TypeTag typetag;
         public Object value;
         protected JCLiteral(TypeTag typetag, Object value) {
             this.typetag = typetag;
             this.value = value;
         }
         public Kind getKind() {
             return typetag.getKindLiteral();
         }
         public Object getValue() {
             switch (typetag) {
                 case BOOLEAN:
                     int bi = (Integer) value;
                     return (bi != 0);
                 case CHAR:
                     int ci = (Integer) value;
                     char c = (char) ci;
                     if (c != ci)
                         throw new AssertionError("bad value for char literal");
                     return c;
                 default:
                     return value;
             }
         }
         @Override
         public <R,D> R accept(TreeVisitor<R,D> v, D d) {
             return v.visitLiteral(this, d);
         }
     }
 }
 \end{jcode}

 \end{document}
	\documentclass{article}
	\title{AFU v4 Design}
	\author{Dan Brown}

	\usepackage{graphics}
	\usepackage{verbatim}
	\newenvironment{code}{\footnotesize\verbatim}{\endverbatim\normalsize}
	\newenvironment{jcode}{\footnotesize\verbatim}{\endverbatim\normalsize}

	\begin{document}
	\maketitle

	\section{Design Considerations}

	The Annotation File Utilities (AFU) include programs for extracting
	annotations from Java class files and inserting annotations into Java
	source or class files, in accordance with the Java 8 language and virtual
	machine specification. This
	document uses AFU in the singular, in reference to the source
	insertion program except where otherwise noted.

	The AFU suffers from multiple design flaws that impede performance and
	maintainability, including:

	\begin{description}
	\item[Unstructured text modification.] The AFU inserts
	annotations into source text instead of modifying syntax trees.
	The use of source positions rather than tree paths for indicating
	insertion placement has led to code that is fragile and hence difficult
	to maintain.
	\item[Non-uniform logic.] There are two distinct ways to
	identify where to insert annotations into the source, each having its
	own distinct data structures and logic. Consequently, it is possible to
	have inconsistent results for different representations of the same
	location.
	\item[Inefficient traversal.] The search strategy is
	generate-and-test: it chooses an AST node in a preorder traversal and
	checks every insertion specification relevant to the class (or package)
	to determine whether it applies to the current node. Furthermore, since
	the implementations of some tests contain recursive calls going up the
	path to the root, these tests can be repeated many times over for some
	nodes.
	\end{description}

	The proposed design is aimed at mitigating these problems. It makes
	the traversal a single pass over the annotations, with limited
	re-visitation of AST nodes. It reduces the chance for inconsistency
	between formats by finding the AST node that corresponds to a code
	location given in the scene before determining the insertion position.
	It modifies an AST data structure rather than the source itself,
	preserving formatting and commentary by incorporating non-leaf text
	(i.e.\ whitespace and comments) into the representation of the AST.

	\section{Current AFU}

	The current AFU source code (3.6.16 as of 11 June 2015) is organized
	into four packages (not counting the \texttt{asmx} and
	\texttt{scene\_lib} sub-projects):
	\begin{itemize}
	\item \texttt{annotator}, containing the main class and a class
	representing source text;
	\item \texttt{annotator.find},\ containing the classes that implement
	the core insertion logic;
	\item \texttt{annotator.scanner},\ containing support classes for
	\texttt{annotator.find};\ and
	\item \texttt{annotator.specification},\ containing classes that
	manage intermediate representations of the insertions specified in a
	Java Annotation Index File (JAIF).
	\end{itemize}
	Should the proposed redesign be implemented, it is anticipated that the
	first will be simplified somewhat and modified to reflect changes in the
	other packages, the second will be radically simplified, the third will
	remain the same, and the fourth will be eliminated.

	The details of \texttt{annotator.scanner}\ are not discussed here, since
	the proposed redesign will neither change them nor be influenced by
	them. Of the classes in \texttt{annotator.specification},\ only
	\texttt{IndexFileSpecification},\ which extracts \texttt{Insertion}s
	from \texttt{scene-lib}'s JAIF representation
	(\texttt{annotations.el.AScene}), is relevant to the redesign.

	By far the largest and most complex package is \texttt{annotator.find}.
	The insertion logic is distributed among \texttt{TreeFinder} and
	numerous implementations of the \texttt{Criterion} and
	\texttt{Insertion} interfaces. (A \texttt{Criterion} is a reified
	condition that provides a method
	\texttt{boolean isSatisfiedBy(TreePath path)},
	and an \texttt{Insertion} includes matching criteria, output parameters,
	and methods for generating the source text to be inserted.)

	A simplified outline of the basic flow of the program is as follows:

	\begin{enumerate}
	\item Interpret each JAIF as a ``scene'' (\texttt{AScene}), then invoke
	the \texttt{parse} method in \texttt{IndexFileSpecification} to extract
	the indicated \texttt{Insertion}s from the scene.
	\item Parse each Java source file to obtain a (type-checked) AST.
	\item Create an empty map from (Integer) source positions to
	insertions.
	\item For each node in the AST in pre-order:
	\begin{itemize}
	\item For each extracted \texttt{Insertion} that is relevant to the
	immediately enclosing class (or package for compilation unit nodes):
	\begin{itemize}
	\item If the path from the root to the current node satisfies every
	\texttt{Criterion} for the \texttt{Insertion}:
	\begin{enumerate}
	\item Find the appropriate source position for the annotation.
	\item Add a mapping from the position to the \texttt{Insertion} to the
	map.
	\item Remove the insertion from the extracted set.
	\end{enumerate}
	\end{itemize}
	\end{itemize}
	\item For each position-to-insertion mapping in descending order by
	position:
	\begin{itemize}
	\item Generate insertion text and insert at indicated position.
	\end{itemize}
	\item Emit modified source code.
	\end{enumerate}

	\section{Proposed Revisions}

	\begin{figure}[ht]
	\begin{center}
	\resizebox{\columnwidth}{!}{\includegraphics{dataflow.png}}
	\end{center}
	\caption{
	Data flow between components. Boxes represent sources and sinks;
	ellipses represent Java objects. Bold lines mark new classes and data
	paths, and dashed lines indicate current data flows to be bypassed.
	}
	\label{fig:dataflow}
	\end{figure}

	Figure~\ref{fig:dataflow} gives an overview of proposed changes to the
	flow of data between classes. First, the current re-interpretation of
	the scene as a ``specification'' and subsequent extraction of
	\texttt{Criteria} will be eliminated, as the necessary information can
	be determined directly from the \texttt{Scene}. Second, the source text
	will be incorporated into the augmented AST in a more convenient form,
	so there will no need to refer back to the source itself.

	The division of responsibilities among classes will remain largely the
	same, with the major differences corresponding to the changes in the
	data flow. First, the annotation locations will be read from the
	\texttt{Scene} and mapped to the corresponding path in the (augmented)
	AST (if any). Second, since \texttt{AAST} will incorporate the salient
	capabilities of \texttt{JCTree} and \texttt{Source}, it will cover the
	responsibilities of both classes.

	Many of the current supporting classes will be retained as they are. In
	particular, the new AFU will continue to rely on javac's AST classes and
	parser, and \texttt{scene-lib} probably will need little or no
	modification. The \texttt{annotator.find} package will be radically
	simplified with the elimination of the \texttt{Criterion} and
	(possibly) \texttt{Insertion} hierarchies, as the insertion location
	within the AST can be determined directly from the \texttt{Scene}.

	The redesign can be broken down into two independent parts:
	\begin{description}
	\item{Path-based positioning.} Eliminate the \texttt{Criterion}
	hierarchy; determine the insertion location and the text to be inserted
	during (a single) scene traversal; let the scene traversal rather than
	an AST traversal be the outer loop. Advantages: since insertion is
	based on the node rather than a numeric position, there is only one
	positioning logic (even if there are still two matching objects); an
	entire pass and a large amount of intermediate data are eliminated, thus
	improving time and space efficiency.
	\item{Tree transformation.} Eliminate the \texttt{Insertion} hierarchy;
	instead of calculating numeric positions, apply a transformation to the
	appropriate subtree. Advantage: the logic of tree transformation is
	both closer to the specification and easier to comprehend (and maintain)
	than the logic of finding numeric positions for inserting text by
	reverse order.
	\end{description}

	While these have no obvious technical disadvantages with respect to the
	existing AFU, the payoff must be weighed against the costs. The current
	codebase is mostly stable at this point, so any increase in reliability
	will be small (though for verification software, a jump from, say,
	99.99\% to 99.9999\% correct makes an enormous difference).
	Fortunately, the AFU's extensive test suite will make it clear how
	quickly a revised implementation achieves the same level of reliability.

	\subsection{Path-based Positioning}

	The current AFU creates transforms the scene into a set of
	\texttt{Insertions}, each with its own \texttt{Criteria} (a collection
	of objects of various \texttt{Criterion} subclasses). In doing so, it
	throws away information about hierarchical relationships that could be
	used to guide the search for matching locations in the code.

	\begin{figure}[ht]
	\begin{center}
	\resizebox{156pt}{!}{\includegraphics{corresp.png}}
	\end{center}
	\caption{
	Correspondence between scene and AST.
	}
	\label{fig:corresp}
	\end{figure}

	It makes more sense to traverse an AST in parallel (conceptually, not
	necessarily by different processors) with a scene that refers to it.
	For example, if a class specification (\texttt{AClass}) in a scene
	corresponds to a class node in an AST, each time one of the method
	specifications (\texttt{AMethod}) is visited, the corresponding method
	node (see Figure~\ref{fig:corresp}) should be visited as well to see
	whether any insertions under the \texttt{AMethod} apply to it. This
	strategy not only obviates an additional pass but avoids much useless
	test repetition, as there is no need to test \emph{every} potential
	insertion against \emph{every} AST node encountered before the correct
	insertion site. With this change, it becomes possible to generate
	\texttt{Insertion}s on the fly---or perhaps even to go ahead and insert
	text, though managing order dependencies would take some additional
	work.

	The main loop of the program thus will become a pre-order traversal of
	\texttt{AScene} rather than of \texttt{JCTree}, although the scene
	traversal will explore corresponding parts of the AST. In other words:
	the process will be guided by the annotations to be inserted rather than
	by the shape of the AST. Hence, each specified annotation will be
	visited exactly once. The information currently used to create
	\texttt{Criteria} for the insertions can instead be used to directly
	determine the path (if any) in the AST to which the current scene
	element corresponds.

	\subsection{Tree Transformation}

	The current AFU finds numeric source positions in trees and inserts
	annotations directly into the source code. The position-finding logic
	is therefore applied at a time other than insertion, introducing
	insertion order dependencies and making it harder to find the source of
	a bug.

	A new class hierarchy, the ``Augmented AST'' (\texttt{AAST}), will
	directly manage non-leaf text--that is, inter-leaf text (whitespace and
	comments) and intra-branch text (implicit or contained in branch node
	data members). It will be an implementation of
	\texttt{com.sun.source.Tree} that provides needed bits from
	\texttt{com.sun.tools.javac.tree.JCTree} and defines a parallel subclass
	hierarchy. AAST and subclasses provide methods to retrieve the
	inter-leaf text segments that precede and follow an AST node and
	printing methods that preserve the formatting of the source code.

	Non-leaf text will be stored in three types of locations in an AAST:
	\begin{enumerate}
	\item Each leaf node will store any non-code text that immediately
	follows.
	\item For branch nodes, for every implicit text segment or data member
	that is represented in the source code but not in a descendant node, the
	node will store any whitespace and comments that follow, unless the
	content is at the end of the node's text.
	\item The computation unit node will store any text that precedes the
	beginning of the Java code.
	\end{enumerate}
	In this way, all text segment references occur exactly once in the AST,
	and annotations can be inserted at the front with no need to take
	preceding text into account.

	An annotation insertion will consist of a transformation of a tree node.
	If no additional code is to be generated, the transformation is simple:
	\begin{itemize}
	\item For type annotations, the parent AST node of the element to be
	annotated adds the optional annotation node if it is absent, then adds
	the annotation to the annotation node.
	\item For declaration annotations, the annotation is added onto the
	front of a declaration's list of modifiers.
	\end{itemize}
	Cases that involve code generation are more complex.
	Figures~\ref{fig:nocast} and \ref{fig:typecast} give an example (pre-
	and post-insertion, respectively) of a transformation that requires a
	typecast to be generated and inserted. Note that previously existing
	non-leaf text segments remain unaltered.

	\begin{figure}[ht]
	\begin{center}
	\resizebox{261pt}{!}{\includegraphics{nocast.png}}
	\end{center}
	\caption{
	Augmented subgraph for \texttt{int i = 0}. Text of leaf nodes and text
	representing elements of branch nodes are connected with solid arrows,
	and inter-leaf and inter-element text segments are connected with dashed
	arrows. The text of the subgraph can be read from the lower frontier of
	the graph. (Underscores represent spaces here.)
	}
	\label{fig:nocast}
	\end{figure}

	\begin{figure}[ht]
	\begin{center}
	\resizebox{\columnwidth}{!}{\includegraphics{typecast.png}}
	\end{center}
	\caption{
	Augmented subgraph for \texttt{int i = (@A int) 0}, the result of adding
	an annotated cast \texttt{(@A int)} to the previous figure's code. With
	respect to the graph in the previous figure, \texttt{TypeCast} has taken
	\texttt{Literal}'s place in the tree, and \texttt{Literal} has become a
	child of \texttt{TypeCast} along with the newly inserted material.
	}
	\label{fig:typecast}
	\end{figure}

	The other major change will be in the output generation process, which
	will depend on the \texttt{AAST} rather than directly on the source.
	The \texttt{AAST}'s pretty-printer will perform an in-order traversal
	and emit the text of each leaf interleaved with the non-leaf text.

	\subsection{Algorithm Summary}

	\begin{enumerate}
	\item Interpret each JAIF as a ``scene'' (\texttt{AScene}).
	\item Parse each Java source file to obtain a (type-checked) AST.
	\item For each AST:
	\begin{enumerate}
	\item Transform the AST into an AAST.
	\item For each annotated element in the scene in pre-order:
	\begin{itemize}
	\item Try to find the AAST node to which the element corresponds.
	\item If the node exists (or can be generated):
	\begin{itemize}
	\item Transform the subtree rooted at the node to incorporate the
	annotation.
	\end{itemize}
	\end{itemize}
	\item Pretty-print AAST.
	\end{enumerate}
	\end{enumerate}

	\section{Planning}

	There will be some time required for integrating the two parts if they
	are implemented separately. In particular, doing tree transformation
	separately from path-based positioning means reifying transformations
	so they can be done in a second pass. Hence it makes more sense to
	do path-based positioning on the way to doing tree transformation.

	Estimated times are as follows:
	\begin{itemize}
	\item Path-based positioning: 2.5 weeks.
	\item Tree transformation: 3 to 3.5 weeks if done after path-based
	positioning, 4 weeks otherwise.
	\end{itemize}
	Thus, doing the full job should take about six weeks.

	\appendix

	\section{API extensions (non-exhaustive)}

	\begin{jcode}
	public abstract class AAST extends com.sun.source.Tree {
	/** Interleaves non-leaf text with text serialization of AAST. */
	public void prettyPrint(java.io.Writer w) throws java.io.IOException;

	/** Insert single annotation at given path. */
	public insertAnnotation(TreePath path, Annotation anno);
	}

	// main loop
	AAST insertAnnotations(annotations.el.AScene scene, AAST aast);
	\end{jcode}

	\newpage
	\section{Excerpts from original API}\label{sec:exc}

	\begin{jcode}
	public interface Tree {
	public enum Kind { // MUCH larger in actual interface
	/**
	* Used for instances of {@link BinaryTree} representing
	* addition or string concatenation {@code +}.
	*/
	PLUS(BinaryTree.class),
	/**
	* Used for instances of {@link LiteralTree} representing
	* an integral literal expression of type {@code int}.
	*/
	INT_LITERAL(LiteralTree.class);
	}
	Kind getKind();
	/**
	* Accept method used to implement the visitor pattern. The
	* visitor pattern is used to implement operations on trees.
	*
	* @param <R> result type of this operation.
	* @param <D> type of additional data.
	*/
	<R,D> R accept(TreeVisitor<R,D> visitor, D data);
	}
	public interface ExpressionTree extends Tree {}
	public interface BinaryTree extends ExpressionTree {
	ExpressionTree getLeftOperand();
	ExpressionTree getRightOperand();
	}
	public interface LiteralTree extends ExpressionTree {
	Object getValue();
	}

	public abstract class JCTree implements Tree {
	public int getStartPosition() {
	return TreeInfo.getStartPos(this);
	}
	public static abstract class JCExpression
	extends JCTree implements ExpressionTree {}
	public static class JCBinary extends JCExpression implements BinaryTree {
	private Tag opcode;
	public JCExpression lhs;
	public JCExpression rhs;
	public Symbol operator;
	protected JCBinary(Tag opcode,
	JCExpression lhs,
	JCExpression rhs,
	Symbol operator) {
	this.opcode = opcode;
	this.lhs = lhs;
	this.rhs = rhs;
	this.operator = operator;
	}
	public Kind getKind() { return TreeInfo.tagToKind(getTag()); }
	public JCExpression getLeftOperand() { return lhs; }
	public JCExpression getRightOperand() { return rhs; }
	public Symbol getOperator() { return operator; }
	@Override
	public <R,D> R accept(TreeVisitor<R,D> v, D d) {
	return v.visitBinary(this, d);
	}
	}
	public static class JCLiteral extends JCExpression implements LiteralTree {
	public TypeTag typetag;
	public Object value;
	protected JCLiteral(TypeTag typetag, Object value) {
	this.typetag = typetag;
	this.value = value;
	}
	public Kind getKind() {
	return typetag.getKindLiteral();
	}
	public Object getValue() {
	switch (typetag) {
	case BOOLEAN:
	int bi = (Integer) value;
	return (bi != 0);
	case CHAR:
	int ci = (Integer) value;
	char c = (char) ci;
	if (c != ci)
	throw new AssertionError("bad value for char literal");
	return c;
	default:
	return value;
	}
	}
	@Override
	public <R,D> R accept(TreeVisitor<R,D> v, D d) {
	return v.visitLiteral(this, d);
	}
	}
	}
	\end{jcode}

	\end{document}