Review, comment, and reformat IPA.txt, including feedback comments.

Formatting includes:

- removing line wraps (Emacs Cmd-Q), to make text easier to read
- providing useful indentation
- calling out caveats and notes more explicitly

Stylistically, I prefer that the document speak in the third person instead of
"we". The term "we" is ambiguous, and sometimes refers to different things.
I've passed over the existing paragraphs and made them speak more about the
specific entities that compose the analyzer and what they do (e.g., ExprEngine)
instead of "we" referring to the analyzer.

Further, I have replaced some vague concepts such as "state" or "program state"
with their precise implementation counterparts (e.g., ProgramState). This makes
the document more technically precise throughout the entire narrative, which
would sometimes use vague terms and other times precise terms.

I've placed several comments within the document, marked with
***TMK/COMMENT***, which indicate places that need to be enhanced or clarified,
or which call out questions about intended behavior.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@162338 91177308-0d34-0410-b5e6-96231b3b80d8

commit: 77df8d960e276aa6f733f7b79ec63409a76d8df1
author: Ted Kremenek <kremenek@apple.com> Wed Aug 22 01:20:05 2012 +0000
committer: Ted Kremenek <kremenek@apple.com> Wed Aug 22 01:20:05 2012 +0000
tree: 5ce55bfd722d8403702a854c1eb739173fba4db3
parent: a779e273b1bcd6e0af08234cc3d956220db4c5f4
diff --git a/docs/analyzer/IPA.txt b/docs/analyzer/IPA.txt
index c52b17a..e16007c 100644
--- a/docs/analyzer/IPA.txt
+++ b/docs/analyzer/IPA.txt

@@ -1,97 +1,303 @@
 Inlining
 ========
 
-Inlining Modes
------------------------
--analyzer-ipa=none - All inlining is disabled. This is the only mode available in LLVM 3.1 and earlier and in Xcode 4.3 and earlier.
--analyzer-ipa=basic-inlining - Turns on inlining for C functions, C++ static member functions, and blocks -- essentially, the calls that behave like simple C function calls. This is essentially the mode used in Xcode 4.4.
--analyzer-ipa=inlining - Turns on inlining when we can confidently find the function/method body corresponding to the call. (C functions, static functions, devirtualized C++ methods, ObjC class methods, ObjC instance methods when we are confident about the dynamic type of the instance).
--analyzer-ipa=dynamic - Inline instance methods for which the type is determined at runtime and we are not 100% sure that our type info is correct. For virtual calls, inline the most plausible definition.
--analyzer-ipa=dynamic-bifurcate - Same as -analyzer-ipa=dynamic, but the path is split. We inline on one branch and do not inline on the other. This mode does not drop the coverage in cases when the parent class has code that is only exercised when some of its methods are overriden.
+  -analyzer-ipa=none - All inlining is disabled. This is the only mode available
+     in LLVM 3.1 and earlier and in Xcode 4.3 and earlier.
+
+  -analyzer-ipa=basic-inlining - Turns on inlining for C functions, C++ static
+     member functions, and blocks -- essentially, the calls that behave like
+     simple C function calls. This is essentially the mode used in Xcode 4.4.
+
+  -analyzer-ipa=inlining - Turns on inlining when we can confidently find the
+     function/method body corresponding to the call. (C functions, static
+     functions, devirtualized C++ methods, Objective-C class methods,
+     Objective-C instance methods when ExprEngine is confident about the
+     dynamic type of the instance).
+
+  -analyzer-ipa=dynamic - Inline instance methods for which the type is
+     determined at runtime and we are not 100% sure that our type info is
+     correct. For virtual calls, inline the most plausible definition.
+
+  -analyzer-ipa=dynamic-bifurcate - Same as -analyzer-ipa=dynamic, but the path
+     is split. We inline on one branch and do not inline on the other. This
+     mode does not drop the coverage in cases when the parent class has code
+     that is only exercised when some of its methods are overridden.
 
 Currently, -analyzer-ipa=basic-inlining is the default mode.
 
 Basics of Implementation
 -----------------------
 
-The low-level mechanism of inlining a function is handled in ExprEngine::inlineCall and ExprEngine::processCallExit. If the conditions are right for inlining, a CallEnter node is created and added to the analysis work list. The CallEnter node marks the change to a new LocationContext representing the called function, and its state includes the contents of the new stack frame. When the CallEnter node is actually processed, its single successor will be a edge to the first CFG block in the function.
+The low-level mechanism of inlining a function is handled in
+ExprEngine::inlineCall and ExprEngine::processCallExit.
 
-Exiting an inlined function is a bit more work, fortunately broken up into reasonable steps:
-1. The CoreEngine realizes we're at the end of an inlined call and generates a CallExitBegin node.
-2. ExprEngine takes over (in processCallExit) and finds the return value of the function, if it has one. This is bound to the expression that triggered the call. (In the case of calls without origin expressions, such as destructors, this step is skipped.)
-3. Dead symbols and bindings are cleaned out from the state, including any local bindings.
-4. A CallExitEnd node is generated, which marks the transition back to the caller's LocationContext.
-5. Custom post-call checks are processed and the final nodes are pushed back onto the work list, so that evaluation of the caller can continue.
+If the conditions are right for inlining, a CallEnter node is created and added
+to the analysis work list. The CallEnter node marks the change to a new
+LocationContext representing the called function, and its state includes the
+contents of the new stack frame. When the CallEnter node is actually processed,
+its single successor will be an edge to the first CFG block in the function.
+
+Exiting an inlined function is a bit more work, fortunately broken up into
+reasonable steps:
+
+1. The CoreEngine realizes we're at the end of an inlined call and generates a
+   CallExitBegin node.
+
+2. ExprEngine takes over (in processCallExit) and finds the return value of the
+   function, if it has one. This is bound to the expression that triggered the
+   call. (In the case of calls without origin expressions, such as destructors,
+   this step is skipped.)
+
+3. Dead symbols and bindings are cleaned out from the state, including any local
+   bindings.
+
+4. A CallExitEnd node is generated, which marks the transition back to the
+   caller's LocationContext.
+
+5. Custom post-call checks are processed and the final nodes are pushed back
+   onto the work list, so that evaluation of the caller can continue.
 
 Retry Without Inlining
+----------------------
+
+In some cases, we would like to retry the analysis without inlining a
+particular call.
+
+Currently, we use this technique to recover the coverage in case we stop
+analyzing a path due to exceeding the maximum block count inside an inlined
+function.
+
+When this situation is detected, we walk up the path to find the first node
+before inlining was started and enqueue it on the WorkList with a special
+ReplayWithoutInlining bit added to it (ExprEngine::replayWithoutInlining).  The
+path is then re-analyzed from that point without inlining that particular call.
+
+Deciding When to Inline
 -----------------------
 
-In some cases, we would like to retry analyzes without inlining the particular call. Currently, we use this technique to recover the coverage in case we stop analyzing a path due to exceeding the maximum block count inside an inlined function. When this situation is detected, we walk up the path to find the first node before inlining was started and enqueue it on the WorkList with a special ReplayWithoutInlining bit added to it (ExprEngine::replayWithoutInlining).
+In general, the analyzer attempts to inline as much as possible, since it
+provides a better summary of what actually happens in the program.  There are
+some cases, however, where the analyzer chooses not to inline:
 
-Deciding when to inline
------------------------
-In general, we try to inline as much as possible, since it provides a better summary of what actually happens in the program. However, there are some cases where we choose not to inline:
-- if there is no definition available (of course)
-- if we can't create a CFG or compute variable liveness for the function
-- if we reach a cutoff of maximum stack depth (to avoid infinite recursion)
-- if the function is variadic
-- in C++, we don't inline constructors unless we know the destructor will be inlined as well
-- in C++, we don't inline allocators (custom operator new implementations), since we don't properly handle deallocators (at the time of this writing)
-- "Dynamic" calls are handled specially; see below.
-- Engine:FunctionSummaries map stores additional information about declarations, some of which is collected at runtime based on previous analyzes of the function. We do not inline functions which were not profitable to inline in a different context (for example, if the maximum block count was exceeded, see Retry Without Inlining).
+- If there is no definition available for the called function or method.  In
+  this case, there is no opportunity to inline.
+
+- If the CFG cannot be constructed for a called function, or the liveness
+  cannot be computed.  These are prerequisites for analyzing a function body,
+  with or without inlining.
+
+- If the LocationContext chain for a given ExplodedNode reaches a maximum cutoff
+  depth.  This prevents unbounded analysis due to infinite recursion, but also
+  serves as a useful cutoff for performance reasons.
+
+- If the function is variadic.  This is not a hard limitation, but an engineering
+  limitation.
+
+  Tracked by: <rdar://problem/12147064> Support inlining of variadic functions
+
+- In C++, ExprEngine does not inline constructors unless the destructor is
+  guaranteed to be inlined as well.
+
+  ***TMK/COMMENT*** This needs to be a bit more precise.  How do we know the
+                    destructor is guaranteed to be inlined?
+
+- In C++, ExprEngine does not inline custom implementations of operator 'new'.
+  This is due to a lack of complete handling of deallocators.
+
+- Calls resulting in "dynamic dispatch" are specially handled.  See more below.
+
+- Engine::FunctionSummaries map stores additional information about
+  declarations, some of which is collected at runtime based on previous analyses
+  of the function. We do not inline functions which were not profitable to
+  inline in a different context (for example, if the maximum block count was
+  exceeded, see Retry Without Inlining).
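
The simpler exclusion conditions in the list above can be read as a checklist. The following is a minimal sketch of that checklist only, not the analyzer's actual logic: CallSiteInfo, mayInline, and the maxStackDepth parameter are hypothetical names introduced for illustration, and the C++-specific, dynamic-dispatch, and FunctionSummaries conditions are deliberately left out.

```cpp
#include <cassert>

// Hypothetical summary of a call site; not an analyzer data structure.
struct CallSiteInfo {
  bool hasDefinition;      // a body is available for the callee
  bool hasCFGAndLiveness;  // the CFG and liveness could be computed
  bool isVariadic;         // the callee takes a variable argument list
  unsigned stackDepth;     // length of the current chain of inlined frames
};

// Returns true when none of the simple exclusion conditions applies.
// 'maxStackDepth' is an assumed parameter standing in for the depth cutoff.
bool mayInline(const CallSiteInfo &call, unsigned maxStackDepth) {
  if (!call.hasDefinition)
    return false;  // nothing to inline
  if (!call.hasCFGAndLiveness)
    return false;  // prerequisites for analyzing the body are missing
  if (call.stackDepth >= maxStackDepth)
    return false;  // guards against unbounded recursion
  if (call.isVariadic)
    return false;  // engineering limitation described above
  return true;
}
```

For example, under these assumptions a call with an available body at depth 2 and a cutoff of 5 passes the checklist, while the same call made from depth 5 does not.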
 
 
-Dynamic calls and devirtualization
+Dynamic Calls and Devirtualization
 ----------------------------------
-"Dynamic" calls are those that are resolved at runtime, such as C++ virtual method calls and Objective-C message sends. Due to the path-sensitive nature of the analyzer, we may be able to figure out the dynamic type of the object whose method is being called and thus "devirtualize" the call, i.e. find the actual method that will be called at runtime. (Obviously this is not always possible.) This is handled by CallEvent's getRuntimeDefinition method.
 
-Type information is tracked as DynamicTypeInfo, stored within the program state. If no DynamicTypeInfo has been explicitly set for a region, it will be inferred from the region's type or associated symbol. Information from symbolic regions is weaker than from true typed regions; a C++ object declared "A obj" is known to have the class 'A', but a reference "A &ref" may dynamically be a subclass of 'A'. The DynamicTypePropagation checker gathers and propagates the type information.
+"Dynamic" calls are those that are resolved at runtime, such as C++ virtual
+method calls and Objective-C message sends. Due to the path-sensitive nature of
+the analyzer, the analyzer may be able to reason about the dynamic type of the
+object whose method is being called and thus "devirtualize" the call. 
 
-(Warning: not all of the existing analyzer code has been retrofitted to use DynamicTypeInfo, nor is it universally appropriate. In particular, DynamicTypeInfo always applies to a region with all casts stripped off, but sometimes the information provided by casts can be useful.)
+This path-sensitive devirtualization occurs when the analyzer can determine what
+method would actually be called at runtime.  This is possible when the type
+information is constrained enough for a simulated C++/Objective-C object in
+order to make such a decision.
 
-When asked to provide a definition, the CallEvents for dynamic calls will use the type info in their state to provide the best definition of the method to be called. In some cases this devirtualization can be perfect or near-perfect, and we can inline the definition as usual. In others we can make a guess, but report that our guess may not be the method actually called at runtime.
+ == RuntimeDefinition ==
 
-The -analyzer-ipa option has five different modes: none, basic-inlining, inlining, dynamic, and dynamic-bifurcate. Under -analyzer-ipa=dynamic, all dynamic calls are inlined, whether we are certain or not that this will actually be the definition used at runtime. Under -analyzer-ipa=inlining, only "near-perfect" devirtualized calls are inlined*, and other dynamic calls are evaluated conservatively (as if no definition were available). Under -analyzer-ipa=basic-inlining, only simple calls (C functions and a few others) are inlined, and no devirtualization is performed.
+The basis of this devirtualization is CallEvent's getRuntimeDefinition() method,
+which returns a RuntimeDefinition object.  The "runtime" + "definition"
+corresponds to the definition of the called method as would be computed at
+runtime.  In the case of no dynamic dispatch, this object resolves to a Decl*
+for the called function.  In the case of dynamic dispatch, the RuntimeDefinition
+object also includes an optional MemRegion* corresponding to the object being
+called (i.e., the "receiver" in Objective-C parlance).  This information is
+later consulted by ExprEngine (along with tracked dynamic type information) to
+potentially resolve the called method.
 
-* Currently, no Objective-C messages are not inlined under -analyzer-ipa=inlining, even if we are reasonably confident of the type of the receiver. We plan to enable this once we have tested our heuristics more thoroughly.
+ == DynamicTypeInfo ==
 
-The last option, -analyzer-ipa=dynamic-bifurcate, behaves similarly to "dynamic", but performs a conservative invalidation in the general virtual case in /addition/ to inlining. The details of this are discussed below.
+In addition to RuntimeDefinition, the analyzer needs to track the potential
+runtime type of a simulated C++/Objective-C object.  As the analyzer analyzes a
+path, it may accrue more information to refine the knowledge about the type of
+an object.  This can then be used to make better decisions about the target
+method of a call.
 
+Such type information is tracked as DynamicTypeInfo.  This is path-sensitive
+data that is stored in ProgramState, which defines a mapping from MemRegions to
+an (optional) DynamicTypeInfo.
+
+If no DynamicTypeInfo has been explicitly set for a MemRegion, it will be lazily
+inferred from the region's type or associated symbol. Information from symbolic
+regions is weaker than from true typed regions.
+
+  EXAMPLE: A C++ object declared "A obj" is known to have the class 'A', but a
+           reference "A &ref" may dynamically be a subclass of 'A'.
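
The distinction in the example above can be observed in ordinary C++. This is a small illustrative sketch: the classes A and B and the functions nameOf and nameOfLocal are hypothetical and are not part of the analyzer.

```cpp
#include <cassert>
#include <string>

// Hypothetical classes for illustration only.
struct A {
  virtual ~A() = default;
  virtual std::string name() const { return "A"; }
};

struct B : A {
  std::string name() const override { return "B"; }
};

// 'ref' has static type A, but its dynamic type may be any subclass of A,
// so the callee of ref.name() is not known from the declaration alone.
std::string nameOf(const A &ref) { return ref.name(); }

// 'obj' is declared by value, so its dynamic type is exactly A.
std::string nameOfLocal() {
  A obj;
  return obj.name();
}
```

Calling nameOf with a B object dispatches to B::name, whereas nameOfLocal can only ever reach A::name. This is why type information derived from a declared object is stronger than information derived from a reference.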
+
+The DynamicTypePropagation checker gathers and propagates DynamicTypeInfo,
+updating it as information is observed along a path that can refine that type
+information for a region.
+
+  WARNING: Not all of the existing analyzer code has been retrofitted to use
+           DynamicTypeInfo, nor is it universally appropriate. In particular,
+           DynamicTypeInfo always applies to a region with all casts stripped
+           off, but sometimes the information provided by casts can be useful.
+
+
+When asked to provide a definition, the CallEvents for dynamic calls will use
+the DynamicTypeInfo in their ProgramState to provide the best definition of the
+method to be called. In some cases this devirtualization can be perfect or
+near-perfect, and the analyzer can inline the definition as usual. In other
+cases ExprEngine can make a guess, but report that our guess may not be the
+method actually called at runtime.
+
+  ***TMK/COMMENT*** what does it mean to "report" that our guess may not be the
+                    method actually called?
+
+The -analyzer-ipa option has five different modes: none, basic-inlining,
+inlining, dynamic, and dynamic-bifurcate. Under -analyzer-ipa=dynamic, all
+dynamic calls are inlined, whether we are certain or not that this will
+actually be the definition used at runtime. Under -analyzer-ipa=inlining, only
+"near-perfect" devirtualized calls are inlined*, and other dynamic calls are
+evaluated conservatively (as if no definition were available).
+
+* Currently, no Objective-C messages are inlined under
+  -analyzer-ipa=inlining, even if we are reasonably confident of the type of the
+  receiver. We plan to enable this once we have tested our heuristics more
+  thoroughly.
+
+The last option, -analyzer-ipa=dynamic-bifurcate, behaves similarly to
+"dynamic", but performs a conservative invalidation in the general virtual case
+in *addition* to inlining. The details of this are discussed below.
 
 Bifurcation
 -----------
-ExprEngine::BifurcateCall implements the -analyzer-ipa=dynamic-bifurcate mode. When a call is made on a region with dynamic type information, we bifurcate the path and add the region's processing mode to the GDM. Currently, there are 2 modes: DynamicDispatchModeInlined and DynamicDispatchModeConservative. Going forward, we consult the state of the region to make decisions on whether the calls should be inlined or not, which ensures that we have at most one split per region. The modes model the cases when the dynamic type information is perfectly correct and when the info is not correct (i.e. where the region is a subclass of the type we store in DynamicTypeInfo).
 
-Bifurcation mode allows for increased coverage in cases where the parent method contains code which is only executed when the class is subclassed. The disadvantages of this mode are a (considerable?) performance hit and the possibility of false positives on the path where the conservative mode is used.
+ExprEngine::BifurcateCall implements the -analyzer-ipa=dynamic-bifurcate
+mode.
 
+When a call is made on a region with dynamic type information, ExprEngine
+bifurcates the path and marks the MemRegion (derived from a RuntimeDefinition
+object) with a path-sensitive "mode" in the ProgramState.
+
+Currently, there are 2 modes: 
+
+ DynamicDispatchModeInlined - Models the case where the dynamic type information
+   is perfectly constrained so that a given definition of a method is expected
+   to be the code actually called.  When this mode is set, no bifurcation
+   of the path is needed when calling this dynamically dispatched method because
+   the definition is considered fully resolved.
+
+   ***TMK/COMMENT*** - this isn't technically what is happening.  Since the
+      "mode" is associated with a MemRegion, that "mode" applies to all method
+      calls involving that region as the "receiver".  This means that once we
+      decide to inline calls for a given receiver object we always inline.
+
+ DynamicDispatchModeConservative - Models the case where the dynamic type
+   information is imperfect, and implies that the method definition could be
+   overridden in a subclass.  In such cases, ExprEngine does not inline the
+   method, even if a candidate definition is available.  This serves to be
+   conservative about simulating the effects of a call.
+
+Going forward, ExprEngine consults the mode of the MemRegion to decide whether
+calls should be inlined or not, which ensures that there is at most one split
+per region.
+
+  ***TMK/COMMENT*** isn't this what is happening?
+
+At a high level, "bifurcation mode" allows for increased semantic coverage in
+cases where the parent method contains code which is only executed when the
+class is subclassed. The disadvantages of this mode are a (considerable?)
+performance hit and the possibility of false positives on the path where the
+conservative mode is used.
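
The "at most one split per region" property can be illustrated with a toy model. This is a sketch under stated assumptions, not the analyzer's implementation: DispatchMode, RegionId, State, and simulateCall are hypothetical stand-ins for the dispatch modes, MemRegion, ProgramState, and the bifurcation step respectively.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Toy model only: these types mimic the idea, not the analyzer's classes.
enum class DispatchMode { Inlined, Conservative };
using RegionId = int;                            // stands in for a MemRegion
using State = std::map<RegionId, DispatchMode>;  // stands in for ProgramState

// Simulate one dynamically dispatched call on 'receiver' along every path.
std::vector<State> simulateCall(const std::vector<State> &paths,
                                RegionId receiver) {
  std::vector<State> next;
  for (const State &s : paths) {
    if (s.count(receiver)) {
      // A mode is already recorded for this region: follow it, no new split.
      next.push_back(s);
      continue;
    }
    // First call on this region: split the path and record one mode on each.
    State inlined = s;
    State conservative = s;
    inlined[receiver] = DispatchMode::Inlined;
    conservative[receiver] = DispatchMode::Conservative;
    next.push_back(inlined);
    next.push_back(conservative);
  }
  return next;
}
```

In this model, repeated calls on the same receiver never add paths after the first split; only a call on a region with no recorded mode doubles the number of paths.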
 
 Objective-C Message Heuristics
 ------------------------------
-We rely on a set of heuristics to partition the set of ObjC method calls into ones that require bifurcation and ones that do not (can or cannot be a subclass). Below are the cases when we consider that the dynamic type of the object is precise (cannot be a subclass):
- - If the object was created with +alloc or +new and initialized with an -init method.
- - If the calls are property accesses using dot syntax. This is based on the assumption that children rarely override properties, or do so in an essentially compatible way.
- - If the class interface is declared inside the main source file. In this case it is unlikely that it will be subclassed.
- - If the method is not declared outside of main source file, either by the receiver's class or by any superclasses.
 
+ExprEngine relies on a set of heuristics to partition the set of Objective-C
+method calls into those that require bifurcation and those that do not (can or
+cannot be a subclass).
+
+  ***TMK/COMMENT*** what does the "can or cannot be a subclass" even mean?
+
+Below are the cases when the DynamicTypeInfo of the object is considered precise
+(cannot be a subclass):
+
+ - If the object was created with +alloc or +new and initialized with an -init
+   method.
+
+ - If the calls are property accesses using dot syntax. This is based on the
+   assumption that children rarely override properties, or do so in an
+   essentially compatible way.
+
+ - If the class interface is declared inside the main source file. In this case
+   it is unlikely that it will be subclassed.
+
+ - If the method is not declared outside of main source file, either by the
+   receiver's class or by any superclasses.
 
 C++ Inlining Caveats
 --------------------
-C++11 [class.cdtor]p4 describes how the vtable of an object is modified as it is being constructed or destructed; that is, the type of the object depends on which base constructors have been completed. This is tracked using dynamic type info in the DynamicTypePropagation checker.
 
-Temporaries are poorly modelled right now because we're not confident in the placement
+C++11 [class.cdtor]p4 describes how the vtable of an object is modified as it is
+being constructed or destructed; that is, the type of the object depends on
+which base constructors have been completed. This is tracked using
+DynamicTypeInfo in the DynamicTypePropagation checker.
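
The language rule referred to above can be seen in a short standalone example. This is an illustrative sketch of the C++ behavior only. Base, Derived, who, and seenInCtor are hypothetical names, and nothing here reflects how the checker is implemented.

```cpp
#include <cassert>
#include <string>

// Hypothetical classes for illustration only.
struct Base {
  std::string seenInCtor;
  // While Base's constructor runs, the Derived part does not exist yet,
  // so this virtual call resolves to Base::who, not Derived::who.
  Base() { seenInCtor = who(); }
  virtual ~Base() = default;
  virtual std::string who() const { return "Base"; }
};

struct Derived : Base {
  std::string who() const override { return "Derived"; }
};
```

After construction completes, the same call made through a Base reference to a Derived object reaches Derived::who. A simulation of the constructor therefore has to treat the object's type as Base for the duration of the base constructor.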
 
-'new' is poorly modelled due to some nasty CFG/design issues (elaborated in PR12014). 'delete' is essentially not modelled at all.
+There are several limitations in the current implementation:
 
-Arrays of objects are modeled very poorly right now. We run only the first constructor and first destructor. Because of this, we don't inline any constructors or destructors for arrays.
+- Temporaries are poorly modelled right now because we're not confident in the
+  placement
 
+- 'new' is poorly modelled due to some nasty CFG/design issues.  This is tracked
+  in PR12014.  'delete' is not modelled at all.
+
+- Arrays of objects are modeled very poorly right now.  ExprEngine currently
+  only simulates the first constructor and first destructor. Because of this,
+  ExprEngine does not inline any constructors or destructors for arrays.
 
 CallEvent
-=========
+---------
 
-A CallEvent represents a specific call to a function, method, or other body of code. It is path-sensitive, containing both the current state (ProgramStateRef) and stack space (LocationContext), and provides uniform access to the argument values and return type of a call, no matter how the call is written in the source or what sort of code body is being invoked.
+A CallEvent represents a specific call to a function, method, or other body of
+code. It is path-sensitive, containing both the current state (ProgramStateRef)
+and stack space (LocationContext), and provides uniform access to the argument
+values and return type of a call, no matter how the call is written in the
+source or what sort of code body is being invoked.
 
-(For those familiar with Cocoa, CallEvent is roughly equivalent to NSInvocation.)
+  NOTE: For those familiar with Cocoa, CallEvent is roughly equivalent to
+        NSInvocation.
 
-CallEvent should be used whenever there is logic dealing with function calls that does not care how the call occurred. Examples include checking that arguments satisfy preconditions (such as __attribute__((nonnull))), and attempting to inline a call.
+CallEvent should be used whenever there is logic dealing with function calls
+that does not care how the call occurred.
 
-CallEvents are reference-counted objects managed by a CallEventManager. While there is no inherent issue with persisting them (say, in the state's GDM), they are intended for short-lived use, and can be recreated from CFGElements or StackFrameContexts fairly easily.
+Examples include checking that arguments satisfy preconditions (such as
+__attribute__((nonnull))), and attempting to inline a call.
+
+CallEvents are reference-counted objects managed by a CallEventManager. While
+there is no inherent issue with persisting them (say, in a ProgramState's GDM),
+they are intended for short-lived use, and can be recreated from CFGElements or
+StackFrameContexts fairly easily.