 Inlining
 ========

   -analyzer-ipa=none - All inlining is disabled. This is the only mode available
      in LLVM 3.1 and earlier and in Xcode 4.3 and earlier.

   -analyzer-ipa=basic-inlining - Turns on inlining for C functions, C++ static
      member functions, and blocks -- essentially, the calls that behave like
      simple C function calls. This is essentially the mode used in Xcode 4.4.

   -analyzer-ipa=inlining - Turns on inlining when we can confidently find the
     function/method body corresponding to the call. (C functions, static
     functions, devirtualized C++ methods, Objective-C class methods, Objective-C
     instance methods when ExprEngine is confident about the dynamic type of the
     instance).

   -analyzer-ipa=dynamic - Inline instance methods for which the type is
    determined at runtime and we are not 100% sure that our type info is
    correct. For virtual calls, inline the most plausible definition.

   -analyzer-ipa=dynamic-bifurcate - Same as -analyzer-ipa=dynamic, but the path
    is split. We inline on one branch and do not inline on the other. This mode
    does not drop the coverage in cases when the parent class has code that is
    only exercised when some of its methods are overridden.

 Currently, -analyzer-ipa=basic-inlining is the default mode.

 Basics of Implementation
 ------------------------

 The low-level mechanism of inlining a function is handled in
 ExprEngine::inlineCall and ExprEngine::processCallExit.

 If the conditions are right for inlining, a CallEnter node is created and added
 to the analysis work list. The CallEnter node marks the change to a new
 LocationContext representing the called function, and its state includes the
 contents of the new stack frame. When the CallEnter node is actually processed,
 its single successor will be an edge to the first CFG block in the function.

 Exiting an inlined function is a bit more work, fortunately broken up into
 reasonable steps:

 1. The CoreEngine realizes we're at the end of an inlined call and generates a
    CallExitBegin node.

 2. ExprEngine takes over (in processCallExit) and finds the return value of the
    function, if it has one. This is bound to the expression that triggered the
    call. (In the case of calls without origin expressions, such as destructors,
    this step is skipped.)

 3. Dead symbols and bindings are cleaned out from the state, including any local
    bindings.

 4. A CallExitEnd node is generated, which marks the transition back to the
    caller's LocationContext.

 5. Custom post-call checks are processed and the final nodes are pushed back
    onto the work list, so that evaluation of the caller can continue.
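
 The five exit steps above can be sketched with a toy model. This is only an
 illustration: ToyState, ToyExitResult, and exitInlinedCall are hypothetical
 names, the state is reduced to two maps, and the "PostCall" label merely
 stands in for the post-call checks of step 5. None of this is the analyzer's
 real API.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for a ProgramState.
struct ToyState {
  std::map<std::string, int> callerBindings; // values bound in the caller
  std::map<std::string, int> calleeLocals;   // bindings local to the callee
};

struct ToyExitResult {
  std::vector<std::string> nodes; // node kinds generated, in order
  ToyState state;                 // state handed back to the caller
};

// originExpr and returnValue may be null, e.g. for a destructor call that has
// no origin expression.
ToyExitResult exitInlinedCall(ToyState state, const std::string *originExpr,
                              const int *returnValue) {
  ToyExitResult result;
  result.nodes.push_back("CallExitBegin");   // step 1: end of inlined call
  if (originExpr && returnValue)             // step 2: bind the return value
    state.callerBindings[*originExpr] = *returnValue;
  state.calleeLocals.clear();                // step 3: drop local bindings
  result.nodes.push_back("CallExitEnd");     // step 4: back in the caller
  result.nodes.push_back("PostCall");        // step 5: post-call checks run
  result.state = state;
  return result;
}
```

 Passing null for originExpr models the destructor case from step 2: the node
 sequence is unchanged, but nothing is bound in the caller.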

 Retry Without Inlining
 ----------------------

 In some cases, we would like to retry the analysis without inlining a
 particular call.

 Currently, we use this technique to recover the coverage in case we stop
 analyzing a path due to exceeding the maximum block count inside an inlined
 function.

 When this situation is detected, we walk up the path to find the first node
 before inlining was started and enqueue it on the WorkList with a special
 ReplayWithoutInlining bit added to it (ExprEngine::replayWithoutInlining).  The
 path is then re-analyzed from that point without inlining that particular call.
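
 The walk back up the path can be pictured with a small sketch. ToyNode,
 ToyWorkItem, findNodeBeforeInlining, and enqueueReplay are hypothetical names,
 and a plain integer frame depth stands in for the LocationContext. The real
 logic lives in ExprEngine::replayWithoutInlining and is more involved.

```cpp
#include <vector>

// Hypothetical stand-in for an ExplodedNode on a single path.
struct ToyNode {
  const ToyNode *pred; // predecessor on the path, nullptr at the root
  int frameDepth;      // 0 = top-level function, 1 = first inlined call, ...
};

// A work list entry carrying the "replay without inlining" bit.
struct ToyWorkItem {
  const ToyNode *node;
  bool replayWithoutInlining;
};

// Walk up from the node where analysis stopped and return the first
// predecessor that belongs to the caller's frame.
const ToyNode *findNodeBeforeInlining(const ToyNode *stuck) {
  if (!stuck || stuck->frameDepth == 0)
    return nullptr; // not inside an inlined call
  const int calleeDepth = stuck->frameDepth;
  for (const ToyNode *n = stuck->pred; n; n = n->pred)
    if (n->frameDepth < calleeDepth)
      return n;
  return nullptr;
}

// Enqueue that node again, marked so that the call is not inlined this time.
bool enqueueReplay(const ToyNode *stuck, std::vector<ToyWorkItem> &workList) {
  const ToyNode *before = findNodeBeforeInlining(stuck);
  if (!before)
    return false;
  workList.push_back({before, true});
  return true;
}
```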

 Deciding When to Inline
 -----------------------

 In general, the analyzer attempts to inline as much as possible, since it
 provides a better summary of what actually happens in the program.  There are
 some cases, however, where the analyzer chooses not to inline:

 - If there is no definition available for the called function or method.  In
   this case, there is no opportunity to inline.

 - If the CFG cannot be constructed for a called function, or the liveness
   cannot be computed.  These are prerequisites for analyzing a function body,
   with or without inlining.

 - If the LocationContext chain for a given ExplodedNode reaches a maximum cutoff
   depth.  This prevents unbounded analysis due to infinite recursion, but also
   serves as a useful cutoff for performance reasons.

 - If the function is variadic.  This is not a hard limitation, but an engineering
   limitation.

   Tracked by: <rdar://problem/12147064> Support inlining of variadic functions

 - In C++, ExprEngine does not inline constructors unless the destructor is
   guaranteed to be inlined as well.

   **TMK/COMMENT** This needs to be a bit more precise.  How do we know the
                   destructor is guaranteed to be inlined?

 - In C++, ExprEngine does not inline custom implementations of operator 'new'.
   This is due to a lack of complete handling of destructors.

 - Calls resulting in "dynamic dispatch" are specially handled.  See more below.

 - If the function was not profitable to inline in a different context (for
   example, if the maximum block count was exceeded; see "Retry Without
   Inlining").  The Engine::FunctionSummaries map stores additional information
   about declarations, some of which is collected at runtime based on previous
   analyses of the function.
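
 The conditions listed above amount to a chain of early exits. The sketch below
 restates them as a single predicate. ToyCall, InlineDecision, decideInlining,
 and the maxDepth parameter are all hypothetical, and the order of the checks
 is an assumption made for readability. Dynamic dispatch is left out because
 it is handled separately.

```cpp
// Hypothetical description of one call site; not an analyzer data structure.
struct ToyCall {
  bool hasDefinition;
  bool cfgAndLivenessAvailable;
  bool isVariadic;
  bool isConstructor;
  bool destructorWillBeInlined;
  bool isCustomOperatorNew;
  bool wasUnprofitableBefore; // e.g. block count exceeded in another context
  unsigned stackDepth;        // length of the LocationContext chain
};

enum class InlineDecision {
  Inline,
  NoDefinition,
  NoCFGOrLiveness,
  TooDeep,
  Variadic,
  ConstructorWithoutDestructor,
  CustomOperatorNew,
  Unprofitable
};

// Returns Inline, or the first reason from the list that rules inlining out.
InlineDecision decideInlining(const ToyCall &c, unsigned maxDepth) {
  if (!c.hasDefinition)
    return InlineDecision::NoDefinition;
  if (!c.cfgAndLivenessAvailable)
    return InlineDecision::NoCFGOrLiveness;
  if (c.stackDepth >= maxDepth)
    return InlineDecision::TooDeep;
  if (c.isVariadic)
    return InlineDecision::Variadic;
  if (c.isConstructor && !c.destructorWillBeInlined)
    return InlineDecision::ConstructorWithoutDestructor;
  if (c.isCustomOperatorNew)
    return InlineDecision::CustomOperatorNew;
  if (c.wasUnprofitableBefore)
    return InlineDecision::Unprofitable;
  return InlineDecision::Inline;
}
```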


 Dynamic Calls and Devirtualization
 ----------------------------------

 "Dynamic" calls are those that are resolved at runtime, such as C++ virtual
 method calls and Objective-C message sends. Due to the path-sensitive nature of
 the analyzer, the analyzer may be able to reason about the dynamic type of the
 object whose method is being called and thus "devirtualize" the call.

 This path-sensitive devirtualization occurs when the analyzer can determine what
 method would actually be called at runtime.  This is possible when the type
 information is constrained enough for a simulated C++/Objective-C object in
 order to make such a decision.

  == RuntimeDefinition ==

 The basis of this devirtualization is CallEvent's getRuntimeDefinition() method,
 which returns a RuntimeDefinition object.  The name ("runtime" + "definition")
 refers to the definition of the called method as it would be computed at
 runtime.  In the case of no dynamic dispatch, this object resolves to a Decl*
 for the called function.  In the case of dynamic dispatch, the RuntimeDefinition
 object also includes an optional MemRegion* corresponding to the object being
 called (i.e., the "receiver" in Objective-C parlance).  This information is
 later consulted by ExprEngine (along with tracked dynamic type information) to
 potentially resolve the called method.

  == DynamicTypeInfo ==

 In addition to RuntimeDefinition, the analyzer needs to track the potential
 runtime type of a simulated C++/Objective-C object.  As the analyzer analyzes a
 path, it may accrue more information to refine the knowledge about the type of
 an object.  This can then be used to make better decisions about the target
 method of a call.

 Such type information is tracked as DynamicTypeInfo.  This is path-sensitive
 data that is stored in ProgramState, which defines a mapping from MemRegions to
 an (optional) DynamicTypeInfo.

 If no DynamicTypeInfo has been explicitly set for a MemRegion, it will be lazily
 inferred from the region's type or associated symbol. Information from symbolic
 regions is weaker than from true typed regions.

   EXAMPLE: A C++ object declared "A obj" is known to have the class 'A', but a
            reference "A &ref" may dynamically be a subclass of 'A'.
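
 The difference between the two declarations can be seen in ordinary C++. The
 classes and functions below are made up for illustration. The call in
 callOnObject can only reach A::value, while the call in callOnReference
 depends on what the caller passes in.

```cpp
struct A {
  virtual ~A() {}
  virtual int value() const { return 1; }
};

struct B : A {
  int value() const override { return 2; }
};

int callOnObject() {
  A obj;              // the dynamic type is exactly 'A'
  return obj.value(); // always A::value
}

int callOnReference(const A &ref) {
  return ref.value(); // A::value, or an override in a subclass such as B
}
```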

 The DynamicTypePropagation checker gathers and propagates DynamicTypeInfo,
 updating it as information is observed along a path that can refine that type
 information for a region.

   WARNING: Not all of the existing analyzer code has been retrofitted to use
            DynamicTypeInfo, nor is it universally appropriate. In particular,
            DynamicTypeInfo always applies to a region with all casts stripped
            off, but sometimes the information provided by casts can be useful.


 When asked to provide a definition, the CallEvents for dynamic calls will use
 the DynamicTypeInfo in their ProgramState to provide the best definition of the
 method to be called. In some cases this devirtualization can be perfect or
 near-perfect, and the analyzer can inline the definition as usual. In other
 cases ExprEngine can make a guess, but report that our guess may not be the
 method actually called at runtime.

   **TMK/COMMENT**: what does it mean to "report" that our guess may not be the
                    method actually called?

 The -analyzer-ipa option has five different modes: none, basic-inlining,
 inlining, dynamic, and dynamic-bifurcate. Under -analyzer-ipa=dynamic, all
 dynamic calls are inlined, whether we are certain or not that this will
 actually be the definition used at runtime. Under -analyzer-ipa=inlining, only
 "near-perfect" devirtualized calls are inlined*, and other dynamic calls are
 evaluated conservatively (as if no definition were available).

 * Currently, no Objective-C messages are inlined under
   -analyzer-ipa=inlining, even if we are reasonably confident of the type of the
   receiver. We plan to enable this once we have tested our heuristics more
   thoroughly.

 The last option, -analyzer-ipa=dynamic-bifurcate, behaves similarly to
 "dynamic", but performs a conservative invalidation in the general virtual case
 in *addition* to inlining. The details of this are discussed below.
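
 How the modes treat a dynamic call can be summarized as a small decision
 function. This is a sketch under assumptions: IPAMode, DynamicCallAction, and
 handleDynamicCall are hypothetical names, and the Objective-C exception noted
 in the footnote above is ignored.

```cpp
enum class IPAMode { None, BasicInlining, Inlining, Dynamic, DynamicBifurcate };

enum class DynamicCallAction {
  Conservative,         // evaluate as if no definition were available
  Inline,               // inline the candidate definition
  InlineAndConservative // split the path and do both
};

// hasCandidate: a definition was found for the call.
// mayHaveOtherDefinitions: the dynamic type information is imprecise.
DynamicCallAction handleDynamicCall(IPAMode mode, bool hasCandidate,
                                    bool mayHaveOtherDefinitions) {
  if (!hasCandidate)
    return DynamicCallAction::Conservative;
  switch (mode) {
  case IPAMode::None:
  case IPAMode::BasicInlining:
    return DynamicCallAction::Conservative; // dynamic calls are never inlined
  case IPAMode::Inlining:
    return mayHaveOtherDefinitions ? DynamicCallAction::Conservative
                                   : DynamicCallAction::Inline;
  case IPAMode::Dynamic:
    return DynamicCallAction::Inline; // inline even when uncertain
  case IPAMode::DynamicBifurcate:
    return mayHaveOtherDefinitions ? DynamicCallAction::InlineAndConservative
                                   : DynamicCallAction::Inline;
  }
  return DynamicCallAction::Conservative;
}
```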

 Bifurcation
 -----------

 ExprEngine::BifurcateCall implements the -analyzer-ipa=dynamic-bifurcate
 mode.

 When a call is made on a region with imprecise dynamic type information
 (RuntimeDefinition::mayHaveOtherDefinitions() evaluates to TRUE), ExprEngine
 bifurcates the path and marks the MemRegion (derived from a RuntimeDefinition
 object) with a path-sensitive "mode" in the ProgramState.

 Currently, there are two modes:

  DynamicDispatchModeInlined - Models the case where the dynamic type information
    of the receiver (MemRegion) is assumed to be perfectly constrained so
    that a given definition of a method is expected to be the code actually
    called. When this mode is set, ExprEngine uses the Decl from
    RuntimeDefinition to inline any dynamically dispatched call sent to this
    receiver because the function definition is considered to be fully resolved.

  DynamicDispatchModeConservative - Models the case where the dynamic type
    information is assumed to be incorrect, for example, because the method
    definition is overridden in a subclass. In such cases, ExprEngine does not
    inline the methods sent to the receiver (MemRegion), even if a candidate
    definition is available. This mode is conservative about simulating the
    effects of a call.

 Going forward along the symbolic execution path, ExprEngine consults the mode
 of the receiver's MemRegion to make decisions on whether the calls should be
 inlined or not, which ensures that there is at most one split per region.
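
 The "at most one split per region" property follows from recording the mode in
 the state. A toy version, in which RegionId, ToyDispatchState, and bifurcate
 are hypothetical names and a region is just an integer, might look like this:

```cpp
#include <map>
#include <vector>

enum class DispatchMode { Inlined, Conservative };

using RegionId = int; // hypothetical stand-in for a MemRegion
using ToyDispatchState = std::map<RegionId, DispatchMode>;

// Returns the successor states for a dynamic call on `receiver` whose type
// information is imprecise.
std::vector<ToyDispatchState> bifurcate(const ToyDispatchState &state,
                                        RegionId receiver) {
  std::vector<ToyDispatchState> successors;
  if (state.count(receiver)) {
    // A mode was already chosen on this path: follow it, do not split again.
    successors.push_back(state);
    return successors;
  }
  ToyDispatchState inlined = state;
  inlined[receiver] = DispatchMode::Inlined;
  ToyDispatchState conservative = state;
  conservative[receiver] = DispatchMode::Conservative;
  successors.push_back(inlined);
  successors.push_back(conservative);
  return successors;
}
```

 The first call on a region yields two states. Any later call on the same
 region, on either path, yields exactly one.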

 At a high level, "bifurcation mode" allows for increased semantic coverage in
 cases where the parent method contains code which is only executed when the
 class is subclassed. The disadvantages of this mode are a potentially
 considerable performance hit and the possibility of false positives on the
 path where the conservative mode is used.
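
 A small, made-up C++ example shows the kind of code this coverage argument is
 about. The classes and the describe function are hypothetical. If only
 Shape::isValid is inlined, the early return in area() is never explored. The
 conservative path makes no assumption about the result of isValid() and so
 reaches it.

```cpp
struct Shape {
  virtual ~Shape() {}
  virtual bool isValid() const { return true; }

  int area() const {
    if (!isValid())
      return -1; // only reachable when a subclass overrides isValid()
    return 42;
  }
};

struct Broken : Shape {
  bool isValid() const override { return false; }
};

// The dynamic type of 's' is not known precisely here.
int describe(const Shape &s) { return s.area(); }
```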

 Objective-C Message Heuristics
 ------------------------------

 ExprEngine relies on a set of heuristics to partition the set of Objective-C
 method calls into those that require bifurcation and those that do not. Below
 are the cases when the DynamicTypeInfo of the object is considered precise
 (cannot be a subclass):

  - If the object was created with +alloc or +new and initialized with an -init
    method.

  - If the calls are property accesses using dot syntax. This is based on the
    assumption that children rarely override properties, or do so in an
    essentially compatible way.

  - If the class interface is declared inside the main source file. In this case
    it is unlikely that it will be subclassed.

  - If the method is not declared outside of the main source file, either by the
    receiver's class or by any superclasses.

 C++ Inlining Caveats
 --------------------

 C++11 [class.cdtor]p4 describes how the vtable of an object is modified as it is
 being constructed or destructed; that is, the type of the object depends on
 which base constructors have been completed. This is tracked using
 DynamicTypeInfo in the DynamicTypePropagation checker.
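
 The language rule itself is easy to observe in a short program. The classes
 below are made up for illustration. While the Base constructor runs, a
 virtual call resolves to Base's implementation even though the complete
 object is a Derived.

```cpp
#include <string>

struct Base {
  std::string seenInCtor;

  // During Base's constructor the object's type is still Base, so this
  // virtual call resolves to Base::name.
  Base() { seenInCtor = name(); }
  virtual ~Base() {}
  virtual std::string name() const { return "Base"; }
};

struct Derived : Base {
  std::string name() const override { return "Derived"; }
};
```

 After construction completes, calling name() on a Derived object resolves to
 Derived::name as usual.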

 There are several limitations in the current implementation:

 - Temporaries are poorly modelled right now because we're not confident in the
   placement

 - 'new' is poorly modelled due to some nasty CFG/design issues.  This is tracked
   in PR12014.  'delete' is not modelled at all.

 - Arrays of objects are modeled very poorly right now.  ExprEngine currently
   only simulates the first constructor and first destructor. Because of this,
   ExprEngine does not inline any constructors or destructors for arrays.

 CallEvent
 ---------

 A CallEvent represents a specific call to a function, method, or other body of
 code. It is path-sensitive, containing both the current state (ProgramStateRef)
 and stack space (LocationContext), and provides uniform access to the argument
 values and return type of a call, no matter how the call is written in the
 source or what sort of code body is being invoked.

   NOTE: For those familiar with Cocoa, CallEvent is roughly equivalent to
         NSInvocation.

 CallEvent should be used whenever there is logic dealing with function calls
 that does not care how the call occurred.

 Examples include checking that arguments satisfy preconditions (such as
 __attribute__((nonnull))), and attempting to inline a call.

 CallEvents are reference-counted objects managed by a CallEventManager. While
 there is no inherent issue with persisting them (say, in a ProgramState's GDM),
 they are intended for short-lived use, and can be recreated from CFGElements or
 StackFrameContexts fairly easily.