 Contract between NNAPI systrace code and parser
 ===============================================

 This text file documents how tracing in the NNAPI manifests in systrace output
 and how that output is interpreted by the systrace parser.

 Please view in a 160-character window.

 Special cases
 -------------

 - Execution is exposed as an asynchronous event from the runtime. Time taken
   by the runtime is calculated as the time between the start of
   ANeuralNetworksExecution_startCompute and the end of
   ANeuralNetworksEvent_wait. This special case is not reflected in the cases
   presented below.

 Notation
 --------
 - ...: elided code
 - t_m_w: tracing_mark_write
 - tX: timestamps
 - T1, T2: thread ids


 Cases for the parser
 ====================

 Source code                                    Systrace                                                   Interpretation for timing statistics - all
                                                                                                           times are wallclock
 ------------------------------------------------------------------------------------------------------------------------------------------------------

 *Baseline case*

 ... funcP(...) {                               t0: t_m_w:B|T1|[NN_LR_PP]funcP                             Add (t1-t0) to total time spent in Layer
   NNTRACE_RT(NNTRACE_PHASE_PREPARATION,        t1: t_m_w:E|T1                                             Runtime, Phase Preparation
              "funcP");
   ...
 }
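
 Before any accounting, a parser has to split each row's payload into its
 fields. A minimal C++ sketch of that decoding step follows, assuming the
 payload shapes shown in the Systrace column, i.e. B|<id>|[markers][NN_<layer>_<phase>]<name>
 for begin rows and E|<id> for end rows. The Mark struct and parseMark function
 are hypothetical illustrations, not the parser's actual code.

```cpp
#include <optional>
#include <string>

// Hypothetical decoded form of one tracing_mark_write payload such as
// "B|T1|[NN_LR_PP]funcP" or "E|T1".
struct Mark {
    char kind = 0;        // 'B' (begin) or 'E' (end)
    std::string thread;   // second field, written T1/T2 in this document
    std::string markers;  // text before the tag, e.g. "[SW]" or "[SUB]"
    std::string layer;    // layer code, e.g. "LR"; empty for untagged rows
    std::string phase;    // phase code, e.g. "PP"; empty for untagged rows
    std::string name;     // function name, e.g. "funcP"
};

// Returns std::nullopt if the payload is not a well-formed B or E row.
std::optional<Mark> parseMark(const std::string& payload) {
    if (payload.size() < 3 || (payload[0] != 'B' && payload[0] != 'E') || payload[1] != '|')
        return std::nullopt;
    Mark m;
    m.kind = payload[0];
    const auto bar = payload.find('|', 2);
    if (bar == std::string::npos) {  // E rows carry only the id field
        m.thread = payload.substr(2);
        return m;
    }
    m.thread = payload.substr(2, bar - 2);
    const std::string rest = payload.substr(bar + 1);
    const auto tag = rest.find("[NN_");
    const auto close = tag == std::string::npos ? std::string::npos : rest.find(']', tag);
    if (close == std::string::npos) {  // untagged row, e.g. HIDL::IDevice::...::client
        m.name = rest;
        return m;
    }
    // The tag body has the form "<layer>_<phase>", e.g. "LR_PP" or "LC_PCO".
    const std::string body = rest.substr(tag + 4, close - tag - 4);
    const auto sep = body.find('_');
    if (sep == std::string::npos) return std::nullopt;
    m.markers = rest.substr(0, tag);
    m.layer = body.substr(0, sep);
    m.phase = body.substr(sep + 1);
    m.name = rest.substr(close + 1);
    return m;
}
```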

 ------------------------------------------------------------------------------------------------------------------------------------------------------

 *Local call to other layer*

 ... funcA1(...) {                              t0:   t_m_w:B|T1|[NN_LA_PP]funcA1                          Add (t3-t0) to total time spent in Layer
   NNTRACE_APP(NNTRACE_PHASE_PREPARATION,       t1:   t_m_w:B|T1|[NN_LR_PP]funcR1                          Application, Phase Preparation
               "funcA1");                       t2:   t_m_w:E|T1
   ... funcR1(...);                             t3:   t_m_w:E|T1                                           Add (t2-t1) to total time spent in Layer
   ...                                                                                                     Runtime, Phase Preparation
 }
 ... funcR1(...) {                                                                                         Note: Self-time of Layer Application,
   NNTRACE_RT(NNTRACE_PHASE_PREPARATION,                                                                   Phase Preparation will be calculated as
              "funcR1"); ...                                                                               total time in Layer Application - total time
 }                                                                                                         in Layer Runtime

                                                                                                           Note: These can be nested as per rows
                                                                                                           below ("Switch phase ...", "Subphases ...",
                                                                                                           "Additional detail...")

                                                                                                           Note: If the called function specifies a
                                                                                                           phase that is not supposed to be nested,
                                                                                                           the parser will emit a diagnostic.

 ------------------------------------------------------------------------------------------------------------------------------------------------------

 *Switch phase during function*

 ... funcC1(...) {                              t0:   t_m_w:B|T1|[NN_LC_PTR]funcC1                         Add (t1-t0) to total time spent in Layer
   NNTRACE_TRANS("funcC1");                     t1:   t_m_w:B|T1|[SW][NN_LC_PCO]funcC1                     CPU, Phase Transformation
   ...                                          t2:   t_m_w:E|T1
   NNTRACE_COMP_SWITCH("funcC1");               t3:   t_m_w:E|T1                                           Add (t2-t1) to total time spent in Layer
   ...                                                                                                     CPU, Phase Computation
 }
                                                                                                           (t3-t2 treated as negligible - only the
                                                                                                           destructors of objects created between
                                                                                                           the tracepoints)
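
 One way to honor the [SW] marker is sketched below, assuming pre-decoded rows
 from one thread: when a B row carries [SW], the enclosing span's phase is
 charged up to that timestamp and the enclosing span is flagged so that its own
 E row (t3) adds nothing, which matches treating t3-t2 as negligible. All names
 are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical pre-decoded row; both spans in this case belong to Layer CPU,
// so only the phase code is kept.
struct Row {
    double t;
    bool begin;
    std::string phase;         // e.g. "PTR" or "PCO"; empty on E rows
    bool switchPhase = false;  // true when the B row carried the [SW] marker
};

std::map<std::string, double> phaseTimes(const std::vector<Row>& rows) {
    struct Open { double start; std::string phase; bool closedBySwitch; };
    std::vector<Open> stack;
    std::map<std::string, double> out;
    for (const Row& r : rows) {
        if (r.begin) {
            if (r.switchPhase) {
                assert(!stack.empty() && "[SW] row must be nested in the span it switches from");
                Open& prev = stack.back();
                out[prev.phase] += r.t - prev.start;  // old phase ends at the switch (t1)
                prev.closedBySwitch = true;           // so its own E row (t3) adds nothing
            }
            stack.push_back({r.t, r.phase, false});
            continue;
        }
        assert(!stack.empty() && "E row without a matching B row");
        const Open span = stack.back();
        stack.pop_back();
        if (!span.closedBySwitch) out[span.phase] += r.t - span.start;
    }
    return out;
}
```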

 ------------------------------------------------------------------------------------------------------------------------------------------------------

 *Subphases of execution*

 ... funcR2(...) {                              t0:   t_m_w:B|T1|[NN_LR_PE]funcR2                          Add (t2-t1) to total time spent in Layer
   NNTRACE_RT(NNTRACE_PHASE_EXECUTION,          t1:   t_m_w:B|T1|[NN_LC_PCO]funcC2                         CPU, (sub)Phase Computation and to
              "funcR2");                        t2:   t_m_w:E|T1                                           total time in Phase Execution
   ... funcC2(...);                             t3:   t_m_w:E|T1
   ...                                                                                                     Add (t3-t0) to total time spent in Layer
 }                                                                                                         Runtime, Phase Execution
 ... funcC2(...) {
   NNTRACE_COMP("funcC2");
   ...
 }

 ------------------------------------------------------------------------------------------------------------------------------------------------------

 *Additional detail in the same layer*

 ... funcR3(...) {                              t0:   t_m_w:B|T1|[NN_LR_PE]funcR3                          Add (t3-t0) to total time spent in Layer
   NNTRACE_RT(NNTRACE_PHASE_EXECUTION,          t1:   t_m_w:B|T1|[NN_LR_PE]funcR4                          Runtime, Phase Execution
              "funcR3");                        t2:   t_m_w:E|T1
   ... funcR4(...);                             t3:   t_m_w:E|T1                                           Note: funcR4 will be visible in the systrace
   ...                                                                                                     visualization
 }
 ... funcR4(...) {
   NNTRACE_RT(NNTRACE_PHASE_EXECUTION,
              "funcR4");
   ...
 }


 ------------------------------------------------------------------------------------------------------------------------------------------------------

 *Synchronous IPC call*

 ... funcR5(...) {                              t0:   t_m_w:B|T1|[NN_LR_PC]funcR5                          Add (t5-t0) - (t4-t1) to total time spent in
   NNTRACE_RT(NNTRACE_PHASE_COMPILATION,        t1:   t_m_w:B|T1|[NN_LI_PI]getCapabilities                 Layer Runtime, Phase Compilation; see
              "funcR5");                        t2:   t_m_w:B|T1|HIDL::IDevice::getCapabilities::client    "Onetime initialization code".
   ... device->getCapabilities();               t3:   t_m_w:E|T1
   ...                                          t4:   t_m_w:E|T1                                           Add (t4-t1) to total time spent in Layer
 }                                              t5:   t_m_w:E|T1                                           IPC, Phase Initialization
 ... VersionedIDevice::getCapabilities(...) {
   NNTRACE_FULL(NNTRACE_LAYER_IPC,                                                                         Note: Self-time of Layer Runtime, Phase
                NNTRACE_PHASE_INITIALIZATION,                                                              Compilation will be calculated as total
                "getCapabilities");                                                                        time in Layer Runtime - total time in Layer
                                                                                                           IPC
 }
                                                                                                           Note: Tracepoints are needed for the
                                                                                                           client IPC calls. The HIDL tracing isn't
                                                                                                           guaranteed to wait for the server - it just
                                                                                                           sends the transaction even if the call is
                                                                                                           synchronous.

 ------------------------------------------------------------------------------------------------------------------------------------------------------

 *Asynchronous IPC call that is synchronously waited for by the runtime*

 // Runtime code                                t0: t_m_w:B|T1|[NN_LI_PC]prepareModel                      Add (t10-t0) to total time spent in Layer
 ... funcRC(...) {                              t1: t_m_w:B|T1|HIDL::IDevice::prepareModel_1_1::client     IPC, Phase Compilation
    ...                                         t2: t_m_w:B|T2|HIDL::IDevice::prepareModel_1_1::server
    NNTRACE_FULL(NNTRACE_LAYER_IPC,             t3: t_m_w:B|T2|[NN_LD_PC]SampleDriver::prepareModel        Add (t6-t2) to total time spent in Layer
                 NNTRACE_PHASE_COMPILATION,     t4: t_m_w:B|T2|HIDL::IPreparedModelCallback::notify::clie  Driver, Phase Compilation. This includes
                 "prepareModel");               t5: t_m_w:E|T2                                             the generated HIDL stub code, which is
    ...                                         t6: t_m_w:E|T2                                             <0.05ms.
    device->prepareModel(...);                  t7: t_m_w:E|T2
    ...                                         t8: t_m_w:B|T1|HIDL::IPreparedModelCallback::notify::serv  Note: the HIDL trace rows are added by
    cb->wait();                                 t9: t_m_w:E|T1                                             the automatically generated proxy and
    ...                                         t10: t_m_w:E|T1                                            stub code. For the driver side, the
 }                                              t11: t_m_w:E|T1                                            mapping of the HIDL functions to layers
                                                                                                           and phases is done in the systrace
 // Driver code                                                                                            parser.
 ... SampleDriver::prepareModel(...) {
   NNTRACE_FULL(NNTRACE_LAYER_DRIVER,                                                                      Note: the SampleDriver::prepareModel is
                NNTRACE_PHASE_COMPILATION,                                                                 treated as additional detail for Layer
                "SampleDriver::prepareModel");                                                             Driver, Phase Compilation.
 }
                                                                                                           Note: the "t_m_w" output of
                                                                                                           systrace uses thread ids, so that the call
                                                                                                           stack can be reconstructed. The systrace
                                                                                                           rows are also annotated with process ids.
                                                                                                           The parser uses the process ids to
                                                                                                           distinguish the application process
                                                                                                           from the driver process (used for
                                                                                                           diagnostics and for distinguishing CPU
                                                                                                           fallback from sample driver).

                                                                                                           Note: the next row in this table gives more
                                                                                                           detail for prepareModel specifically

                                                                                                           Note: the driver-side HIDL traces get us
                                                                                                           the time spent in sample and hvx drivers.
                                                                                                           With a different driver threading model
                                                                                                           this may not be the case - future drivers
                                                                                                           should add manual tracing.

                                                                                                           TODO: attribute driver process IPC call
                                                                                                           (callback) overhead to IPC layer.
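
 The note about thread ids can be illustrated by pairing B and E rows with one
 stack per thread, so that rows from the runtime thread (T1) and the driver
 thread (T2) may interleave as they do in this case. The Row, Span and pairRows
 names are hypothetical; process ids and the mapping of HIDL rows to layers are
 left out of this sketch.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical row: timestamp, B or E, the id field (T1/T2 here) and the name.
struct Row {
    double t;
    char kind;           // 'B' or 'E'
    std::string thread;  // T1, T2 in this document's notation
    std::string name;    // empty on E rows
};

struct Span {
    std::string thread, name;
    double start, end;
};

// Pairs B/E rows with one stack per thread, so rows from different threads can
// interleave without closing each other's spans. Spans are returned in the
// order in which they end.
std::vector<Span> pairRows(const std::vector<Row>& rows) {
    std::map<std::string, std::vector<std::pair<double, std::string>>> stacks;
    std::vector<Span> spans;
    for (const Row& r : rows) {
        auto& stack = stacks[r.thread];
        if (r.kind == 'B') {
            stack.push_back({r.t, r.name});
            continue;
        }
        assert(!stack.empty() && "E row without an open B row on the same thread");
        spans.push_back({r.thread, stack.back().second, stack.back().first, r.t});
        stack.pop_back();
    }
    return spans;
}
```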

 ------------------------------------------------------------------------------------------------------------------------------------------------------

 *Subtracting time when nesting is violated*

 // Runtime code                                t0:   t_m_w:B|T1|[NN_LI_PC]prepareModel                    Add (t3 - t0) - (t2 - t1) to total time spent
 ... funcRC(...) {                              t1:   t_m_w:B|T1|[SUB][NN_LR_PC]VersionedIDevice::prepareM in Layer IPC, Phase Compilation
    ...                                         t2:   t_m_w:E|T1
    NNTRACE_FULL(NNTRACE_LAYER_IPC,             t3:   t_m_w:E|T1                                           Add (t2-t1) to total time spent in Layer
                 NNTRACE_PHASE_COMPILATION,                                                                Runtime, Phase Compilation
                 "prepareModel");
    ...
    device->prepareModel(...);
    ...
    cb->wait();
    ...
 }

 ... VersionedIDevice::prepareModel(...) {
     // IPC work
     {
         NNTRACE_FULL_SUBTRACT(
             NNTRACE_LAYER_RUNTIME,
             NNTRACE_PHASE_COMPILATION,
             "VersionedIDevice::prepareModel");
         // Runtime work
     }
     // IPC work
 }
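
 The sketch below shows one possible nesting check for this row and the
 accounting that follows from [SUB]. It assumes an outer-to-inner order of
 layers (application, runtime, IPC, driver, CPU) that this document does not
 spell out, so kLayerDepth, classify and splitSub are hypothetical: a child
 from an outer layer is reported as a diagnostic unless it carries [SUB], in
 which case its time moves from the enclosing span to the child's layer.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical nesting order of layers, outermost first, assumed for this sketch.
const std::map<std::string, int> kLayerDepth = {
    {"LA", 0}, {"LR", 1}, {"LI", 2}, {"LD", 3}, {"LC", 4}};

enum class Nesting { Expected, Subtracted, Diagnostic };

// Classifies a child span that begins while a parent span is open. A child from
// an outer layer (runtime work inside an IPC span) violates the expected nesting
// unless it carries the [SUB] marker.
Nesting classify(const std::string& parentLayer, const std::string& childLayer, bool subMarker) {
    if (kLayerDepth.at(childLayer) >= kLayerDepth.at(parentLayer)) return Nesting::Expected;
    return subMarker ? Nesting::Subtracted : Nesting::Diagnostic;
}

struct Split {
    double enclosing;   // credited to the enclosing span's layer and phase
    double subtracted;  // credited to the [SUB] span's layer and phase
};

// Splits an enclosing span [t0, t3] around a [SUB] span [t1, t2].
Split splitSub(double t0, double t1, double t2, double t3) {
    assert(t0 <= t1 && t1 <= t2 && t2 <= t3);
    return {(t3 - t0) - (t2 - t1), t2 - t1};
}
```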

 ------------------------------------------------------------------------------------------------------------------------------------------------------

 *Onetime initialization code*

 ... funcR5(...) {                              t0:   t_m_w:B|T1|[NN_LR_PP]funcR5                          Add (t2-t1) to total time spent in Layer
   NNTRACE_RT(NNTRACE_PHASE_PREPARATION,        t1:   t_m_w:B|T1|[NN_LR_PI]funcI                           Runtime, Phase Initialization
              "funcR5");                        t2:   t_m_w:E|T1
   ... funcI(...);                              t3:   t_m_w:E|T1                                           Add (t3 - t0) - (t2 - t1) to total time spent
   ...                                                                                                     in Layer Runtime, Phase Preparation.
 }
 ... funcI(...) {
   NNTRACE_RT(NNTRACE_PHASE_INITIALIZATION,
              "funcI");
   ...
 }

 ------------------------------------------------------------------------------------------------------------------------------------------------------

 *Utility code*

 ... funcR6(...) {                              t0:   t_m_w:B|T1|[NN_LR_PP]funcR6                          Add (t3-t0) to total time spent in Layer
   NNTRACE_RT(NNTRACE_PHASE_PREPARATION,        t1:   t_m_w:B|T1|[NN_LU_PU]funcU                           Runtime, Phase Preparation
              "funcR6");                        t2:   t_m_w:E|T1
   ... funcU(...);                              t3:   t_m_w:E|T1                                           Note: the funcU is treated as additional
   ...                                                                                                     detail.
 }
 ... funcU(...) {
   NNTRACE_FULL(NNTRACE_LAYER_UTILITY,
                NNTRACE_PHASE_UNSPECIFIED,
                "funcU");
   ...
 }
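
 Taken together, the rows above suggest how a nested span affects the time of
 the span that encloses it. The sketch below condenses that into a rule, under
 the assumptions that layer code "LU" marks utility code and that any nested
 span in another layer or another phase is subtracted; the [SW] and [SUB]
 markers are handled separately and are not modelled. childEffect and selfTime
 are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using Key = std::pair<std::string, std::string>;  // (layer, phase), e.g. ("LR", "PP")

enum class Effect { Detail, Subtract };

// How a span nested directly inside `parent` affects the parent's time.
Effect childEffect(const Key& parent, const Key& child) {
    if (child.first == "LU") return Effect::Detail;  // "Utility code": detail only
    if (child == parent) return Effect::Detail;      // "Additional detail in the same layer"
    return Effect::Subtract;                         // another layer or another phase
}

// Time left to `parent` after removing the children whose time is subtracted.
double selfTime(const Key& parent, double parentDuration,
                const std::vector<std::pair<Key, double>>& children) {
    assert(parentDuration >= 0);
    double self = parentDuration;
    for (const auto& child : children)
        if (childEffect(parent, child.first) == Effect::Subtract) self -= child.second;
    return self;
}
```

 For the "Onetime initialization code" row with t0 = 0, t1 = 1, t2 = 3 and
 t3 = 6, Phase Preparation keeps 4; for the "Utility code" row, funcU does not
 reduce the time of funcR6.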