Workload descriptor format
==========================
ctx.engine.duration_us.dependency.wait,...
<uint>.<str>.<uint>[-<uint>]|*.<int <= 0>[/<int <= 0>][...].<0|1>,...
B.<uint>
M.<uint>.<str>[|<str>]...
P|S|X.<uint>.<int>
d|p|s|t|q|a|T.<int>,...
b.<uint>.<str>[|<str>].<str>
f
For duration, a range can be given from which a random value will be picked
before every submit. Since this, like seqno management, requires CPU access to
objects, care needs to be taken to ensure the submit queue is deep enough that
these operations do not affect the execution speed, unless that is desired.
Additional workload steps are also supported:
'd' - Adds a delay (in microseconds).
'p' - Adds a delay relative to the start of the previous loop so that each loop
      starts execution with a given period.
's' - Synchronises the pipeline to a batch relative to the step.
't' - Throttle every n batches.
'q' - Throttle to n max queue depth.
'f' - Create a sync fence.
'a' - Advance the previously created sync fence.
'B' - Turn on context load balancing.
'b' - Set up engine bonds.
'M' - Set up engine map.
'P' - Context priority.
'S' - Context SSEU configuration.
'T' - Terminate an infinite batch.
'X' - Context preemption control.
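The delay and throttling steps ('d', 't' and 'q') are not used in the examples
below, so here is an illustrative sketch combining them (all durations and
counts are arbitrary, and the exact throttling behaviour follows the step
descriptions above):
 1.RCS.1000.0.0
 d.500
 1.RCS.1000.0.0
 t.2
 q.3
The first batch is followed by a 500us delay before the second batch is
submitted; the workload is then throttled every 2 batches and additionally
limited to a maximum queue depth of 3.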
Engine ids: DEFAULT, RCS, BCS, VCS, VCS1, VCS2, VECS
Example (leading spaces must not be present in the actual file):
----------------------------------------------------------------
1.VCS1.3000.0.1
1.RCS.500-1000.-1.0
1.RCS.3700.0.0
1.RCS.1000.-2.0
1.VCS2.2300.-2.0
1.RCS.4700.-1.0
1.VCS2.600.-1.1
p.16000
Described in plain language, the above workload works like this:
1. A batch is sent to the VCS1 engine which will be executing for 3ms on the
GPU and userspace will wait until it is finished before proceeding.
2-4. Now three batches are sent to RCS with durations of 0.5-1ms (random
     duration range), 3.7ms and 1ms respectively. The first batch has a data
     dependency on the preceding VCS1 batch, and the last of the group depends
     on the first from the group.
5. Now a 2.3ms batch is sent to VCS2, with a data dependency on the 3.7ms
RCS batch.
6. This is followed by a 4.7ms RCS batch with a data dependency on the 2.3ms
VCS2 batch.
7. Then a 0.6ms VCS2 batch is sent depending on the previous RCS one. In the
   same step the tool is told to wait until the batch completes before
   proceeding.
8. Finally the tool is told to wait long enough to ensure the next iteration
starts 16ms after the previous one has started.
When workload descriptors are provided on the command line, commas must be used
instead of new lines.
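For instance, a short workload could be passed on the command line as a single
comma-separated string (this sketch assumes the tool's '-w' option is used to
supply workload descriptors; consult the tool's help output for the exact
option name):
 gem_wsim -w 1.VCS1.3000.0.1,1.RCS.500-1000.-1.0,p.16000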
Multiple dependencies can be given separated by forward slashes.
Example:
1.VCS1.3000.0.1
1.RCS.3700.0.0
1.VCS2.2300.-1/-2.0
In this case the last step has a data dependency on both the first and second
steps.
Batch durations can also be specified as infinite by using the '*' in the
duration field. Such batches must be ended by the terminate command ('T')
otherwise they will cause a GPU hang to be reported.
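As a sketch (step offsets and the delay are illustrative, and this assumes the
'T' argument is a relative step offset like in the 's' and 'a' steps):
 1.RCS.*.0.0
 d.1000
 T.-2
 s.-3
An infinite batch is started on RCS, the tool sleeps for 1ms, then terminates
the infinite batch started two steps earlier and finally waits for it to
complete.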
Sync (fd) fences
----------------
Sync fences are also supported as dependencies.
To use them, put an "f<N>" token in the step dependency list. N in this case is
the same relative step offset to the dependee batch, but instead of a data
dependency an output fence will be emitted at the dependee step, and passed in
as a dependency in the current step.
Example:
1.VCS1.3000.0.0
1.RCS.500-1000.-1/f-1.0
In this case the second step will have both a data dependency and a sync fence
dependency on the previous step.
Example:
1.RCS.500-1000.0.0
1.VCS1.3000.f-1.0
1.VCS2.3000.f-2.0
VCS1 and VCS2 batches will have a sync fence dependency on the RCS batch.
Example:
1.RCS.500-1000.0.0
f
2.VCS1.3000.f-1.0
2.VCS2.3000.f-2.0
1.RCS.500-1000.0.1
a.-4
s.-4
s.-4
VCS1 and VCS2 batches have an input sync fence dependency on the standalone
fence created at the second step. They are submitted ahead of time while still
not runnable. When the second RCS batch completes, the standalone fence is
signaled, which allows the two VCS batches to be executed. Finally we wait
until both VCS batches have completed before starting the (optional) next
iteration.
Submit fences
-------------
Submit fences are a type of input fence which is signalled when the originating
batch buffer is submitted to the GPU. (In contrast to normal sync fences, which
are signalled on completion.)
Submit fences use the same syntax as sync fences, with a lower-case 's'
selecting them. Eg:
1.RCS.500-1000.0.0
1.VCS1.3000.s-1.0
1.VCS2.3000.s-2.0
Here the VCS1 and VCS2 batches will only be submitted for execution once the
RCS batch enters the GPU.
Context priority
----------------
P.1.-1
1.RCS.1000.0.0
P.2.1
2.BCS.1000.-2.0
Context 1 is marked as low priority (-1) and then a batch buffer is submitted
against it. Context 2 is marked as high priority (1) and then a batch buffer
is submitted against it which depends on the batch from context 1.
The context priority command is executed at workload runtime and is valid until
overridden by another (optional) priority change on the same context. Actual
driver ioctls are executed only if the priority level has changed for the
context.
Context preemption control
--------------------------
X.1.0
1.RCS.1000.0.0
X.1.500
1.RCS.1000.0.0
Context 1 is marked as non-preemptable and a batch is sent against it. The same
context is then marked as allowing its batches to be preempted every 500us and
another batch is submitted.
As with context priority, context preemption commands are valid until
optionally overridden by another preemption control change on the same context.
Engine maps
-----------
Engine maps are a per context feature which changes the way engine selection is
done in the driver.
Example:
M.1.VCS1|VCS2
This sets up context 1 with an engine map containing the VCS1 and VCS2 engines.
Submission to this context can now only reference these two engines.
Engine maps can also be defined using an engine class name, like VCS.
Example:
M.1.VCS
This sets up the engine map to all available VCS class engines.
Context load balancing
----------------------
Context load balancing (aka virtual engine) is an i915 feature where the driver
picks the best (most idle) engine to submit to, given the previously configured
engine map.
Example:
B.1
This enables load balancing for context number one.
Engine bonds
------------
Engine bonds are extensions on load balanced contexts. They allow expressing
rules of engine selection between two co-operating contexts tied with submit
fences. In other words, the rule expression is telling the driver: "If you pick
this engine for context one, then you have to pick that engine for context two".
Syntax is:
b.<context>.<engine_list>.<master_engine>
Engine list is a list of one or more sibling engines separated by a pipe
character (eg. "VCS1|VCS2").
There can be multiple bonds tied to the same context.
Example:
M.1.RCS|VECS
B.1
M.2.VCS1|VCS2
B.2
b.2.VCS1.RCS
b.2.VCS2.VECS
This tells the driver that if it picked RCS for context one, it has to pick
VCS1 for context two. And if it picked VECS for context one, it has to pick
VCS2 for context two.
If we extend the above example with more workload directives:
1.DEFAULT.1000.0.0
2.DEFAULT.1000.s-1.0
We get to a fully functional example where two batch buffers are submitted in a
load balanced fashion, telling the driver they should run simultaneously and
that valid engine pairs are either RCS + VCS1 (for two contexts respectively),
or VECS + VCS2.
This can also be extended using sync fences to improve the chances of the first
submission not reaching the hardware ahead of the second one. The second block
would then look like:
f
1.DEFAULT.1000.f-1.0
2.DEFAULT.1000.s-1.0
a.-3
Context SSEU configuration
--------------------------
S.1.1
1.RCS.1000.0.0
S.2.-1
2.RCS.1000.0.0
Context 1 is configured to run with one enabled slice (slice mask 1) and a
batch is submitted against it. Context 2 is configured to run with all slices
(this is the default so the command could also be omitted) and a batch is
submitted against it.
This shows the dynamic SSEU reconfiguration cost between two contexts competing
for the render engine.
A slice mask of -1 has the special meaning of "all slices". Otherwise any
integer can be specified as the slice mask, but beware that any value apart
from 1 and -1 can make the workload non-portable between different GPUs.