| // Copyright (c) 2015-2018 Khronos Group. This work is licensed under a |
| // Creative Commons Attribution 4.0 International License; see |
| // http://creativecommons.org/licenses/by/4.0/ |
| |
| [[shaders]] |
| = Shaders |
| |
| A shader specifies programmable operations that execute for each vertex, |
| control point, tessellated vertex, primitive, fragment, or workgroup in the |
| corresponding stage(s) of the graphics and compute pipelines. |
| |
| Graphics pipelines include vertex shader execution as a result of |
| <<drawing,primitive assembly>>, followed, if enabled, by tessellation |
| control and evaluation shaders operating on |
| <<drawing-primitive-topologies-patches,patches>>, geometry shaders, if |
| enabled, operating on primitives, and fragment shaders, if present, |
| operating on fragments generated by <<primsrast,Rasterization>>. |
| In this specification, vertex, tessellation control, tessellation evaluation |
| and geometry shaders are collectively referred to as vertex processing |
| stages and occur in the logical pipeline before rasterization. |
| The fragment shader occurs logically after rasterization. |
| |
| Only the compute shader stage is included in a compute pipeline. |
| Compute shaders operate on compute invocations in a workgroup. |
| |
| Shaders can: read from input variables, and read from and write to output |
| variables. |
| Input and output variables can: be used to transfer data between shader |
| stages, or to allow the shader to interact with values that exist in the |
| execution environment. |
| Similarly, the execution environment provides constants that describe |
| capabilities. |
| |
| Shader variables are associated with execution environment-provided inputs |
| and outputs using _built-in_ decorations in the shader. |
| The available decorations for each stage are documented in the following |
| subsections. |
| |
| |
| [[shader-modules]] |
| == Shader Modules |
| |
| [open,refpage='VkShaderModule',desc='Opaque handle to a shader module object',type='handles'] |
| -- |
| |
| _Shader modules_ contain _shader code_ and one or more entry points. |
| Shaders are selected from a shader module by specifying an entry point as |
| part of <<pipelines,pipeline>> creation. |
| The stages of a pipeline can: use shaders that come from different modules. |
| The shader code defining a shader module must: be in the SPIR-V format, as |
| described by the <<spirvenv,Vulkan Environment for SPIR-V>> appendix. |
| |
| Shader modules are represented by sname:VkShaderModule handles: |
| |
| include::../api/handles/VkShaderModule.txt[] |
| |
| -- |
| |
| [open,refpage='vkCreateShaderModule',desc='Creates a new shader module object',type='protos'] |
| -- |
| |
| To create a shader module, call: |
| |
| include::../api/protos/vkCreateShaderModule.txt[] |
| |
| * pname:device is the logical device that creates the shader module. |
| * pname:pCreateInfo is a pointer to an instance of the |
| sname:VkShaderModuleCreateInfo structure. |
| * pname:pAllocator controls host memory allocation as described in the |
| <<memory-allocation, Memory Allocation>> chapter. |
| * pname:pShaderModule points to a slink:VkShaderModule handle in which the |
| resulting shader module object is returned. |
| |
| Once a shader module has been created, any entry points it contains can: be |
| used in pipeline shader stages as described in <<pipelines-compute,Compute |
| Pipelines>> and <<pipelines-graphics,Graphics Pipelines>>. |
| |
| ifdef::VK_NV_glsl_shader[] |
| If the shader stage fails to compile, ename:VK_ERROR_INVALID_SHADER_NV will |
| be generated and the compile log will be reported back to the application by |
| `<<VK_EXT_debug_report>>` if enabled. |
| endif::VK_NV_glsl_shader[] |
| |
| include::../validity/protos/vkCreateShaderModule.txt[] |
| -- |
| |
| [open,refpage='VkShaderModuleCreateInfo',desc='Structure specifying parameters of a newly created shader module',type='structs'] |
| -- |
| |
| The sname:VkShaderModuleCreateInfo structure is defined as: |
| |
| include::../api/structs/VkShaderModuleCreateInfo.txt[] |
| |
| * pname:sType is the type of this structure. |
| * pname:pNext is `NULL` or a pointer to an extension-specific structure. |
| * pname:flags is reserved for future use. |
| * pname:codeSize is the size, in bytes, of the code pointed to by |
| pname:pCode. |
| * pname:pCode points to code that is used to create the shader module. |
| The type and format of the code is determined from the content of the |
| memory addressed by pname:pCode. |
| |
| .Valid Usage |
| **** |
| * [[VUID-VkShaderModuleCreateInfo-codeSize-01085]] |
| pname:codeSize must: be greater than 0 |
| ifndef::VK_NV_glsl_shader[] |
| * [[VUID-VkShaderModuleCreateInfo-codeSize-01086]] |
| pname:codeSize must: be a multiple of 4 |
| * [[VUID-VkShaderModuleCreateInfo-pCode-01087]] |
| pname:pCode must: point to valid SPIR-V code, formatted and packed as |
| described by the <<spirv-spec,Khronos SPIR-V Specification>> |
| * [[VUID-VkShaderModuleCreateInfo-pCode-01088]] |
| pname:pCode must: adhere to the validation rules described by the |
| <<spirvenv-module-validation, Validation Rules within a Module>> section |
| of the <<spirvenv-capabilities,SPIR-V Environment>> appendix |
| endif::VK_NV_glsl_shader[] |
| ifdef::VK_NV_glsl_shader[] |
| * [[VUID-VkShaderModuleCreateInfo-pCode-01376]] |
| If pname:pCode points to SPIR-V code, pname:codeSize must: be a multiple |
| of 4 |
| * [[VUID-VkShaderModuleCreateInfo-pCode-01377]] |
| pname:pCode must: point to either valid SPIR-V code, formatted and |
| packed as described by the <<spirv-spec,Khronos SPIR-V Specification>> |
| or valid GLSL code which must: be written to the +GL_KHR_vulkan_glsl+ |
| extension specification |
| * [[VUID-VkShaderModuleCreateInfo-pCode-01378]] |
| If pname:pCode points to SPIR-V code, that code must: adhere to the |
| validation rules described by the <<spirvenv-module-validation, |
| Validation Rules within a Module>> section of the |
| <<spirvenv-capabilities,SPIR-V Environment>> appendix |
| * [[VUID-VkShaderModuleCreateInfo-pCode-01379]] |
| If pname:pCode points to GLSL code, it must: be valid GLSL code written |
| to the +GL_KHR_vulkan_glsl+ GLSL extension specification |
| endif::VK_NV_glsl_shader[] |
| * [[VUID-VkShaderModuleCreateInfo-pCode-01089]] |
| pname:pCode must: declare the code:Shader capability for SPIR-V code |
| * [[VUID-VkShaderModuleCreateInfo-pCode-01090]] |
| pname:pCode must: not declare any capability that is not supported by |
| the API, as described by the <<spirvenv-module-validation, |
| Capabilities>> section of the <<spirvenv-capabilities,SPIR-V |
| Environment>> appendix |
| * [[VUID-VkShaderModuleCreateInfo-pCode-01091]] |
| If pname:pCode declares any of the capabilities listed as optional: in |
| the <<spirvenv-capabilities-table,SPIR-V Environment>> appendix, the |
| corresponding feature(s) must: be enabled. |
| **** |
| |
| include::../validity/structs/VkShaderModuleCreateInfo.txt[] |
| -- |
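
The size rules above for SPIR-V input can be pre-checked on the host before calling flink:vkCreateShaderModule.
The following C sketch is illustrative only: `spirv_blob_looks_valid` is a hypothetical helper and not part of the API, and it assumes that the blob is SPIR-V stored in host byte order.
It checks that the size is nonzero and a multiple of 4, and that the first word is the SPIR-V magic number.
It does not perform module validation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* First word of every SPIR-V module, as defined by the SPIR-V specification. */
#define SPIRV_MAGIC_NUMBER 0x07230203u

/* Hypothetical helper (not part of the Vulkan API): cheap host-side checks
 * mirroring the codeSize rules for SPIR-V input. code_size is in bytes. */
bool spirv_blob_looks_valid(const uint32_t *code, size_t code_size)
{
    if (code == NULL || code_size == 0) {
        return false;               /* codeSize must be greater than 0 */
    }
    if (code_size % 4 != 0) {
        return false;               /* codeSize must be a multiple of 4 */
    }
    return code[0] == SPIRV_MAGIC_NUMBER;
}
```

A blob that passes these checks can still be rejected by the remaining valid usage rules, so a check of this kind is best treated as an early diagnostic rather than as validation.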
| |
| [open,refpage='VkShaderModuleCreateFlags',desc='Reserved for future use',type='enums'] |
| -- |
| include::../api/flags/VkShaderModuleCreateFlags.txt[] |
| |
| sname:VkShaderModuleCreateFlags is a bitmask type for setting a mask, but is |
| currently reserved for future use. |
| -- |
| |
| ifdef::VK_EXT_validation_cache[] |
| include::VK_EXT_validation_cache/shader-module-validation-cache.txt[] |
| endif::VK_EXT_validation_cache[] |
| |
| |
| [open,refpage='vkDestroyShaderModule',desc='Destroy a shader module',type='protos'] |
| -- |
| |
| To destroy a shader module, call: |
| |
| include::../api/protos/vkDestroyShaderModule.txt[] |
| |
| * pname:device is the logical device that destroys the shader module. |
| * pname:shaderModule is the handle of the shader module to destroy. |
| * pname:pAllocator controls host memory allocation as described in the |
| <<memory-allocation, Memory Allocation>> chapter. |
| |
| A shader module can: be destroyed while pipelines created using its shaders |
| are still in use. |
| |
| .Valid Usage |
| **** |
| * [[VUID-vkDestroyShaderModule-shaderModule-01092]] |
| If sname:VkAllocationCallbacks were provided when pname:shaderModule was |
| created, a compatible set of callbacks must: be provided here |
| * [[VUID-vkDestroyShaderModule-shaderModule-01093]] |
| If no sname:VkAllocationCallbacks were provided when pname:shaderModule |
| was created, pname:pAllocator must: be `NULL` |
| **** |
| |
| include::../validity/protos/vkDestroyShaderModule.txt[] |
| -- |
| |
| |
| [[shaders-execution]] |
| == Shader Execution |
| |
| At each stage of the pipeline, multiple invocations of a shader may: execute |
| simultaneously. |
| Further, invocations of a single shader produced as the result of different |
| commands may: execute simultaneously. |
| The relative execution order of invocations of the same shader type is |
| undefined. |
| Shader invocations may: complete in a different order than that in which the |
| primitives they originated from were drawn or dispatched by the application. |
| However, fragment shader outputs are written to attachments in |
| <<primrast-order,rasterization order>>. |
| |
| The relative order of invocations of different shader types is largely |
| undefined. |
| However, when invoking a shader whose inputs are generated from a previous |
| pipeline stage, the shader invocations from the previous stage are |
| guaranteed to have executed far enough to generate input values for all |
| required inputs. |
| |
| |
| [[shaders-execution-memory-ordering]] |
| == Shader Memory Access Ordering |
| |
| The order in which image or buffer memory is read or written by shaders is |
| largely undefined. |
| For some shader types (vertex, tessellation evaluation, and in some cases, |
| fragment), even the number of shader invocations that may: perform loads and |
| stores is undefined. |
| |
| In particular, the following rules apply: |
| |
| * <<shaders-vertex-execution,Vertex>> and |
| <<shaders-tessellation-evaluation-execution,tessellation evaluation>> |
| shaders will be invoked at least once for each unique vertex, as defined |
| in those sections. |
| * <<shaders-fragment-execution,Fragment>> shaders will be invoked zero or |
| more times, as defined in that section. |
| * The relative order of invocations of the same shader type is undefined. |
| A store issued by a shader when working on primitive B might complete |
| prior to a store for primitive A, even if primitive A is specified prior |
| to primitive B. This applies even to fragment shaders; while fragment |
| shader outputs are always written to the framebuffer in |
| <<primrast-order, rasterization order>>, stores executed by fragment |
| shader invocations are not. |
| * The relative order of invocations of different shader types is largely |
| undefined. |
| |
| [NOTE] |
| .Note |
| ==== |
| The above limitations on shader invocation order make some forms of |
| synchronization between shader invocations within a single set of primitives |
| unimplementable. |
| For example, having one invocation poll memory written by another invocation |
| assumes that the other invocation has been launched and will complete its |
| writes in finite time. |
| ==== |
| |
| Stores issued to different memory locations within a single shader |
| invocation may: not be visible to other invocations, or may: not become |
| visible in the order they were performed. |
| |
| The code:OpMemoryBarrier instruction can: be used to provide stronger |
| ordering of reads and writes performed by a single invocation. |
| code:OpMemoryBarrier guarantees that any memory transactions issued by the |
| shader invocation prior to the instruction complete prior to the memory |
| transactions issued after the instruction. |
| Memory barriers are needed for algorithms that require multiple invocations |
| to access the same memory and require the operations to be performed in a |
| partially-defined relative order. |
| For example, if one shader invocation does a series of writes, followed by |
| an code:OpMemoryBarrier instruction, followed by another write, then the |
| results of the series of writes before the barrier become visible to other |
| shader invocations at a time earlier or equal to when the results of the |
| final write become visible to those invocations. |
| In practice, this means that another invocation that sees the results of the |
| final write would also see the previous writes. |
| Without the memory barrier, the final write may: be visible before the |
| previous writes. |
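
As a host-side analogy only, the write, barrier, write pattern described above resembles publishing data behind a flag in C11.
This sketch is not SPIR-V, and the C11 memory model is not the shader memory model; `publish` and `try_consume` are hypothetical names.
Note that the C11 version also needs an acquire fence on the reading side, which is an aspect of C11 rather than something stated in the text above.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;        /* stands in for the earlier "series of writes" */
static atomic_int ready;   /* stands in for the "final write"; starts at 0 */

/* Writer: earlier write, then a fence, then the final write. */
void publish(int value)
{
    payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Reader: if the final write is seen, the earlier write is seen as well. */
bool try_consume(int *out)
{
    if (atomic_load_explicit(&ready, memory_order_relaxed) == 0) {
        return false;      /* final write not yet visible */
    }
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return true;
}
```

Without the release fence in `publish`, a reader on another thread could observe `ready` set while still reading a stale `payload`, which corresponds to the final write becoming visible before the previous writes.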
| |
| Writes that are the result of shader stores through a variable decorated |
| with code:Coherent automatically have available writes to the same buffer, |
| buffer view, or image view made visible to them, and are themselves |
| automatically made available to access by the same buffer, buffer view, or |
| image view. |
| Reads that are the result of shader loads through a variable decorated with |
| code:Coherent automatically have available writes to the same buffer, buffer |
| view, or image view made visible to them. |
| The order in which coherent writes to different locations become available |
| is undefined, unless enforced by a memory barrier instruction or other |
| memory dependency. |
| |
| [NOTE] |
| .Note |
| ==== |
| Explicit memory dependencies must: still be used to guarantee availability |
| and visibility for access via other buffers, buffer views, or image views. |
| ==== |
| |
| The built-in atomic memory transaction instructions can: be used to read and |
| write a given memory address atomically. |
| While built-in atomic functions issued by multiple shader invocations are |
| executed in undefined order relative to each other, these functions perform |
| both a read and a write of a memory address and guarantee that no other |
| memory transaction will write to the underlying memory between the read and |
| write. |
| Atomic operations ensure automatic availability and visibility for writes |
| and reads in the same way as those to code:Coherent variables. |
| |
| [NOTE] |
| .Note |
| ==== |
| Memory accesses performed on different resource descriptors with the same |
| memory backing may: not be well-defined even with the code:Coherent |
| decoration or via atomics, due to things such as image layouts or ownership |
| of the resource, as described in the <<synchronization, Synchronization and |
| Cache Control>> chapter. |
| ==== |
| |
| [NOTE] |
| .Note |
| ==== |
| Atomics allow shaders to use shared global addresses for mutual exclusion or |
| as counters, among other uses. |
| ==== |
| |
| |
| [[shaders-inputs]] |
| == Shader Inputs and Outputs |
| |
| Data is passed into and out of shaders using variables with input or output |
| storage class, respectively. |
| User-defined inputs and outputs are connected between stages by matching |
| their code:Location decorations. |
| Additionally, data can: be provided by or communicated to special functions |
| provided by the execution environment using code:BuiltIn decorations. |
| |
| In many cases, the same code:BuiltIn decoration can: be used in multiple |
| shader stages with similar meaning. |
| The specific behavior of variables decorated as code:BuiltIn is documented |
| in the following sections. |
| |
| |
| [[shaders-vertex]] |
| == Vertex Shaders |
| |
| Each vertex shader invocation operates on one vertex and its associated |
| <<fxvertex-attrib,vertex attribute>> data, and outputs one vertex and |
| associated data. |
| Graphics pipelines must: include a vertex shader, and the vertex shader |
| stage is always the first shader stage in the graphics pipeline. |
| |
| |
| [[shaders-vertex-execution]] |
| === Vertex Shader Execution |
| |
| A vertex shader must: be executed at least once for each vertex specified by |
| a draw command. |
| ifdef::VK_VERSION_1_1,VK_KHR_multiview[] |
| If the subpass includes multiple views in its view mask, the shader may: be |
| invoked separately for each view. |
| endif::VK_VERSION_1_1,VK_KHR_multiview[] |
| During execution, the shader is presented with the index of the vertex and |
| instance for which it has been invoked. |
| Input variables declared in the vertex shader are filled by the |
| implementation with the values of vertex attributes associated with the |
| invocation being executed. |
| |
| If the same vertex is specified multiple times in a draw command (e.g. by |
| including the same index value multiple times in an index buffer) the |
| implementation may: reuse the results of vertex shading if it can statically |
| determine that the vertex shader invocations will produce identical results. |
| |
| [NOTE] |
| .Note |
| ==== |
| It is implementation-dependent when and if results of vertex shading are |
| reused, and thus how many times the vertex shader will be executed. |
| This is true also if the vertex shader contains stores or atomic operations |
| (see <<features-features-vertexPipelineStoresAndAtomics, |
| pname:vertexPipelineStoresAndAtomics>>). |
| ==== |
| |
| |
| [[shaders-tessellation-control]] |
| == Tessellation Control Shaders |
| |
| The tessellation control shader is used to read an input patch provided by |
| the application and to produce an output patch. |
| Each tessellation control shader invocation operates on an input patch |
| (after all control points in the patch are processed by a vertex shader) and |
| its associated data, and outputs a single control point of the output patch |
| and its associated data, and can: also output additional per-patch data. |
| The input patch is sized according to the pname:patchControlPoints member of |
| slink:VkPipelineTessellationStateCreateInfo, as part of input assembly. |
| The size of the output patch is controlled by the code:OpExecutionMode |
| code:OutputVertices specified in the tessellation control or tessellation |
| evaluation shaders, which must: be specified in at least one of the shaders. |
| The size of the input and output patches must: each be greater than zero and |
| less than or equal to |
| sname:VkPhysicalDeviceLimits::pname:maxTessellationPatchSize. |
| |
| |
| [[shaders-tessellation-control-execution]] |
| === Tessellation Control Shader Execution |
| |
| A tessellation control shader is invoked at least once for each _output_ |
| vertex in a patch. |
| ifdef::VK_VERSION_1_1,VK_KHR_multiview[] |
| If the subpass includes multiple views in its view mask, the shader may: be |
| invoked separately for each view. |
| endif::VK_VERSION_1_1,VK_KHR_multiview[] |
| |
| Inputs to the tessellation control shader are generated by the vertex |
| shader. |
| Each invocation of the tessellation control shader can: read the attributes |
| of any incoming vertices and their associated data. |
| The invocations corresponding to a given patch execute logically in |
| parallel, with undefined relative execution order. |
| However, the code:OpControlBarrier instruction can: be used to provide |
| limited control of the execution order by synchronizing invocations within a |
| patch, effectively dividing tessellation control shader execution into a set |
| of phases. |
| Tessellation control shaders will read undefined values if one invocation |
| reads a per-vertex or per-patch attribute written by another invocation at |
| any point during the same phase, or if two invocations attempt to write |
| different values to the same per-patch output in a single phase. |
| |
| |
| [[shaders-tessellation-evaluation]] |
| == Tessellation Evaluation Shaders |
| |
| The tessellation evaluation shader operates on an input patch of control |
| points and their associated data, and a single input barycentric coordinate |
| indicating the invocation's relative position within the subdivided patch, |
| and outputs a single vertex and its associated data. |
| |
| |
| [[shaders-tessellation-evaluation-execution]] |
| === Tessellation Evaluation Shader Execution |
| |
| A tessellation evaluation shader is invoked at least once for each unique |
| vertex generated by the tessellator. |
| ifdef::VK_VERSION_1_1,VK_KHR_multiview[] |
| If the subpass includes multiple views in its view mask, the shader may: be |
| invoked separately for each view. |
| endif::VK_VERSION_1_1,VK_KHR_multiview[] |
| |
| |
| [[shaders-geometry]] |
| == Geometry Shaders |
| |
| The geometry shader operates on a group of vertices and their associated |
| data assembled from a single input primitive, and emits zero or more output |
| primitives and the group of vertices and their associated data required for |
| each output primitive. |
| |
| |
| [[shaders-geometry-execution]] |
| === Geometry Shader Execution |
| |
| A geometry shader is invoked at least once for each primitive produced by |
| the tessellation stages, or at least once for each primitive generated by |
| <<drawing,primitive assembly>> when tessellation is not in use. |
| A shader can request that the geometry shader runs multiple |
| <<geometry-invocations, instances>>. |
| A geometry shader is invoked at least once for each instance. |
| ifdef::VK_VERSION_1_1,VK_KHR_multiview[] |
| If the subpass includes multiple views in its view mask, the shader may: be |
| invoked separately for each view. |
| endif::VK_VERSION_1_1,VK_KHR_multiview[] |
| |
| |
| [[shaders-fragment]] |
| == Fragment Shaders |
| |
| Fragment shaders are invoked as the result of rasterization in a graphics |
| pipeline. |
| Each fragment shader invocation operates on a single fragment and its |
| associated data. |
| With few exceptions, fragment shaders do not have access to any data |
| associated with other fragments and are considered to execute in isolation |
| of fragment shader invocations associated with other fragments. |
| |
| |
| [[shaders-fragment-execution]] |
| === Fragment Shader Execution |
| |
| For each fragment generated by rasterization, a fragment shader may: be |
| invoked. |
| A fragment shader must: not be invoked if the <<fragops-early,Early |
| Per-Fragment Tests>> cause it to have no coverage. |
| ifdef::VK_VERSION_1_1,VK_KHR_multiview[] |
| If the subpass includes multiple views in its view mask, the shader may: be |
| invoked separately for each view. |
| endif::VK_VERSION_1_1,VK_KHR_multiview[] |
| |
| Furthermore, if it is determined that a fragment generated as the result of |
| rasterizing a first primitive will have its outputs entirely overwritten by |
| a fragment generated as the result of rasterizing a second primitive in the |
| same subpass, and the fragment shader used for the fragment has no other |
| side effects, then the fragment shader may: not be executed for the fragment |
| from the first primitive. |
| |
| Relative ordering of execution of different fragment shader invocations is |
| not defined. |
| |
| When a primitive (partially or fully) covers a pixel, the number of times |
| the fragment shader is invoked is implementation-dependent, but must: obey |
| the following constraints: |
| |
| * Each covered sample is included in a single fragment shader invocation. |
| * When sample shading is not enabled, there is at least one fragment |
| shader invocation. |
| * When sample shading is enabled, the minimum number of fragment shader |
| invocations is as defined in <<primsrast-sampleshading,Sample Shading>>. |
| |
| When there is more than one fragment shader invocation per pixel, the |
| association of samples to invocations is implementation-dependent. |
| |
| In addition to the conditions outlined above for the invocation of a |
| fragment shader, a fragment shader invocation may: be produced as a _helper |
| invocation_. |
| A helper invocation is a fragment shader invocation that is created solely |
| for the purposes of evaluating derivatives for use in non-helper fragment |
| shader invocations. |
| Stores and atomics performed by helper invocations must: not have any effect |
| on memory, and values returned by atomic instructions in helper invocations |
| are undefined. |
| |
| |
| [[shaders-fragment-earlytest]] |
| === Early Fragment Tests |
| |
| An explicit control is provided to allow fragment shaders to enable early |
| fragment tests. |
| If the fragment shader specifies the code:EarlyFragmentTests |
| code:OpExecutionMode, the per-fragment tests described in |
| <<fragops-early-mode,Early Fragment Test Mode>> are performed prior to |
| fragment shader execution. |
| Otherwise, they are performed after fragment shader execution. |
| |
| ifdef::VK_EXT_post_depth_coverage[] |
| [[shaders-fragment-earlytest-postdepthcoverage]] |
| If the fragment shader additionally specifies the code:PostDepthCoverage |
| code:OpExecutionMode, the value of a variable decorated with the |
| <<interfaces-builtin-variables-samplemask,code:SampleMask>> built-in |
| reflects the coverage after the early fragment tests. |
| Otherwise, it reflects the coverage before the early fragment tests. |
| endif::VK_EXT_post_depth_coverage[] |
| |
| [[shaders-compute]] |
| == Compute Shaders |
| |
| Compute shaders are invoked via flink:vkCmdDispatch and |
| flink:vkCmdDispatchIndirect commands. |
| In general, they have access to resources similar to those available to |
| shader stages executing as part of a graphics pipeline. |
| |
| Compute workloads are formed from groups of work items called workgroups and |
| processed by the compute shader in the current compute pipeline. |
| A workgroup is a collection of shader invocations that execute the same |
| shader, potentially in parallel. |
| Compute shaders execute in _global workgroups_ which are divided into a |
| number of _local workgroups_ with a size that can: be set by assigning a |
| value to the code:LocalSize execution mode or via an object decorated by the |
| code:WorkgroupSize decoration. |
| An invocation within a local workgroup can: share data with other members of |
| the local workgroup through shared variables and issue memory and control |
| flow barriers to synchronize with other members of the local workgroup. |
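
As an illustration of how local workgroups tile a dispatch, the C sketch below computes how many local workgroups are needed to cover a one-dimensional range of items, and how a global invocation index follows from the workgroup index and the local index.
The function names are hypothetical and the sketch covers one dimension only; the relation used (workgroup index times local size, plus local index) is the one SPIR-V defines for the code:GlobalInvocationId built-in.

```c
#include <stdint.h>

/* Number of local workgroups needed to cover item_count items.
 * Rounds up without overflowing; local_size must be nonzero. */
uint32_t workgroup_count(uint32_t item_count, uint32_t local_size)
{
    return item_count / local_size + (item_count % local_size != 0u ? 1u : 0u);
}

/* One-dimensional global invocation index of an invocation. */
uint32_t global_invocation_index(uint32_t workgroup_id,
                                 uint32_t local_size,
                                 uint32_t local_id)
{
    return workgroup_id * local_size + local_id;
}
```

For example, 1000 items with a local size of 64 need 16 workgroups, so the last workgroup contains invocations whose global index is 1000 or greater; a shader written this way would typically need to skip those invocations itself.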
| |
| |
| [[shaders-interpolation-decorations]] |
| == Interpolation Decorations |
| |
| Interpolation decorations control the behavior of attribute interpolation in |
| the fragment shader stage. |
| Interpolation decorations can: be applied to code:Input storage class |
| variables in the fragment shader stage's interface, and control the |
| interpolation behavior of those variables. |
| |
| Inputs that could be interpolated can: be decorated by at most one of the |
| following decorations: |
| |
| * code:Flat: no interpolation |
| * code:NoPerspective: linear interpolation (for |
| <<line_linear_interpolation,lines>> and |
| <<triangle_linear_interpolation,polygons>>). |
| |
| Fragment input variables decorated with neither code:Flat nor |
| code:NoPerspective use perspective-correct interpolation (for |
| <<line_perspective_interpolation,lines>> and |
| <<triangle_perspective_interpolation,polygons>>). |
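
The difference between the three modes can be sketched in C for a single attribute along a line segment.
This is a simplified illustration with hypothetical function names, not the normative definition: `t` is the screen-space position along the segment in the range [0,1], `fa` and `fb` are the attribute values at the two endpoints, and `wa` and `wb` are the endpoints' clip-space w coordinates.

```c
/* Flat: no interpolation, the provoking vertex's value is used as-is. */
float interp_flat(float provoking_value)
{
    return provoking_value;
}

/* NoPerspective: linear interpolation in screen space. */
float interp_noperspective(float fa, float fb, float t)
{
    return (1.0f - t) * fa + t * fb;
}

/* Default: perspective-correct interpolation, weighting each endpoint
 * by the reciprocal of its clip-space w coordinate. */
float interp_perspective(float fa, float wa, float fb, float wb, float t)
{
    float num = (1.0f - t) * fa / wa + t * fb / wb;
    float den = (1.0f - t) / wa + t / wb;
    return num / den;
}
```

When both endpoints have the same w coordinate, the perspective-correct result reduces to the linear one; the two differ only when w varies along the primitive.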
| |
| The presence and type of interpolation are controlled by the above |
| interpolation decorations as well as the auxiliary decorations code:Centroid |
| and code:Sample. |
| |
| A variable decorated with code:Flat will not be interpolated. |
| Instead, it will have the same value for every fragment within a triangle. |
| This value will come from a single <<vertexpostproc-flatshading,provoking |
| vertex>>. |
| A variable decorated with code:Flat can: also be decorated with |
| code:Centroid or code:Sample, which will mean the same thing as decorating |
| it only as code:Flat. |
| |
| For fragment shader input variables decorated with neither code:Centroid nor |
| code:Sample, the assigned variable may: be interpolated anywhere within the |
| pixel and a single value may: be assigned to each sample within the pixel. |
| |
| code:Centroid and code:Sample can: be used to control the location and |
| frequency of the sampling of the decorated fragment shader input. |
| If a fragment shader input is decorated with code:Centroid, a single value |
| may: be assigned to that variable for all samples in the pixel, but that |
| value must: be interpolated to a location that lies in both the pixel and in |
| the primitive being rendered, including any of the pixel's samples covered |
| by the primitive. |
| Because the location at which the variable is interpolated may: be different |
| in neighboring pixels, and derivatives may: be computed by computing |
| differences between neighboring pixels, derivatives of centroid-sampled |
| inputs may: be less accurate than those for non-centroid interpolated |
| variables. |
| ifdef::VK_EXT_post_depth_coverage[] |
| The <<shaders-fragment-earlytest-postdepthcoverage,code:PostDepthCoverage>> |
| execution mode does not affect the determination of the centroid location. |
| endif::VK_EXT_post_depth_coverage[] |
| If a fragment shader input is decorated with code:Sample, a separate value |
| must: be assigned to that variable for each covered sample in the pixel, and |
| that value must: be sampled at the location of the individual sample. |
| When pname:rasterizationSamples is ename:VK_SAMPLE_COUNT_1_BIT, the pixel |
| center must: be used for code:Centroid, code:Sample, and undecorated |
| attribute interpolation. |
| |
| Fragment shader inputs that are signed or unsigned integers, integer |
| vectors, or any double-precision floating-point type must: be decorated with |
| code:Flat. |
| |
| ifdef::VK_AMD_shader_explicit_vertex_parameter[] |
| When the `<<VK_AMD_shader_explicit_vertex_parameter>>` device extension is |
| enabled, inputs can: also be decorated with the code:CustomInterpAMD |
| interpolation decoration, including fragment shader inputs that are signed |
| or unsigned integers, integer vectors, or any double-precision |
| floating-point type. |
| Inputs decorated with code:CustomInterpAMD can: only be accessed by the |
| extended instruction code:InterpolateAtVertexAMD, which allows accessing the |
| value of the input for individual vertices of the primitive. |
| endif::VK_AMD_shader_explicit_vertex_parameter[] |
| |
| |
| [[shaders-staticuse]] |
| == Static Use |
| |
| A SPIR-V module declares a global object in memory using the code:OpVariable |
| instruction, which results in a pointer code:x to that object. |
| A specific entry point in a SPIR-V module is said to _statically use_ that |
| object if that entry point's call tree contains a function that contains a |
| memory instruction or image instruction with code:x as an code:id operand. |
| See the "`Memory Instructions`" and "`Image Instructions`" subsections of |
| section 3 "`Binary Form`" of the SPIR-V specification for the complete lists |
| of SPIR-V memory and image instructions. |
| |
| Static use is not used to control the behavior of variables with code:Input |
| and code:Output storage. |
| The effects of those variables are applied based only on whether they are |
| present in a shader entry point's interface. |
| |
| [[shaders-invocationgroups]] |
| == Invocation and Derivative Groups |
| |
| An _invocation group_ (see the subsection "`Control Flow`" of section 2 of |
| the SPIR-V specification) for a compute shader is the set of invocations in |
| a single local workgroup. |
| For graphics shaders, an invocation group is an implementation-dependent |
| subset of the set of shader invocations of a given shader stage which are |
| produced by a single drawing command. |
| For indirect drawing commands with pname:drawCount greater than one, |
| invocations from separate draws are in distinct invocation groups. |
| |
| [NOTE] |
| .Note |
| ==== |
| Because the partitioning of invocations into invocation groups is |
| implementation-dependent and not observable, applications generally need to |
| assume the worst case of all invocations in a draw belonging to a single |
| invocation group. |
| ==== |
| |
| A _derivative group_ (see the subsection "`Control Flow`" of section 2 of |
| the SPIR-V 1.00 Revision 4 specification) for a fragment shader is the set |
| of invocations generated by a single primitive (point, line, or triangle), |
| including any helper invocations generated by that primitive. |
| Derivatives are undefined for a sampled image instruction if the instruction |
| is in flow control that is not uniform across the derivative group. |
| |
| ifdef::VK_VERSION_1_1[] |
| [[shaders-subgroup]] |
| == Subgroups |
| |
| A _subgroup_ (see the subsection "`Control Flow`" of section 2 of the SPIR-V |
| 1.3 Revision 1 specification) is a set of invocations that can synchronize |
| and share data with each other efficiently. |
| An invocation group is partitioned into one or more subgroups. |
| |
| Subgroup operations are divided into various categories as described in |
| elink:VkSubgroupFeatureFlagBits. |
| |
| [[shaders-subgroup-basic]] |
| === Basic Subgroup Operations |
| |
| The basic subgroup operations allow two classes of functionality within |
| shaders: elect and barrier. |
| Invocations within a subgroup can: choose a single invocation to perform |
| some task for the subgroup as a whole using elect. |
| Invocations within a subgroup can: perform a subgroup barrier to ensure the |
| ordering of execution or memory accesses within a subgroup. |
| Barriers can: be performed on buffer memory accesses, code:Workgroup memory |
| accesses, and image memory accesses to ensure that any results written are |
| visible to other invocations within the subgroup. |
| An code:OpControlBarrier can: also be used to perform a full execution |
| control barrier. |
| A full execution control barrier will ensure that each active invocation |
| within the subgroup reaches a point of execution before any are allowed to |
| continue. |
| |
| [[shaders-subgroup-vote]] |
| === Vote Subgroup Operations |
| |
| The vote subgroup operations allow invocations within a subgroup to compare |
| values across a subgroup. |
| The types of votes enabled are: |
| |
| * Do all active subgroup invocations agree that an expression is true? |
| * Do any active subgroup invocations evaluate an expression to true? |
| * Do all active subgroup invocations have the same value of an expression? |
| |
| [NOTE] |
| .Note |
| ==== |
| These operations are useful in combination with control flow in that they |
| allow developers to check whether conditions match across the subgroup and |
| choose potentially faster code paths in these cases. |
| ==== |
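| |
| As an illustration only, the three vote queries can be modelled on the CPU |
| in C, treating a subgroup as an array of per-invocation values plus a mask |
| of active invocations. |
| The function names and the 32-invocation limit are assumptions of this |
| sketch and are not part of Vulkan or SPIR-V. |
| |
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPU reference model only. Bit i of active_mask is set when invocation i
 * is active. All names are hypothetical, not Vulkan or SPIR-V API. */
#define MODEL_MAX_SUBGROUP_SIZE 32u

/* Do all active invocations evaluate the predicate to true? */
static bool model_vote_all(const bool *pred, uint32_t size, uint32_t active_mask)
{
    assert(size <= MODEL_MAX_SUBGROUP_SIZE);
    for (uint32_t i = 0; i < size; ++i)
        if (((active_mask >> i) & 1u) && !pred[i])
            return false;
    return true;
}

/* Does any active invocation evaluate the predicate to true? */
static bool model_vote_any(const bool *pred, uint32_t size, uint32_t active_mask)
{
    assert(size <= MODEL_MAX_SUBGROUP_SIZE);
    for (uint32_t i = 0; i < size; ++i)
        if (((active_mask >> i) & 1u) && pred[i])
            return true;
    return false;
}

/* Do all active invocations hold the same value? */
static bool model_vote_all_equal(const int32_t *value, uint32_t size,
                                 uint32_t active_mask)
{
    bool have_first = false;
    int32_t first = 0;

    assert(size <= MODEL_MAX_SUBGROUP_SIZE);
    for (uint32_t i = 0; i < size; ++i) {
        if (!((active_mask >> i) & 1u))
            continue;               /* inactive invocations do not vote */
        if (!have_first) {
            first = value[i];
            have_first = true;
        } else if (value[i] != first) {
            return false;
        }
    }
    return true;
}
```
| |
| In this model, inactive invocations never influence the result, which is |
| why the mask is passed alongside the values. |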
| |
| [[shaders-subgroup-arithmetic]] |
| === Arithmetic Subgroup Operations |
| |
| The arithmetic subgroup operations allow invocations to perform scan and |
| reduction operations across a subgroup. |
| For reduction operations, each invocation in a subgroup will obtain the same |
| result of these arithmetic operations applied across the subgroup. |
| For scan operations, each invocation in the subgroup will perform an |
| inclusive or exclusive scan, cumulatively applying the operation across the |
| invocations in a subgroup in linear order. |
| The operations supported are add, mul, min, max, and, or, xor. |
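| |
| The difference between a reduction, an inclusive scan, and an exclusive scan |
| can be sketched with a CPU model in C, using add as the operation. |
| The sketch assumes every invocation in the subgroup is active, and the |
| function names are hypothetical. |
| |
```c
#include <assert.h>
#include <stdint.h>

/* CPU reference model only; names are hypothetical. in[i] is the value
 * contributed by invocation i, and all invocations are assumed active. */

/* Reduction: every invocation obtains the same total. */
static uint32_t model_reduce_add(const uint32_t *in, uint32_t size)
{
    uint32_t total = 0;
    for (uint32_t i = 0; i < size; ++i)
        total += in[i];
    return total;
}

/* Inclusive scan: out[i] = in[0] + ... + in[i]. */
static void model_inclusive_scan_add(const uint32_t *in, uint32_t *out,
                                     uint32_t size)
{
    uint32_t running = 0;
    assert(in != out);
    for (uint32_t i = 0; i < size; ++i) {
        running += in[i];
        out[i] = running;
    }
}

/* Exclusive scan: out[i] = in[0] + ... + in[i-1], with the identity of the
 * operation (0 for add) in out[0]. */
static void model_exclusive_scan_add(const uint32_t *in, uint32_t *out,
                                     uint32_t size)
{
    uint32_t running = 0;
    assert(in != out);
    for (uint32_t i = 0; i < size; ++i) {
        out[i] = running;
        running += in[i];
    }
}
```
| |
| For the input values 3, 1, 4, 1 this model gives a reduction of 9, an |
| inclusive scan of 3, 4, 8, 9, and an exclusive scan of 0, 3, 4, 8. |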
| |
| [[shaders-subgroup-ballot]] |
| === Ballot Subgroup Operations |
| |
| The ballot subgroup operations allow invocations to perform more complex |
| votes across the subgroup. |
| The ballot functionality allows all invocations within a subgroup to provide |
| a boolean value and to receive, as a result, the boolean value that each |
| invocation provided. |
| The broadcast functionality allows values to be broadcast from an invocation |
| to all other invocations within the subgroup, given that the invocation to |
| be broadcast from is known at pipeline creation time. |
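| |
| A minimal CPU model in C of ballot and broadcast is sketched below. |
| It packs the ballot into a single 32-bit mask, which is a simplification |
| (the SPIR-V ballot result is a four-component vector of 32-bit integers), |
| and the function names are hypothetical. |
| |
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPU reference model only; names are hypothetical. The subgroup is limited
 * to 32 invocations so that the ballot fits in one uint32_t. */

/* Bit i of the result is set when invocation i is active and its predicate
 * is true. Every invocation receives the same mask. */
static uint32_t model_ballot(const bool *pred, uint32_t size,
                             uint32_t active_mask)
{
    uint32_t result = 0;
    assert(size <= 32u);
    for (uint32_t i = 0; i < size; ++i)
        if (((active_mask >> i) & 1u) && pred[i])
            result |= (uint32_t)1u << i;
    return result;
}

/* Every invocation receives the value held by invocation source_id.
 * In a shader, source_id would be known at pipeline creation time. */
static void model_broadcast(const uint32_t *in, uint32_t *out, uint32_t size,
                            uint32_t source_id)
{
    assert(source_id < size);
    const uint32_t value = in[source_id];
    for (uint32_t i = 0; i < size; ++i)
        out[i] = value;
}
```
| |
| Each invocation can then test individual bits of the mask to learn what any |
| other invocation voted. |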
| |
| [[shaders-subgroup-shuffle]] |
| === Shuffle Subgroup Operations |
| |
| The shuffle subgroup operations allow invocations to read values from other |
| invocations within a subgroup. |
| |
| [[shaders-subgroup-shuffle-relative]] |
| === Shuffle Relative Subgroup Operations |
| |
| The shuffle relative subgroup operations allow invocations to read values |
| from other invocations within the subgroup relative to the current |
| invocation in the group. |
| The relative operations supported allow data to be shifted up and down |
| through the invocations within a subgroup. |
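| |
| Shifting data up and down can be sketched with a CPU model in C. |
| In this sketch, an invocation whose source index falls outside the subgroup |
| keeps its own value; that choice is an assumption made only so the model is |
| deterministic, and the function names are hypothetical. |
| |
```c
#include <assert.h>
#include <stdint.h>

/* CPU reference model only; names are hypothetical. in[i] is the value held
 * by invocation i and out[i] is what invocation i reads. */

/* Shuffle up: invocation i reads from invocation i - delta. */
static void model_shuffle_up(const uint32_t *in, uint32_t *out, uint32_t size,
                             uint32_t delta)
{
    assert(in != out);
    for (uint32_t i = 0; i < size; ++i)
        out[i] = (i >= delta) ? in[i - delta] : in[i]; /* own value: assumption */
}

/* Shuffle down: invocation i reads from invocation i + delta. */
static void model_shuffle_down(const uint32_t *in, uint32_t *out,
                               uint32_t size, uint32_t delta)
{
    assert(in != out);
    for (uint32_t i = 0; i < size; ++i)
        out[i] = (delta < size - i) ? in[i + delta] : in[i]; /* own value: assumption */
}
```
| |
| With the input 10, 20, 30, 40 and a delta of 1, the model yields |
| 10, 10, 20, 30 for the upward shift and 20, 30, 40, 40 for the downward |
| shift. |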
| |
| [[shaders-subgroup-clustered]] |
| === Clustered Subgroup Operations |
| |
| The clustered subgroup operations allow invocations to perform an operation |
| among partitions of a subgroup, such that the operation is only performed |
| within the subgroup invocations within a partition. |
| The partitions for clustered subgroup operations are consecutive |
| power-of-two size groups of invocations and the cluster size must: be known |
| at pipeline creation time. |
| The operations supported are add, mul, min, max, and, or, xor. |
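| |
| The effect of the cluster size can be sketched with a CPU model in C that |
| performs an add reduction independently within each cluster. |
| The sketch assumes all invocations are active and that the subgroup size is |
| a multiple of the cluster size; the function name is hypothetical. |
| |
```c
#include <assert.h>
#include <stdint.h>

/* CPU reference model only; the name is hypothetical. Each run of
 * cluster_size consecutive invocations forms one cluster, and every
 * invocation receives the sum over its own cluster. */
static void model_clustered_reduce_add(const uint32_t *in, uint32_t *out,
                                       uint32_t size, uint32_t cluster_size)
{
    /* The cluster size must be a non-zero power of two. */
    assert(cluster_size != 0 && (cluster_size & (cluster_size - 1u)) == 0);
    assert(size % cluster_size == 0);
    assert(in != out);

    for (uint32_t base = 0; base < size; base += cluster_size) {
        uint32_t sum = 0;
        for (uint32_t i = 0; i < cluster_size; ++i)
            sum += in[base + i];
        for (uint32_t i = 0; i < cluster_size; ++i)
            out[base + i] = sum;
    }
}
```
| |
| For the input 1 through 8 and a cluster size of 4, invocations 0 to 3 |
| receive 10 and invocations 4 to 7 receive 26. |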
| |
| [[shaders-subgroup-quad]] |
| === Quad Subgroup Operations |
| |
| The quad subgroup operations allow clusters of 4 invocations (a quad) to |
| share data efficiently with each other. |
| |
| ifdef::VK_NV_shader_subgroup_partitioned[] |
| |
| [[shaders-subgroup-partitioned]] |
| === Partitioned Subgroup Operations |
| |
| The partitioned subgroup operations allow invocations to perform an |
| operation among partitions of a subgroup, such that the operation is only |
| performed within the subgroup invocations within a partition. |
| The partitions for partitioned subgroup operations can: group the |
| invocations into arbitrary subsets and can: be computed at runtime. |
| The operations supported are add, mul, min, max, and, or, xor. |
| |
| endif::VK_NV_shader_subgroup_partitioned[] |
| |
| endif::VK_VERSION_1_1[] |
| |
| ifdef::VK_EXT_validation_cache[] |
| [[shaders-validation-cache]] |
| == Validation Cache |
| |
| [open,refpage='VkValidationCacheEXT',desc='Opaque handle to a validation cache object',type='handles'] |
| -- |
| |
| Validation cache objects allow the result of internal validation to be |
| reused, both within a single application run and between multiple runs. |
| Reuse within a single run is achieved by passing the same validation cache |
| object when creating supported Vulkan objects. |
| Reuse across runs of an application is achieved by retrieving validation |
| cache contents in one run of an application, saving the contents, and using |
| them to preinitialize a validation cache on a subsequent run. |
| The contents of the validation cache objects are managed by the validation |
| layers. |
| Applications can: manage the host memory consumed by a validation cache |
| object and control the amount of data retrieved from a validation cache |
| object. |
| |
| Validation cache objects are represented by sname:VkValidationCacheEXT |
| handles: |
| |
| include::../api/handles/VkValidationCacheEXT.txt[] |
| |
| -- |
| |
| [open,refpage='vkCreateValidationCacheEXT',desc='Creates a new validation cache',type='protos'] |
| -- |
| |
| To create validation cache objects, call: |
| |
| include::../api/protos/vkCreateValidationCacheEXT.txt[] |
| |
| * pname:device is the logical device that creates the validation cache |
| object. |
| * pname:pCreateInfo is a pointer to a slink:VkValidationCacheCreateInfoEXT |
| structure that contains the initial parameters for the validation cache |
| object. |
| * pname:pAllocator controls host memory allocation as described in the |
| <<memory-allocation, Memory Allocation>> chapter. |
| * pname:pValidationCache is a pointer to a slink:VkValidationCacheEXT |
| handle in which the resulting validation cache object is returned. |
| |
| [NOTE] |
| .Note |
| ==== |
| Applications can: track and manage the total host memory size of a |
| validation cache object using the pname:pAllocator. |
| Applications can: limit the amount of data retrieved from a validation cache |
| object in fname:vkGetValidationCacheDataEXT. |
| Implementations should: not internally limit the total number of entries |
| added to a validation cache object or the total host memory consumed. |
| ==== |
| |
| Once created, a validation cache can: be passed to the |
| fname:vkCreateShaderModule command as part of the |
| sname:VkShaderModuleCreateInfo pname:pNext chain. |
| If a sname:VkShaderModuleValidationCacheCreateInfoEXT object is part of the |
| sname:VkShaderModuleCreateInfo::pname:pNext chain, and its |
| pname:validationCache field is not dlink:VK_NULL_HANDLE, the implementation |
| will query it for possible reuse opportunities and update it with new |
| content. |
| The use of the validation cache object in these commands is internally |
| synchronized, and the same validation cache object can: be used in multiple |
| threads simultaneously. |
| |
| [NOTE] |
| .Note |
| ==== |
| Implementations should: make every effort to limit any critical sections to |
| the actual accesses to the cache, which is expected to be significantly |
| shorter than the duration of the fname:vkCreateShaderModule command. |
| ==== |
| |
| include::../validity/protos/vkCreateValidationCacheEXT.txt[] |
| -- |
| |
| [open,refpage='VkValidationCacheCreateInfoEXT',desc='Structure specifying parameters of a newly created validation cache',type='structs'] |
| -- |
| |
| The sname:VkValidationCacheCreateInfoEXT structure is defined as: |
| |
| include::../api/structs/VkValidationCacheCreateInfoEXT.txt[] |
| |
| * pname:sType is the type of this structure. |
| * pname:pNext is `NULL` or a pointer to an extension-specific structure. |
| * pname:flags is reserved for future use. |
| * pname:initialDataSize is the number of bytes in pname:pInitialData. |
| If pname:initialDataSize is zero, the validation cache will initially be |
| empty. |
| * pname:pInitialData is a pointer to previously retrieved validation cache |
| data. |
| If the validation cache data is incompatible (as defined below) with the |
| device, the validation cache will be initially empty. |
| If pname:initialDataSize is zero, pname:pInitialData is ignored. |
| |
| .Valid Usage |
| **** |
| * [[VUID-VkValidationCacheCreateInfoEXT-initialDataSize-01534]] |
| If pname:initialDataSize is not `0`, it must: be equal to the size of |
| pname:pInitialData, as returned by fname:vkGetValidationCacheDataEXT |
| when pname:pInitialData was originally retrieved |
| * [[VUID-VkValidationCacheCreateInfoEXT-initialDataSize-01535]] |
| If pname:initialDataSize is not `0`, pname:pInitialData must: have been |
| retrieved from a previous call to fname:vkGetValidationCacheDataEXT |
| **** |
| |
| include::../validity/structs/VkValidationCacheCreateInfoEXT.txt[] |
| -- |
| |
| [open,refpage='VkValidationCacheCreateFlagsEXT',desc='Reserved for future use',type='enums'] |
| -- |
| include::../api/flags/VkValidationCacheCreateFlagsEXT.txt[] |
| |
| sname:VkValidationCacheCreateFlagsEXT is a bitmask type for setting a mask, |
| but is currently reserved for future use. |
| -- |
| |
| [open,refpage='vkMergeValidationCachesEXT',desc='Combine the data stores of validation caches',type='protos'] |
| -- |
| |
| Validation cache objects can: be merged using the command: |
| |
| include::../api/protos/vkMergeValidationCachesEXT.txt[] |
| |
| * pname:device is the logical device that owns the validation cache |
| objects. |
| * pname:dstCache is the handle of the validation cache to merge results |
| into. |
| * pname:srcCacheCount is the length of the pname:pSrcCaches array. |
| * pname:pSrcCaches is an array of validation cache handles, which will be |
| merged into pname:dstCache. |
| The previous contents of pname:dstCache are included after the merge. |
| |
| [NOTE] |
| .Note |
| ==== |
| The details of the merge operation are implementation-dependent, but |
| implementations should: merge the contents of the specified validation |
| caches and prune duplicate entries. |
| ==== |
| |
| .Valid Usage |
| **** |
| * [[VUID-vkMergeValidationCachesEXT-dstCache-01536]] |
| pname:dstCache must: not appear in the list of source caches |
| **** |
| |
| include::../validity/protos/vkMergeValidationCachesEXT.txt[] |
| -- |
| |
| [open,refpage='vkGetValidationCacheDataEXT',desc='Get the data store from a validation cache',type='protos'] |
| -- |
| |
| Data can: be retrieved from a validation cache object using the command: |
| |
| include::../api/protos/vkGetValidationCacheDataEXT.txt[] |
| |
| * pname:device is the logical device that owns the validation cache. |
| * pname:validationCache is the validation cache to retrieve data from. |
| * pname:pDataSize is a pointer to a value related to the amount of data in |
| the validation cache, as described below. |
| * pname:pData is either `NULL` or a pointer to a buffer. |
| |
| If pname:pData is `NULL`, then the maximum size of the data that can: be |
| retrieved from the validation cache, in bytes, is returned in |
| pname:pDataSize. |
| Otherwise, pname:pDataSize must: point to a variable set by the user to the |
| size of the buffer, in bytes, pointed to by pname:pData, and on return the |
| variable is overwritten with the amount of data actually written to |
| pname:pData. |
| |
| If the value pointed to by pname:pDataSize is less than the maximum size that |
| can: be retrieved from the validation cache, at most that many bytes will be |
| written to pname:pData, and fname:vkGetValidationCacheDataEXT will return |
| ename:VK_INCOMPLETE. |
| Any data written to pname:pData is valid and can: be provided as the |
| pname:pInitialData member of the sname:VkValidationCacheCreateInfoEXT |
| structure passed to fname:vkCreateValidationCacheEXT. |
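| |
| The size-query and retrieval behavior described above can be sketched with |
| a self-contained C model. |
| `model_get_cache_data` is a hypothetical stand-in that mimics only the |
| pname:pDataSize and pname:pData rules of fname:vkGetValidationCacheDataEXT; |
| it is not the Vulkan entry point, it does not model the header-size rule |
| described later in this section, and the result codes are local to the |
| sketch. |
| |
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result codes standing in for VK_SUCCESS and VK_INCOMPLETE. */
typedef enum { MODEL_SUCCESS, MODEL_INCOMPLETE } ModelResult;

/* Hypothetical model of the retrieval rules. cache/cache_size stand in for
 * the contents of the validation cache object. */
static ModelResult model_get_cache_data(const uint8_t *cache, size_t cache_size,
                                        size_t *pDataSize, void *pData)
{
    assert(pDataSize != NULL);
    if (pData == NULL) {
        /* Size query: report the maximum retrievable size. */
        *pDataSize = cache_size;
        return MODEL_SUCCESS;
    }
    const size_t capacity = *pDataSize;
    const size_t written = capacity < cache_size ? capacity : cache_size;
    if (written > 0)
        memcpy(pData, cache, written);
    *pDataSize = written;           /* overwritten with the amount written */
    return capacity < cache_size ? MODEL_INCOMPLETE : MODEL_SUCCESS;
}

/* Two-call pattern: query the size, then fetch into the caller's buffer.
 * Returns the number of bytes written to buf. */
static size_t model_fetch(const uint8_t *cache, size_t cache_size,
                          uint8_t *buf, size_t buf_size, ModelResult *result)
{
    size_t size = 0;
    assert(buf != NULL && result != NULL);
    (void)model_get_cache_data(cache, cache_size, &size, NULL);
    if (size > buf_size)
        size = buf_size;            /* caller limits the amount retrieved */
    *result = model_get_cache_data(cache, cache_size, &size, buf);
    return size;
}
```
| |
| An application would typically allocate a buffer of the size reported by |
| the first call before making the second. |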
| |
| Two calls to fname:vkGetValidationCacheDataEXT with the same parameters |
| must: retrieve the same data unless a command that modifies the contents of |
| the cache is called between them. |
| |
| [[validation-cache-header]] |
| Applications can: store the data retrieved from the validation cache, and |
| use these data, possibly in a future run of the application, to populate new |
| validation cache objects. |
| The results of validation, however, may: depend on the vendor ID, device ID, |
| driver version, and other details of the device. |
| To enable applications to detect when previously retrieved data is |
| incompatible with the device, the initial bytes written to pname:pData must: |
| be a header consisting of the following members: |
| |
| .Layout for validation cache header version ename:VK_VALIDATION_CACHE_HEADER_VERSION_ONE_EXT |
| [width="85%",cols="8%,21%,71%",options="header"] |
| |==== |
| | Offset | Size | Meaning |
| | 0 | 4 | length in bytes of the entire validation cache header |
| written as a stream of bytes, with the least |
| significant byte first |
| | 4 | 4 | a elink:VkValidationCacheHeaderVersionEXT value |
| written as a stream of bytes, with the least |
| significant byte first |
| | 8 | ename:VK_UUID_SIZE | a layer commit ID expressed as a UUID, which uniquely |
| identifies the version of the validation layers used |
| to generate these validation results |
| |==== |
| |
| The first four bytes encode the length of the entire validation cache |
| header, in bytes. |
| This value includes all fields in the header including the validation cache |
| version field and the size of the length field. |
| |
| The next four bytes encode the validation cache version, as described for |
| elink:VkValidationCacheHeaderVersionEXT. |
| A consumer of the validation cache should: use the cache version to |
| interpret the remainder of the cache header. |
| |
| If the value pointed to by pname:pDataSize is less than what is necessary to |
| store this header, nothing will be written to pname:pData and zero will be |
| written to the variable pointed to by pname:pDataSize. |
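| |
| A consumer might decode the header laid out in the table above as in the |
| following C sketch. |
| The type and function names are hypothetical, `MODEL_UUID_SIZE` is assumed |
| to equal ename:VK_UUID_SIZE (16), and how an application obtains the UUID |
| it expects is outside the scope of the sketch. |
| |
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_UUID_SIZE 16u     /* assumed equal to VK_UUID_SIZE */

/* Hypothetical in-memory form of the validation cache header. */
typedef struct {
    uint32_t header_length;     /* offset 0, length of the entire header */
    uint32_t header_version;    /* offset 4, header version value */
    uint8_t  layer_uuid[MODEL_UUID_SIZE]; /* offset 8, layer commit ID */
} ModelCacheHeader;

/* Read a 32-bit value stored least significant byte first. */
static uint32_t model_read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns false if the blob is too small to hold the header or if the
 * encoded header length is inconsistent with the blob size. */
static bool model_parse_cache_header(const uint8_t *data, size_t size,
                                     ModelCacheHeader *out)
{
    const size_t min_size = 8u + MODEL_UUID_SIZE;

    if (data == NULL || out == NULL || size < min_size)
        return false;
    out->header_length = model_read_le32(data);
    out->header_version = model_read_le32(data + 4);
    if (out->header_length < min_size || out->header_length > size)
        return false;
    memcpy(out->layer_uuid, data + 8, MODEL_UUID_SIZE);
    return true;
}

/* Compare against the version and UUID the caller expects. */
static bool model_header_matches(const ModelCacheHeader *h,
                                 uint32_t expected_version,
                                 const uint8_t *expected_uuid)
{
    assert(h != NULL && expected_uuid != NULL);
    return h->header_version == expected_version &&
           memcmp(h->layer_uuid, expected_uuid, MODEL_UUID_SIZE) == 0;
}
```
| |
| Reading the fields byte by byte keeps the decoding independent of the host's |
| endianness and alignment requirements. |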
| |
| include::../validity/protos/vkGetValidationCacheDataEXT.txt[] |
| -- |
| |
| [open,refpage='VkValidationCacheHeaderVersionEXT',desc='Encode validation cache version',type='enums',xrefs='vkCreateValidationCacheEXT vkGetValidationCacheDataEXT'] |
| -- |
| Possible values of the second group of four bytes in the header returned by |
| flink:vkGetValidationCacheDataEXT, encoding the validation cache version, |
| are: |
| |
| include::../api/enums/VkValidationCacheHeaderVersionEXT.txt[] |
| |
| * ename:VK_VALIDATION_CACHE_HEADER_VERSION_ONE_EXT specifies version one |
| of the validation cache. |
| -- |
| |
| [open,refpage='vkDestroyValidationCacheEXT',desc='Destroy a validation cache object',type='protos'] |
| -- |
| |
| To destroy a validation cache, call: |
| |
| include::../api/protos/vkDestroyValidationCacheEXT.txt[] |
| |
| * pname:device is the logical device that destroys the validation cache |
| object. |
| * pname:validationCache is the handle of the validation cache to destroy. |
| * pname:pAllocator controls host memory allocation as described in the |
| <<memory-allocation, Memory Allocation>> chapter. |
| |
| .Valid Usage |
| **** |
| * [[VUID-vkDestroyValidationCacheEXT-validationCache-01537]] |
| If sname:VkAllocationCallbacks were provided when pname:validationCache |
| was created, a compatible set of callbacks must: be provided here |
| * [[VUID-vkDestroyValidationCacheEXT-validationCache-01538]] |
| If no sname:VkAllocationCallbacks were provided when |
| pname:validationCache was created, pname:pAllocator must: be `NULL` |
| **** |
| |
| include::../validity/protos/vkDestroyValidationCacheEXT.txt[] |
| -- |
| endif::VK_EXT_validation_cache[] |