radeonsi: prevent a gfx10_ngg_calculate_subgroup_info failure for TES+NGG GS
arb_tessellation_shader-tes-gs-max-output -small -scan 1 50 -auto -fbo
doesn't pass, but at least all shaders are compiled successfully.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5700>
diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c
index d1a1aa7..01ed224 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -2651,9 +2651,15 @@
sel->gs_input_verts_per_prim =
u_vertices_per_prim(sel->info.properties[TGSI_PROPERTY_GS_INPUT_PRIM]);
- /* EN_MAX_VERT_OUT_PER_GS_INSTANCE does not work with tesselation. */
+ /* EN_MAX_VERT_OUT_PER_GS_INSTANCE does not work with tesselation so
+ * we can't split workgroups. Disable ngg if any of the following conditions is true:
+ * - num_invocations * gs_max_out_vertices > 256
+ * - LDS usage is too high
+ */
sel->tess_turns_off_ngg = sscreen->info.chip_class >= GFX10 &&
- sel->gs_num_invocations * sel->gs_max_out_vertices > 256;
+ (sel->gs_num_invocations * sel->gs_max_out_vertices > 256 ||
+ sel->gs_num_invocations * sel->gs_max_out_vertices *
+ (sel->info.num_outputs * 4 + 1) > 6500 /* max dw per GS primitive */);
break;
case PIPE_SHADER_TESS_CTRL: