radv/nir/lower_cmat: use explicit shift when calculating gfx12 wave64 layout

The rest of the compiler stack doesn't understand the alignment implications
of the combined shift.
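As a rough illustration of the point above, and not the code touched by this commit: the sketch below assumes a hypothetical per-lane byte offset that is 0 for the lower half of a wave64 and 8 for the upper half. The function names, the constant 8 and the exact form of the "combined" expression are assumptions made up for the example. Both forms return the same value, but only the explicit form ends in a left shift by a constant, which is the pattern from which a simple alignment analysis can read off "multiple of 8".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical "combined" form: isolate bit 5 of the lane id and move it
 * to bit 3 with one right shift. The result is 0 or 8, but its final
 * operation is a right shift, which says nothing about alignment. */
static uint32_t offset_combined(uint32_t lane)
{
   return (lane & 32u) >> 2;
}

/* Hypothetical "explicit" form: compute which half of the wave the lane
 * is in (0 or 1), then shift left by 3. The final operation is a left
 * shift by a constant, so the low 3 bits are trivially zero. */
static uint32_t offset_explicit(uint32_t lane)
{
   return (lane >> 5) << 3;
}

/* Check that both forms agree and are 8-byte aligned for every lane of a
 * 64-wide wave. Returns the number of lanes checked. */
static int check_all_lanes(void)
{
   int checked = 0;
   for (uint32_t lane = 0; lane < 64; lane++) {
      assert(offset_combined(lane) == offset_explicit(lane));
      assert(offset_explicit(lane) % 8 == 0);
      checked++;
   }
   return checked;
}
```

Calling check_all_lanes() from a test harness exercises all 64 lane ids. The values are identical either way; the difference is only in how much a later pass can infer from the shape of the expression.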
Effect on llama.cpp fossils:
Totals from 3 (13.64% of 22) affected shaders:
Instrs: 5778 -> 5684 (-1.63%)
CodeSize: 33540 -> 32800 (-2.21%)
VGPRs: 228 -> 216 (-5.26%)
Latency: 39942 -> 39417 (-1.31%)
InvThroughput: 12037 -> 11862 (-1.45%)
VALU: 2162 -> 2111 (-2.36%)
More importantly, this replaces some ds_load_2addr_b32 with ds_load_b64.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13447
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36016>