ir3: Vectorize shared memory loads/stores

This drastically speeds up a Path of Exile 2 compute dispatch, which
goes from 4.6ms to 2.7ms (roughly 41% less time).

Totals from 969 (0.59% of 164134) affected shaders:
MaxWaves: 9586 -> 9560 (-0.27%); split: +0.02%, -0.29%
Instrs: 1252433 -> 1234724 (-1.41%); split: -1.47%, +0.05%
CodeSize: 2237424 -> 2195238 (-1.89%); split: -1.91%, +0.03%
NOPs: 362213 -> 360913 (-0.36%); split: -0.92%, +0.56%
MOVs: 58879 -> 59591 (+1.21%); split: -0.62%, +1.83%
Full: 15817 -> 15867 (+0.32%); split: -0.04%, +0.36%
(ss): 35671 -> 35434 (-0.66%); split: -1.80%, +1.14%
(sy): 23953 -> 23964 (+0.05%); split: -0.38%, +0.43%
(ss)-stall: 127807 -> 124930 (-2.25%); split: -3.43%, +1.18%
(sy)-stall: 583947 -> 585886 (+0.33%); split: -0.61%, +0.94%
Early-preamble: 317 -> 316 (-0.32%)
Cat0: 394577 -> 393316 (-0.32%); split: -0.85%, +0.53%
Cat1: 100335 -> 101057 (+0.72%); split: -0.36%, +1.08%
Cat2: 415880 -> 415835 (-0.01%); split: -0.05%, +0.04%
Cat3: 187928 -> 187929 (+0.00%); split: -0.00%, +0.00%
Cat5: 19143 -> 19148 (+0.03%)
Cat6: 69630 -> 52523 (-24.57%)
Cat7: 47160 -> 47136 (-0.05%); split: -0.56%, +0.51%

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34441>