gallivm: Optimize single-invocation SSBO stores.

The CTS does a lot of 1x1x1 compute shaders (all that stuff like
dEQP-GLES31.functional.shaders.builtin_functions.precision.mul.highp_compute.scalar)
which finish with store_ssbos.  Instead of doing the invocation loop in
that case (which LLVM has to later unroll), just emit the single
invocation's store.
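To illustrate the idea, here is a rough scalar C model of what the generated store does. It is not the actual gallivm code, which emits LLVM IR at shader compile time. It assumes a hypothetical 8-wide vector of invocations and that the lone invocation of a 1x1x1 shader sits in lane 0. `NUM_LANES`, `store_ssbo_loop`, `store_ssbo_single` and `store_ssbo` are made-up names for illustration only.

```c
#include <stdint.h>

/* Hypothetical SIMD width: one lane per shader invocation. */
#define NUM_LANES 8

/* General path (model): walk every lane and store only for the
 * active ones, like a per-invocation store loop. */
static void store_ssbo_loop(uint32_t *ssbo,
                            const uint32_t offset[NUM_LANES],
                            const uint32_t value[NUM_LANES],
                            const int active[NUM_LANES])
{
   for (int lane = 0; lane < NUM_LANES; lane++) {
      if (active[lane])
         ssbo[offset[lane]] = value[lane];
   }
}

/* Single-invocation path (model): with a 1x1x1 workgroup only lane 0
 * can be live (an assumption of this sketch), so emit just its store. */
static void store_ssbo_single(uint32_t *ssbo,
                              const uint32_t offset[NUM_LANES],
                              const uint32_t value[NUM_LANES],
                              const int active[NUM_LANES])
{
   if (active[0])
      ssbo[offset[0]] = value[0];
}

/* Pick the path based on whether the shader is known to run a single
 * invocation (in the real driver this is decided when generating code). */
static void store_ssbo(uint32_t *ssbo,
                       const uint32_t offset[NUM_LANES],
                       const uint32_t value[NUM_LANES],
                       const int active[NUM_LANES],
                       int single_invocation)
{
   if (single_invocation)
      store_ssbo_single(ssbo, offset, value, active);
   else
      store_ssbo_loop(ssbo, offset, value, active);
}
```

When only lane 0 is active both paths write the same memory; the point of the fast path is that no per-lane loop is emitted for LLVM to unroll.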

Fixes timeouts running
dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.36, which does
a spectacular number of SSBO stores in a long 1x1x1 compute shader.
Reduces the runtime on llvmpipe from 66s to 29s locally, and on virgl
from 1:38 to 43s.  On virgl,
dEQP-GLES31.functional.ssbo.layout.random.nested_structs_arrays_instance_arrays.22
goes down to 7 seconds.

Fixes: #6797
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17730>