intel/fs: Copy the PTSS from g0 for scratch reads/writes

In theory, this fixes a bug where we were dropping the PTSS
(per-thread scratch space) bound on the floor.  The hardware docs claim
that the A32 DWORD and BYTE scattered read/write messages do a PTSS
bounds check.  However, in practice, the hardware seems to ignore the
bounds check, so this doesn't actually matter.  I verified this with
the piglit tests in the following merge request:

    https://gitlab.freedesktop.org/mesa/piglit/-/merge_requests/399

In practice, this prevents the next commit from making a subtle
behavioral change.
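
For readers unfamiliar with the header layout, the following is a
minimal standalone sketch of what the new AND instruction computes.  It
is not Mesa code: build_scratch_header() and the plain 8-dword array are
hypothetical stand-ins for the zeroed message header and g0.3, used only
to show that bits 3:0 of g0.3 are kept and everything else in the header
stays zero at this point.

```cpp
#include <array>
#include <cstdint>

// Hypothetical model of the message header: 8 dwords, matching one GRF.
using Header = std::array<uint32_t, 8>;

// Hypothetical helper (not a Mesa function).  Mirrors the two steps in
// the patch: zero the header, then copy bits 3:0 of g0.3 into header
// dword 3 so the PTSS field is carried along for bounds checking.
static Header build_scratch_header(uint32_t g0_dword3)
{
   Header header{};                  // ubld.MOV(header, brw_imm_d(0))
   header[3] = g0_dword3 & 0xfu;     // AND(component(header, 3), g0.3, 0xf)
   return header;
}
```

For example, a g0.3 value of 0x12345675 yields a header whose dword 3
is 0x5 and whose other dwords are all zero.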

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>
diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index ed76463..0ec2463 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -5431,6 +5431,11 @@
       header = ubld.vgrf(BRW_REGISTER_TYPE_UD);
       ubld.MOV(header, brw_imm_d(0));
       if (is_stateless) {
+         /* Copy the per-thread scratch from g0 for bounds checking */
+         ubld.group(1, 0).AND(component(header, 3),
+                              retype(brw_vec1_grf(0, 3), BRW_REGISTER_TYPE_UD),
+                              brw_imm_ud(0xf));
+
          /* Both the typed and scattered byte/dword A32 messages take a buffer
           * base address in R0.5:[31:0] (See MH1_A32_PSM for typed messages or
           * MH_A32_GO for byte/dword scattered messages in the SKL PRM Vol. 2d