i965/vec4: implement access to DF source components Z/W

The general idea is that with 32-bit swizzles we cannot address DF
components Z/W directly, so instead we select the region that starts
at the 16B offset into the register and use X/Y swizzles.

The caveat is that we can't do this without violating register
region restrictions unless we do some sort of SIMD splitting.

Alternatively, we can accomplish what we need without SIMD splitting
by exploiting the gen7 hardware decompression bug for instructions
with a vstride=0. For example, an instruction like this:

mov(8) r2.x:DF r0.2<0>.xyzw:DF

activates the hardware bug and produces this region:

Component: x0   y0   z0   w0   x1   y1   z1   w1
Register:  r0.2 r0.3 r0.2 r0.3 r1.2 r1.3 r1.2 r1.3

Where r0.2 and r0.3 are r0.z:DF for the first vertex of the SIMD4x2
execution and r1.2 and r1.3 are the same for the second vertex.

Using this to our advantage we can select r0.z:DF by doing
r0.2<0,2,1>.xyxy and r0.w:DF by doing r0.2<0,2,1>.zwzw without
needing to split the instruction.
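
As a rough illustration of the mapping described above, the following
standalone sketch computes the sub-register byte offset and the 32-bit
swizzle for each DF component. It is not the driver code: the struct
DfSourceSelect and the function select_df_component() are hypothetical
names made up for this example, and the sketch assumes the source
register starts at byte offset 0.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in for the hardware register fields touched by
// this change; only what the illustration needs is modelled.
struct DfSourceSelect {
    unsigned subnr_bytes;            // byte offset into the 32-byte register
    bool vstride0;                   // true if vertical stride must be 0
    std::array<unsigned, 4> swizzle; // 32-bit channels: 0=x, 1=y, 2=z, 3=w
};

// component: 0=X, 1=Y, 2=Z, 3=W of a DF (64-bit) vec4 source.
// Assumes the register itself starts at byte offset 0.
DfSourceSelect select_df_component(unsigned component)
{
    assert(component < 4);

    unsigned subnr_bytes = 0;
    bool vstride0 = false;

    // Z/W live in the second half of the register: move the region
    // start by two DF elements (16 bytes) and address them as X/Y.
    if (component >= 2) {
        subnr_bytes = 16;
        component -= 2;
        // A source starting at 16B needs vstride=0, as described above.
        vstride0 = true;
    }

    // Each DF component spans two consecutive 32-bit channels.
    const unsigned lo = component * 2;
    return DfSourceSelect{subnr_bytes, vstride0, {lo, lo + 1, lo, lo + 1}};
}
```

Under these assumptions, component Z yields offset 16 with swizzle
xyxy and component W yields offset 16 with swizzle zwzw, matching the
two regions given above.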

Of course, this only works for gen7, but that is the only hardware
platform where we implement align16/fp64 at the moment.

v2: Adapted to the fact that we now do this after converting to
    hardware registers (Iago)

Reviewed-by: Matt Turner <mattst88@gmail.com>
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 6d73bb2..cc0a76a 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -2267,7 +2267,28 @@
     */
    assert(brw_is_single_value_swizzle(reg.swizzle));
 
+   /* To gain access to Z/W components we need to select the second half
+    * of the register and then use an X/Y swizzle to select Z/W respectively.
+    */
    unsigned swizzle = BRW_GET_SWZ(reg.swizzle, 0);
+
+   if (swizzle >= 2) {
+      *hw_reg = suboffset(*hw_reg, 2);
+      swizzle -= 2;
+   }
+
+   /* Any 64-bit source with an offset at 16B is intended to address the
+    * second half of a register and needs a vertical stride of 0 so we:
+    *
+    * 1. Don't violate register region restrictions.
+    * 2. Activate the gen7 instruction decompression bug exploit when
+    *    execsize > 4
+    */
+   if (hw_reg->subnr % REG_SIZE == 16) {
+      assert(devinfo->gen == 7);
+      hw_reg->vstride = BRW_VERTICAL_STRIDE_0;
+   }
+
    hw_reg->swizzle = BRW_SWIZZLE4(swizzle * 2, swizzle * 2 + 1,
                                   swizzle * 2, swizzle * 2 + 1);
 }