src/intel/genxml/README - platform/external/mesa3d - Git at Google

 This provides some background the design of the generated headers.  We
 started out trying to generate bit fields but it evolved into the pack
 functions because of a few limitations:

   1) Bit fields still generate terrible code today. Even with modern
      optimizing compilers you get multiple load+mask+store operations
      to the same dword in memory as you set individual bits. The
      compiler also has to generate code to mask out overflowing values
      (for example, if you assign 200 to a 2 bit field). Our driver
      never writes overflowing values so that's not needed. On the
      other hand, most compiler recognize that the template struct we
      use is a temporary variable and copy propagate the individual
      fields and do amazing constant folding.  You should take a look
      at the code that gets generated when you compile in release mode
      with optimizations.

   2) For some types we need to have overlapping bit fields. For
      example, some values are 64 byte aligned 32 bit offsets. The
      lower 5 bits of the offset are always zero, so the hw packs in a
      few misc bits in the lower 5 bits there. Other times a field can
      be either a u32 or a float. I tried to do this with overlapping
      anonymous unions and it became a big mess. Also, when using
      initializers, you can only initialize one union member so this
      just doesn't work with out approach.

      The pack functions on the other hand allows us a great deal of
      flexibility in how we combine things. In the case of overlapping
      fields (the u32 and float case), if we only set one of them in
      the pack function, the compiler will recognize that the other is
      initialized to 0 and optimize out the code to or it it.

   3) Bit fields (and certainly overlapping anonymous unions of bit
      fields) aren't generally stable across compilers in how they're
      laid out and aligned. Our pack functions let us control exactly
      how things get packed, using only simple and unambiguous bitwise
      shifting and or'ing that works on any compiler.

 Once we have the pack function it allows us to hook in various
 transformations and validation as we go from template struct to dwords
 in memory:

   1) Validation: As I said above, our driver isn't supposed to write
      overflowing values to the fields, but we've of course had lots of
      cases where we make mistakes and write overflowing values. With
      the pack function, we can actually assert on that and catch it at
      runtime.  bitfields would just silently truncate.

   2) Type conversions: some times it's just a matter of writing a
      float to a u32, but we also convert from bool to bits, from
      floats to fixed point integers.

   3) Relocations: whenever we have a pointer from one buffer to
      another (for example a pointer from the meta data for a texture
      to the raw texture data), we have to tell the kernel about it so
      it can adjust the pointer to point to the final location. That
      means extra work we have to do extra work to record and annotate
      the dword location that holds the pointer. With bit fields, we'd
      have to call a function to do this, but with the pack function we
      generate code in the pack function to do this for us. That's a
      lot less error prone and less work.
	This provides some background the design of the generated headers. We
	started out trying to generate bit fields but it evolved into the pack
	functions because of a few limitations:

	1) Bit fields still generate terrible code today. Even with modern
	optimizing compilers you get multiple load+mask+store operations
	to the same dword in memory as you set individual bits. The
	compiler also has to generate code to mask out overflowing values
	(for example, if you assign 200 to a 2 bit field). Our driver
	never writes overflowing values so that's not needed. On the
	other hand, most compiler recognize that the template struct we
	use is a temporary variable and copy propagate the individual
	fields and do amazing constant folding. You should take a look
	at the code that gets generated when you compile in release mode
	with optimizations.

	2) For some types we need to have overlapping bit fields. For
	example, some values are 64 byte aligned 32 bit offsets. The
	lower 5 bits of the offset are always zero, so the hw packs in a
	few misc bits in the lower 5 bits there. Other times a field can
	be either a u32 or a float. I tried to do this with overlapping
	anonymous unions and it became a big mess. Also, when using
	initializers, you can only initialize one union member so this
	just doesn't work with out approach.

	The pack functions on the other hand allows us a great deal of
	flexibility in how we combine things. In the case of overlapping
	fields (the u32 and float case), if we only set one of them in
	the pack function, the compiler will recognize that the other is
	initialized to 0 and optimize out the code to or it it.

	3) Bit fields (and certainly overlapping anonymous unions of bit
	fields) aren't generally stable across compilers in how they're
	laid out and aligned. Our pack functions let us control exactly
	how things get packed, using only simple and unambiguous bitwise
	shifting and or'ing that works on any compiler.

	Once we have the pack function it allows us to hook in various
	transformations and validation as we go from template struct to dwords
	in memory:

	1) Validation: As I said above, our driver isn't supposed to write
	overflowing values to the fields, but we've of course had lots of
	cases where we make mistakes and write overflowing values. With
	the pack function, we can actually assert on that and catch it at
	runtime. bitfields would just silently truncate.

	2) Type conversions: some times it's just a matter of writing a
	float to a u32, but we also convert from bool to bits, from
	floats to fixed point integers.

	3) Relocations: whenever we have a pointer from one buffer to
	another (for example a pointer from the meta data for a texture
	to the raw texture data), we have to tell the kernel about it so
	it can adjust the pointer to point to the final location. That
	means extra work we have to do extra work to record and annotate
	the dword location that holds the pointer. With bit fields, we'd
	have to call a function to do this, but with the pack function we
	generate code in the pack function to do this for us. That's a
	lot less error prone and less work.