Make the following primops take a third (initial) argument to
indicate the rounding mode to use, like their scalar cousins do:

  Iop_Add32Fx4  Iop_Sub32Fx4  Iop_Mul32Fx4  Iop_Div32Fx4  
  Iop_Add64Fx2  Iop_Sub64Fx2  Iop_Mul64Fx2  Iop_Div64Fx2  
  Iop_Add64Fx4  Iop_Sub64Fx4  Iop_Mul64Fx4  Iop_Div64Fx4
  Iop_Add32Fx8  Iop_Sub32Fx8  Iop_Mul32Fx8  Iop_Div32Fx8

Fix up the x86 and amd64 front ends to add fake rounding modes
(Irrm_NEAREST) when generating expressions using these primops.
Fix up the x86 and amd64 back ends to accept these as triops
rather than as binops, and ignore the first arg.

Add three more ir_opt folding rules to remove memcheck
instrumentation arising from instrumentation of known-defined
rounding modes.

Overall functional and performance effects should be zero.



git-svn-id: svn://svn.valgrind.org/vex/trunk@2809 8f6e269a-dfd6-0310-a8e1-e2731360e62c
7 files changed