X86 Backend support for vectorized float and byte 16x16 operations

Add support for reserving vector registers for the duration of vector loop.
Add support for 16x16 multiplication, shifts, and add reduce.

Changed the vectorization implementation to be able to use the dataflow
elements for SSA recreation and fixed a few implementation details.

Change-Id: I2f358f05f574fc4ab299d9497517b9906f234b98
Signed-off-by: Jean Christophe Beyler <jean.christophe.beyler@intel.com>
Signed-off-by: Olivier Come <olivier.come@intel.com>
Signed-off-by: Udayan Banerji <udayan.banerji@intel.com>
9 files changed