[X86][SLP] Enable SLP vectorization for 128-bit horizontal X86 instructions (add, sub)

Try to use 64-bit SLP vectorization. In addition to horizontal instrs
this change triggers optimizations for partial vector operations (for instance,
using low halfs of 128-bit registers xmm0 and xmm1 to multiply <2 x float> by
<2 x float>).

Fixes llvm.org/PR32433

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@353923 91177308-0d34-0410-b5e6-96231b3b80d8
29 files changed