implement VE4/HE4/RD4/... in SSE2

(the prediction functions are 30% faster, but the overall speed-up is only ~1%)

Change-Id: I2c6e7074aa26a2359c9198a9015e5cbe143c2765
1 file changed