Optimizing/ARM: Improve long shifts by 1.

Implement long
    Shl(x,1) as LSLS+ADC,
    Shr(x,1) as ASR+RRX and
    UShr(x,1) as LSR+RRX.

Remove the simplification substituting Shl(x,1) with
ADD(x,x) as it interferes with some other optimizations
instead of helping them. And since it didn't help 64-bit
architectures anyway, codegen is the correct place for it.
This is now implemented for ARM and x86, so only mips32 can
be improved.

Change-Id: Idd14f23292198b2260189e1497ca5411b21743b3
7 files changed