Opt compiler: ARM64: Use ldp/stp on arm64 for slow paths.

It should be a bit faster than load/store single registers and reduce
the code size.

Change-Id: I67b8302adf6174b7bb728f7c2afd2c237e34ffde
4 files changed