ARM64: Ensure stricter alignment when loading and storing register pairs

The impetus for this change is the fact that loads that cross a 64 byte
boundary and stores that cross a 16 byte boundary are a performance issue
on Cortex-A57 and A72.

Change-Id: I81263dc72272192ad2d190b741a955f175880461
2 files changed