Update to latest cortexa15 memcpy code.

This uses the new code original submitted as memcpy.a15.S as
the base. However, the old code handled unaligned src/dst better
so that was spliced in. I optimized the original unaligned code by
removing a few unnecessary instructions. I optimized the a15 code by
rewriting the pre and post code. I also modified the main loop to add
a pld so that larger copies would not stall waiting for memory.

Test cases for the new memcpy:

- Copy all sized values from 0 to 1024 bytes, using whatever alignment
  is returned by malloc.
For each alignment case described below, the test copied from 0 to 128
bytes.
- Src and dst pointers are both aligned to the same value, starting
  at one going through every power of two up to and including 128.
- Src aligned to double word boundary, dst aligned to word boundary.
- Src aligned to word boundary, dst aligned to double word boundary.
- Src aligned to 16 bit boundary, dst aligned to word boundary.
- Src aligned to word boundary, dst aligned to 16 byte boundary.
- Src aligned to word boundary, dst aligned to 1 byte from a word
  boundary.
- Src aligned to word boundary, dst aligned to 2 bytes from a word
  boundary.
- Src aligned to word boundary, dst aligned to 3 bytes from a word
  boundary.
- Src aligned to 1 byte from a word boundary, dst aligned to a word
  boundary.
- Src aligned to 2 bytes from a word boundary, dst aligned to a word
  boundary.
- Src aligned to 3 bytes from a word boundary, dst aligned to a word
  boundary.

Cases to verify the unaligned source code properly aligns to a 16 bit
boundary.
- Src aligned to 1 byte from a 128 bit boundary, dst aligned to
  4 + 128 bit boundary.
- Src aligned to 1 byte from a 128 bit boundary, dst aligned to
  8 + 128 bit boundary.
- Src aligned to 1 byte from a 128 bit boundary, dst aligned to
  12 + 128 bit boundary.
- Src aligned to 1 byte from a 128 bit boundary, dst aligned to
  16 + 128 bit boundary.

In all cases, a two byte fencepost was placed at the end of the
destination to verify that only the requested number of bytes were copied.

Bug: 8005082

Merge from internal master.

(cherry-picked from commit 21ede92d794969f22cacbdb9f557818f1c5712b5)

Change-Id: Ief70c9e6dc8c6473ae245b6570b2c266fed9618c
1 file changed