Update to latest cortexa15 memcpy code.

This uses the new code original submitted as memcpy.a15.S as
the base. However, the old code handled unaligned src/dst better
so that was spliced in. I optimized the original unaligned code by
removing a few unnecessary instructions. I optimized the a15 code by
rewriting the pre and post code. I also modified the main loop to add
a pld so that larger copies would not stall waiting for memory.

Test cases for the new memcpy:

- Copy all sized values from 0 to 1024 bytes, using whatever alignment
  is returned by malloc.
For each alignment case described below, the test copied from 0 to 128
bytes.
- Src and dst pointers are both aligned to the same value, starting
  at one going through every power of two up to and including 128.
- Src aligned to double word boundary, dst aligned to word boundary.
- Src aligned to word boundary, dst aligned to double word boundary.
- Src aligned to 16 bit boundary, dst aligned to word boundary.
- Src aligned to word boundary, dst aligned to 16 byte boundary.
- Src aligned to word boundary, dst aligned to 1 byte from a word
  boundary.
- Src aligned to word boundary, dst aligned to 2 bytes from a word
  boundary.
- Src aligned to word boundary, dst aligned to 3 bytes from a word
  boundary.
- Src aligned to 1 byte from a word boundary, dst aligned to a word
  boundary.
- Src aligned to 2 bytes from a word boundary, dst aligned to a word
  boundary.
- Src aligned to 3 bytes from a word boundary, dst aligned to a word
  boundary.

Cases to verify the unaligned source code properly aligns to a 16 bit
boundary.
- Src aligned to 1 byte from a 128 bit boundary, dst aligned to
  4 + 128 bit boundary.
- Src aligned to 1 byte from a 128 bit boundary, dst aligned to
  8 + 128 bit boundary.
- Src aligned to 1 byte from a 128 bit boundary, dst aligned to
  12 + 128 bit boundary.
- Src aligned to 1 byte from a 128 bit boundary, dst aligned to
  16 + 128 bit boundary.

In all cases, a two byte fencepost was placed at the end of the
destination to verify that only the requested number of bytes were copied.

Bug: 8005082
Change-Id: I700b2fab81941959d301ab1934c18fbd8ee3eee4
1 file changed