aarch64: Combine memcpy and memmove implementations

Modify the integer and SIMD versions of memcpy to handle overlapping buffers correctly.

Make __memmove_aarch64 and __memmove_aarch64_simd aliases of
__memcpy_aarch64 and __memcpy_aarch64_simd, respectively.
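A minimal C sketch of the aliasing idea, assuming a GCC- or Clang-compatible
compiler targeting ELF (the alias attribute is not supported on every object
format). The names my_memcpy and my_memmove are hypothetical stand-ins for
__memcpy_aarch64 and __memmove_aarch64, and the byte loops stand in for the
real assembly: once the copy routine is overlap-safe, memmove can simply be a
second symbol for the same code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical overlap-safe copy standing in for __memcpy_aarch64. */
void *my_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if ((uintptr_t)d < (uintptr_t)s) {
        /* dst below src: a forward copy never overwrites unread source bytes. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    } else if ((uintptr_t)d > (uintptr_t)s) {
        /* dst above src: copy backwards for the same reason. */
        for (size_t i = n; i > 0; i--)
            d[i - 1] = s[i - 1];
    }
    /* d == s: nothing to do. */
    return dst;
}

/* memmove is just another name for the same code (GCC/Clang, ELF targets). */
void *my_memmove(void *dst, const void *src, size_t n)
    __attribute__((alias("my_memcpy")));
```

For example, with char buf[] = "abcdef", calling my_memmove(buf + 2, buf, 4)
leaves buf holding "ababcd".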

memcpy and memmove can share all of their code without a noticeable
performance penalty. This is possible because the check for overlap
between the source and destination buffers is moved after the code that
handles small and medium copies. Those paths are overlap-safe anyway,
since they read all source data before writing any of it, so only large
copies pay for the check.
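The overlap-safety argument can be illustrated with a simplified C model of
the control flow. This is a sketch, not the aarch64 code: copy_sketch and
MEDIUM_MAX are hypothetical, the size threshold is an assumption, and the byte
loops and the tmp buffer stand in for the wide loads, stores and registers the
real routines use. The point is the ordering: small and medium copies need no
overlap check, so only the large-copy path pays for it.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed upper bound for the "small and medium" size classes. */
#define MEDIUM_MAX 128

void *copy_sketch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (n <= MEDIUM_MAX) {
        /* Small/medium: read every source byte before writing any, so any
           kind of overlap is handled without an explicit check. tmp plays
           the role of the registers in the real code. */
        unsigned char tmp[MEDIUM_MAX];
        for (size_t i = 0; i < n; i++)
            tmp[i] = s[i];
        for (size_t i = 0; i < n; i++)
            d[i] = tmp[i];
        return dst;
    }

    /* Large: only here is overlap checked. The unsigned difference is
       below n when dst lies in [src, src + n), where a forward copy would
       clobber source bytes before reading them. */
    if ((uintptr_t)d - (uintptr_t)s < n) {
        for (size_t i = n; i > 0; i--)   /* copy backwards */
            d[i - 1] = s[i - 1];
    } else {
        for (size_t i = 0; i < n; i++)   /* copy forwards */
            d[i] = s[i];
    }
    return dst;
}
```

When dst is below src the unsigned difference wraps to a huge value, so the
forward loop is chosen, which is correct for that direction of overlap.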

Benchmarking shows that two versions of memcpy are still needed. For
large copies, newer platforms favor aligning the source rather than the
destination, and using NEON registers gives a small additional speedup.
Older platforms, however, perform best when the destination is aligned
and general-purpose registers are used. Consequently, memcpy.S and
memcpy_simd.S contain memcpy code that is identical except for the
registers used and whether src or dst is aligned.
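The src-versus-dst alignment difference can be sketched as a choice of which
pointer determines the length of the unaligned head copied before the main
loop. align_head, ALIGN and the align_src flag are hypothetical names, and the
16-byte alignment target is an assumption; the function only computes how many
leading bytes must be copied before the chosen pointer reaches an aligned
boundary.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed alignment target for the large-copy loop. */
#define ALIGN 16

/* Number of leading bytes to copy before the chosen pointer becomes
   ALIGN-byte aligned. align_src != 0 models the newer-platform choice
   (aligned loads); align_src == 0 models the older-platform choice
   (aligned stores). */
size_t align_head(const void *dst, const void *src, int align_src)
{
    uintptr_t p = align_src ? (uintptr_t)src : (uintptr_t)dst;
    /* -p mod ALIGN is the distance up to the next multiple of ALIGN. */
    return (size_t)(-p & (ALIGN - 1));
}
```

After both pointers advance by the head length, the other pointer keeps its
misalignment relative to the first, so the choice only decides whether the
loop's loads or its stores are the aligned ones.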
7 files changed
tree: c350db0a1be8f6fababea05a8abaa4a896d59bcc