5b349fc - platform/bionic

commit	5b349fc22e7ba35ecb76b365d8be71939d204cde
author	Greta Yorsh <greta.yorsh@arm.com>	Tue Oct 04 16:02:25 2011 +0000
committer	David Butcher <david.butcher@arm.com>	Fri Mar 01 10:40:50 2013 +0000
tree	e47414c63add667725f134e35e9df2c6f1b31ae0
parent	7c0dd555c09c880b71c7c4039993d1d029add109

Adding memcpy tuned for Cortex-A15.

The strategy for large block sizes uses LDRD and STRD with offset
addressing. The main loop copies 64 bytes per iteration (i.e., 8 LDRD/STRD
pairs), interleaving loads and stores (i.e., the LDRD and STRD of the same
data are consecutive instructions). The writeback of the updated address is
a separate instruction, which allows us to write back the accumulated
update once per iteration.
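
The loop shape described above can be sketched in C. This is only an
illustration, not the assembly in this commit: the names copy8 and
copy_blocks64 are hypothetical, the compiler (not the source) decides whether
LDRD/STRD are actually emitted, and the byte-wise tail loop is an assumption
made to keep the sketch complete.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: one 8-byte load immediately followed by the store of
 * the same data, at a fixed offset from the base pointers. This mirrors one
 * LDRD/STRD pair with offset addressing. */
static inline void copy8(unsigned char *dst, const unsigned char *src,
                         size_t off)
{
    uint64_t tmp;
    memcpy(&tmp, src + off, sizeof tmp);  /* load  */
    memcpy(dst + off, &tmp, sizeof tmp);  /* store */
}

/* Hypothetical name. Copies n bytes, 64 bytes per main-loop iteration. */
void *copy_blocks64(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n >= 64) {
        /* 8 interleaved load/store pairs; the base pointers stay fixed. */
        copy8(d, s, 0);
        copy8(d, s, 8);
        copy8(d, s, 16);
        copy8(d, s, 24);
        copy8(d, s, 32);
        copy8(d, s, 40);
        copy8(d, s, 48);
        copy8(d, s, 56);

        /* Accumulated address update, applied once per iteration. */
        d += 64;
        s += 64;
        n -= 64;
    }

    /* Assumed tail handling: copy the remaining 0..63 bytes one at a time. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

Calling copy_blocks64(dst, src, 150) runs the 64-byte loop twice and leaves
22 bytes for the tail loop. Keeping the pointer updates out of the individual
accesses is what lets a single update per iteration cover all eight pairs.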

This strategy is implemented in memcpy.S. In some configurations, a plain
version of memcpy (included from memcpy-stub.c) is used instead of the
optimized one.

Validation:
* Correctness: checked memcpy using a test harness for block sizes
ranging from 1 to 128, with source and destination buffer alignments
each taken from { 0,1,2,3,4,8,12 } bytes.
* Performance: benchmarking on Cortex-A15 FPGA indicates that this strategy
is better for A15 than the strategy used by glibc and even slightly better
than using NEON. Benchmarking on Cortex-A9 bare metal and Linux shows
that the proposed strategy is reasonable: not as fast as the version of
memcpy from glibc (which is the best open source strategy for A9), but
comparable with csl and bionic.
* Integration with GCC: no regression for arm-none-eabi --with-cpu
cortex-a15 and cortex-a9.
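
A correctness sweep of the kind listed above might look like the following
sketch. It is not the harness used for this commit: the name check_copy, the
16-byte guard regions, the fill patterns, and the choice to measure
alignment relative to a 16-byte-aligned base are all assumptions.

```c
#include <stddef.h>
#include <string.h>

/* Signature shared by memcpy and any candidate replacement. */
typedef void *(*copy_fn)(void *, const void *, size_t);

enum { MAX_SIZE = 128, GUARD = 16, BUF_SIZE = GUARD + 12 + MAX_SIZE + GUARD };

/* Hypothetical harness: returns the number of failing
 * (size, source alignment, destination alignment) combinations. */
int check_copy(copy_fn fn)
{
    static const size_t aligns[] = { 0, 1, 2, 3, 4, 8, 12 };
    const size_t n_aligns = sizeof aligns / sizeof aligns[0];
    static _Alignas(16) unsigned char src_buf[BUF_SIZE];
    static _Alignas(16) unsigned char dst_buf[BUF_SIZE];
    int failures = 0;

    for (size_t size = 1; size <= MAX_SIZE; size++) {
        for (size_t i = 0; i < n_aligns; i++) {
            for (size_t j = 0; j < n_aligns; j++) {
                const size_t src_off = GUARD + aligns[i];
                const size_t dst_off = GUARD + aligns[j];
                int ok = 1;

                /* Fresh source pattern and destination fill for each case. */
                for (size_t k = 0; k < BUF_SIZE; k++) {
                    src_buf[k] = (unsigned char)(k * 7 + 1);
                }
                memset(dst_buf, 0xA5, BUF_SIZE);

                /* The copy must return dst and reproduce the source bytes. */
                if (fn(dst_buf + dst_off, src_buf + src_off, size)
                        != dst_buf + dst_off) {
                    ok = 0;
                }
                if (memcmp(dst_buf + dst_off, src_buf + src_off, size) != 0) {
                    ok = 0;
                }

                /* Bytes outside the destination range must be untouched. */
                for (size_t k = 0; k < BUF_SIZE; k++) {
                    int inside = (k >= dst_off) && (k < dst_off + size);
                    if (!inside && dst_buf[k] != 0xA5) {
                        ok = 0;
                    }
                }

                if (!ok) {
                    failures++;
                }
            }
        }
    }
    return failures;
}
```

Passing the function under test, for example check_copy(memcpy), exercises
128 sizes against 49 alignment pairs; a result of 0 means every combination
copied correctly and wrote nothing outside the destination range.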

Change-Id: Ied56354d8992c62ae3e02d582a2bd55585d814b9
Signed-off-by: Vassilis Laganakos <vasileios.laganakos@arm.com>

libc/arch-arm/bionic/memcpy.a15.S [Added]

1 file changed
