| commit | 3d66e94fb53f0d21db6781e15f4de56ceef0768e | [log] [tgz] |
|---|---|---|
| author | George Steed <george.steed@arm.com> | Wed Mar 26 19:44:24 2025 +0000 |
| committer | Frank Barchard <fbarchard@chromium.org> | Thu Jun 12 14:10:44 2025 -0700 |
| tree | d49fe9404cd6ad845914ebbd3cfd4737a93b4b3c | |
| parent | 1b2f6cdbe81afd651da102e28ed3a1cf7daf06f9 | [diff] |

[AArch64] Improve ARGBToUVRow_SVE2 and related kernels

This commit reworks the implementation of ARGBToUVMatrixRow_SVE2, using an approach similar to that recently used in 61bdaee13a701d2b52c6dc943ccc5c888077a591.

In particular we can rework these SVE2 implementations to use 8-bit dot-product instructions instead of 16-bit, allowing us to process more data in a single vector. To ensure that the input values fit in 8-bits, negate the UV constants arrays passed to the kernel and undo the now-unnecessary flipping of the middle two component values.

This commit mostly reverses the performance inversion where the Neon I8MM implementation was previously faster than the SVE2 implementation. The reduction in runtime observed compared to the existing Neon I8MM implementation is now:

    Cortex-A510: +5.6% (!)
    Cortex-A520: -3.0%
    Cortex-A710: -12.6%
    Cortex-A715: -10.9%
    Cortex-A720: -10.8%
    Cortex-X2: -3.8%
    Cortex-X3: -10.3%
    Cortex-X4: -9.5%
    Cortex-X925: -6.7%

Change-Id: I30253976dc8e3651cfb5fd39b63a6763975d41e3
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6640990
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>

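As a rough illustration of why 8-bit dot products suit this conversion, the scalar C sketch below models a single 32-bit lane of a mixed-sign 8-bit dot product (unsigned pixel bytes times signed coefficient bytes) and uses it to derive U and V from one pixel. It is not the SVE2 kernel from this commit. The names `usdot4_model`, `chroma_from_pixel`, `kCoeffU` and `kCoeffV` are hypothetical, the BT.601-style coefficient values are assumed rather than taken from the libyuv constant arrays, and the 2x2 subsampling done by the real row functions is omitted.

```c
#include <assert.h>
#include <stdint.h>

// Scalar model of one 32-bit lane of a mixed-sign 8-bit dot product:
// four unsigned bytes times four signed bytes, summed into an accumulator.
// (Hypothetical helper; the real kernel uses vector instructions.)
static int32_t usdot4_model(int32_t acc, const uint8_t px[4],
                            const int8_t coeff[4]) {
  for (int i = 0; i < 4; ++i) {
    acc += (int32_t)px[i] * (int32_t)coeff[i];
  }
  return acc;
}

// Assumed BT.601-style coefficients in B, G, R, A memory order.
// Every value lies in [-128, 127], so each fits in a signed byte.
static const int8_t kCoeffU[4] = {112, -74, -38, 0};
static const int8_t kCoeffV[4] = {-18, -94, 112, 0};

// Convert one pixel to a single chroma value. The accumulator starts at
// 0x8080, i.e. 128.5 in 8.8 fixed point: the chroma bias plus rounding.
static uint8_t chroma_from_pixel(const uint8_t bgra[4],
                                 const int8_t coeff[4]) {
  int32_t acc = usdot4_model(0x8080, bgra, coeff);
  // With these coefficients the sum stays within [4336, 61456].
  assert(acc >= 0 && (acc >> 8) <= 255);
  return (uint8_t)(acc >> 8);
}
```

Because each coefficient occupies one byte instead of two, a vector of a given width holds twice as many pixel components per dot-product instruction, which is consistent with the "more data in a single vector" point in the commit message.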
libyuv is an open source project that includes YUV scaling and conversion functionality.
See Getting started for instructions on how to get started developing.
You can also browse the docs directory for more documentation.