3d66e94f - platform/external/libyuv - Git at Google

commit	3d66e94fb53f0d21db6781e15f4de56ceef0768e
author	George Steed <george.steed@arm.com>	Wed Mar 26 19:44:24 2025 +0000
committer	Frank Barchard <fbarchard@chromium.org>	Thu Jun 12 14:10:44 2025 -0700
tree	d49fe9404cd6ad845914ebbd3cfd4737a93b4b3c
parent	1b2f6cdbe81afd651da102e28ed3a1cf7daf06f9

[AArch64] Improve ARGBToUVRow_SVE2 and related kernels

This commit reworks the implementation of ARGBToUVMatrixRow_SVE2, using
an approach similar to that recently used in
61bdaee13a701d2b52c6dc943ccc5c888077a591.

In particular we can rework these SVE2 implementations to use 8-bit
dot-product instructions instead of 16-bit, allowing us to process more
data in a single vector.

To ensure that the input values fit in 8 bits, negate the UV constant
arrays passed to the kernel and undo the now-unnecessary flipping of the
middle two component values.
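
The effect of negating the constants can be sketched in scalar C++. This is
only an illustration, not the SVE2 kernel: the coefficient values, the bias,
and all names below are hypothetical and are not taken from libyuv's tables,
and the 2x2 subsampling done by the real row functions is omitted. It assumes
a coefficient of +128, which does not fit in a signed 8-bit lane, whereas its
negation -128 does; the sign is then restored by subtracting the dot product
from the bias instead of adding it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical coefficients in B, G, R order (illustration only).
// +128 is outside the int8_t range, so this form needs 16-bit lanes.
constexpr int16_t kCoeffs16[3] = {128, -85, -43};
// Negated copy: every value now lies in [-128, 127] and fits in int8_t.
constexpr int8_t kNegCoeffs8[3] = {-128, 85, 43};
// Hypothetical bias: centres the result on 128 and keeps the sum non-negative.
constexpr int32_t kBias = 0x8000;

// Reference form: 16-bit coefficients, dot product added to the bias.
inline uint8_t UFromInt16(uint8_t b, uint8_t g, uint8_t r) {
  int32_t dot = kCoeffs16[0] * b + kCoeffs16[1] * g + kCoeffs16[2] * r;
  return static_cast<uint8_t>((kBias + dot) >> 8);
}

// Negated form: unsigned 8-bit inputs times signed 8-bit coefficients,
// with the dot product subtracted from the bias to undo the negation.
inline uint8_t UFromNegatedInt8(uint8_t b, uint8_t g, uint8_t r) {
  int32_t dot = kNegCoeffs8[0] * b + kNegCoeffs8[1] * g + kNegCoeffs8[2] * r;
  return static_cast<uint8_t>((kBias - dot) >> 8);
}

// Checks that both forms agree on a grid of pixel values including 0 and 255.
inline void CheckFormsAgree() {
  for (int b = 0; b <= 255; b += 17) {
    for (int g = 0; g <= 255; g += 17) {
      for (int r = 0; r <= 255; r += 17) {
        assert(UFromInt16(b, g, r) == UFromNegatedInt8(b, g, r));
      }
    }
  }
}
```

Calling `CheckFormsAgree()` exercises both forms over the grid. Because the
negated dot product is exactly the negative of the original one, the two
functions compute the same value for every input, so the narrower
coefficients cost no precision in this sketch.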

This commit mostly reverses the performance inversion where the Neon
I8MM implementation was previously faster than the SVE2 implementation.
The change in runtime observed compared to the existing Neon I8MM
implementation is now (negative values mean the SVE2 kernel is faster):

Cortex-A510:  +5.6% (!)
Cortex-A520:  -3.0%
Cortex-A710: -12.6%
Cortex-A715: -10.9%
Cortex-A720: -10.8%
  Cortex-X2:  -3.8%
  Cortex-X3: -10.3%
  Cortex-X4:  -9.5%
Cortex-X925:  -6.7%

Change-Id: I30253976dc8e3651cfb5fd39b63a6763975d41e3
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6640990
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>

source/row_sve.cc

1 file changed

README.md

libyuv is an open source project that includes YUV scaling and conversion functionality.

Scale YUV to prepare content for compression, with point, bilinear or box filter.
Convert to YUV from webcam formats for compression.
Convert to RGB formats for rendering/effects.
Rotate by 90/180/270 degrees to adjust for mobile devices in portrait mode.
Optimized for SSSE3/AVX2 on x86/x64.
Optimized for Neon/SVE2/SME on Arm.
Optimized for MSA on Mips.
Optimized for RVV on RISC-V.

Development

See Getting started for instructions on how to get started developing.

You can also browse the docs directory for more documentation.