| commit | f2e78e130434834e3815e4bdd2900d940ded9af3 | |
|---|---|---|
| author | George Steed <george.steed@arm.com> | Fri Mar 15 18:22:03 2024 +0000 |
| committer | Frank Barchard <fbarchard@chromium.org> | Tue Apr 09 03:09:36 2024 +0000 |
| tree | 84a1ac2eb6fd0e00fc7b105740d6351e32e33786 | |
| parent | 64061790630b8fab97bbc8ada7f558c665e13674 | |

[AArch64] Use Neon dot-product instructions in ARGBToYMatrixRow

Using the dot-product instructions here allows us to avoid needing LD4
for loading individual colour channels, which gives a big benefit on
some micro-architectures where such instructions perform significantly
worse than LD1. In addition the dot-product instructions have higher
throughput compared to the Neon
Observed reduction in runtimes for selected kernels moving from *_NEON
to *_NEON_DotProd:

| Kernel | Cortex-A55 | Cortex-A510 | Cortex-A76 | Cortex-X2 |
|---|---|---|---|---|
| ABGRToYJRow | -6.5% | -22.5% | -43.5% | -71.2% |
| ABGRToYRow | -6.5% | -22.5% | -43.5% | -68.3% |
| ARGBToYJRow | -6.5% | -22.5% | -43.5% | -68.1% |
| ARGBToYRow | -6.5% | -22.5% | -43.5% | -68.1% |
| BGRAToYRow | -6.5% | -22.5% | -42.3% | -68.4% |
| RGBAToYJRow | -6.5% | -22.5% | -42.2% | -73.7% |
| RGBAToYRow | -6.5% | -22.5% | -42.3% | -64.9% |
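
The LD4-versus-dot-product point made in the description above can be illustrated with a portable scalar model. This is only a sketch, not the code in this change: the function names are hypothetical, and the weights and bias are the common 8-bit fixed-point BT.601 limited-range values, assumed here for illustration. The first function mimics the LD4 approach (split each pixel into channels, then weight each channel); the second treats each 4-byte pixel as one unit and takes a 4-way dot product with a weight vector, which is what a single 32-bit lane of a Neon unsigned dot-product instruction computes, so the pixels can stay interleaved as a plain LD1 would load them.

```c
#include <stddef.h>
#include <stdint.h>

// Assumed for illustration: BT.601 limited-range weights in ARGB memory
// order (B, G, R, A) and a bias that folds in rounding and the +16 offset.
static const uint8_t kCoeffs[4] = {25, 129, 66, 0};
static const uint32_t kBias = 0x1080;  // (16 << 8) + 128

// LD4-style model: each pixel is split into separate channels first, then
// every channel is multiplied by its own weight and accumulated.
static void ToYRowDeinterleaved(const uint8_t* src_argb, uint8_t* dst_y,
                                size_t width) {
  for (size_t i = 0; i < width; ++i) {
    uint32_t b = src_argb[4 * i + 0];
    uint32_t g = src_argb[4 * i + 1];
    uint32_t r = src_argb[4 * i + 2];
    uint32_t sum = kCoeffs[0] * b + kCoeffs[1] * g + kCoeffs[2] * r;
    dst_y[i] = (uint8_t)((sum + kBias) >> 8);
  }
}

// Scalar model of one 32-bit lane of an unsigned dot product: four u8 * u8
// products are summed into a 32-bit accumulator.
static uint32_t DotLane(uint32_t acc, const uint8_t* a, const uint8_t* b) {
  for (int k = 0; k < 4; ++k) {
    acc += (uint32_t)a[k] * (uint32_t)b[k];
  }
  return acc;
}

// Dot-product-style model: the interleaved pixel is consumed as-is, so no
// channel de-interleaving is needed. The zero weight discards alpha.
static void ToYRowDotProduct(const uint8_t* src_argb, uint8_t* dst_y,
                             size_t width) {
  for (size_t i = 0; i < width; ++i) {
    uint32_t sum = DotLane(0, src_argb + 4 * i, kCoeffs);
    dst_y[i] = (uint8_t)((sum + kBias) >> 8);
  }
}
```

Under these assumed constants both functions produce identical output for any input row. Supporting a different channel order such as RGBA or BGRA would, in this model, only require permuting the weight vector.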
Bug: libyuv:977
Change-Id: If244190a7bdacf7e6e6b16af7e6853ee13ff6585
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5424737
Reviewed-by: Frank Barchard <fbarchard@chromium.org>