[AArch64] Use Neon dot-product instructions in ARGBToYMatrixRow

Using the dot-product instructions here allows us to avoid needing LD4
for loading individual colour channels, which gives a big benefit on
some micro-architectures where such instructions perform significantly
worse than LD1. In addition, the dot-product instructions have higher
throughput than the Neon instructions they replace.
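
To illustrate why a dot product removes the need for de-interleaving
loads, here is a minimal scalar sketch in portable C, not the code from
this change. It models one 32-bit lane of a UDOT-style operation: the four
bytes of a pixel are multiplied by four 8-bit coefficients and summed, so
pixels can be read contiguously (as LD1 would) instead of being split into
per-channel registers (as LD4 would). The helper names `udot_lane` and
`argb_to_y_row_sketch` are hypothetical, and the coefficients and bias are
the common BT.601 limited-range values, assumed here for illustration.

```c
#include <stddef.h>
#include <stdint.h>

// Scalar model of one 32-bit lane of a UDOT-style instruction: four
// unsigned 8-bit products accumulated into a 32-bit value.
static uint32_t udot_lane(uint32_t acc, const uint8_t px[4],
                          const uint8_t coeff[4]) {
  for (int i = 0; i < 4; ++i) {
    acc += (uint32_t)px[i] * coeff[i];
  }
  return acc;
}

// Hypothetical row function. ARGB pixels are stored in memory as
// B, G, R, A, so the coefficient order below matches that byte order.
// Assumed values: Y = (25*B + 129*G + 66*R + 0x1080) >> 8.
static void argb_to_y_row_sketch(const uint8_t* src_argb, uint8_t* dst_y,
                                 size_t width) {
  static const uint8_t kCoeff[4] = {25, 129, 66, 0};  // B, G, R, A
  const uint32_t kBias = 0x1080;  // 16 * 256 + 128 (offset plus rounding)
  for (size_t x = 0; x < width; ++x) {
    // One contiguous 4-byte read per pixel; no channel de-interleave.
    uint32_t sum = udot_lane(kBias, src_argb + 4 * x, kCoeff);
    dst_y[x] = (uint8_t)(sum >> 8);
  }
}
```

With these assumed constants, a black pixel maps to Y = 16 and a white
pixel to Y = 235. Other byte orders (ABGR, BGRA, RGBA) would only need the
coefficient array permuted, which is consistent with a single shared
matrix row function serving all of the kernels listed below.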

Observed reduction in runtimes for selected kernels moving from *_NEON
to *_NEON_DotProd:

     Kernel | Cortex-A55 | Cortex-A510 | Cortex-A76 | Cortex-X2
ABGRToYJRow |      -6.5% |      -22.5% |     -43.5% |    -71.2%
 ABGRToYRow |      -6.5% |      -22.5% |     -43.5% |    -68.3%
ARGBToYJRow |      -6.5% |      -22.5% |     -43.5% |    -68.1%
 ARGBToYRow |      -6.5% |      -22.5% |     -43.5% |    -68.1%
 BGRAToYRow |      -6.5% |      -22.5% |     -42.3% |    -68.4%
RGBAToYJRow |      -6.5% |      -22.5% |     -42.2% |    -73.7%
 RGBAToYRow |      -6.5% |      -22.5% |     -42.3% |    -64.9%

Bug: libyuv:977
Change-Id: If244190a7bdacf7e6e6b16af7e6853ee13ff6585
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5424737
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
5 files changed
tree: 84a1ac2eb6fd0e00fc7b105740d6351e32e33786
README.md

libyuv is an open source project that includes YUV scaling and conversion functionality.

  • Scale YUV to prepare content for compression, with point, bilinear or box filter.
  • Convert to YUV from webcam formats for compression.
  • Convert to RGB formats for rendering/effects.
  • Rotate by 90/180/270 degrees to adjust for mobile devices in portrait mode.
  • Optimized for SSSE3/AVX2 on x86/x64.
  • Optimized for Neon on Arm.
  • Optimized for MSA on MIPS.
  • Optimized for RVV on RISC-V.

Development

See Getting started for instructions on how to get started developing.

You can also browse the docs directory for more documentation.