Optimize Y detiling and p010_to_nv12

Currently 10-bit videos play very slowly because they are being detiled
and then downsampled by the CPU. This CL attempts to optimize both the
detiling process and the downsampling process.

Two basic optimizations for the Y detiling routine:
1. Pull as many calculations as possible out of the innermost loop.
2. Group reads and writes into batches of Y_SUBTILE_WIDTH.
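
As a rough illustration of both points, here is a minimal sketch that assumes a hypothetical tiled layout: tiles of TILE_W x TILE_H bytes, stored contiguously in raster order. The names `TILE_W`, `TILE_H`, and `detile_y` are made up for this example. They are not the constants or functions used in this CL, and the real tile geometry (including the value of Y_SUBTILE_WIDTH) may differ.

```rust
// Hypothetical tile geometry, chosen only for illustration.
const TILE_W: usize = 16;
const TILE_H: usize = 4;

/// Copies a tiled Y plane into a linear (row-major) buffer.
/// Assumes `width` and `height` are multiples of the tile size.
fn detile_y(src: &[u8], dst: &mut [u8], width: usize, height: usize) {
    assert!(width % TILE_W == 0 && height % TILE_H == 0);
    assert!(src.len() >= width * height && dst.len() >= width * height);

    let tiles_per_row = width / TILE_W;
    let tile_size = TILE_W * TILE_H;

    for tile_y in 0..height / TILE_H {
        // Offsets that depend only on the tile row are computed once here,
        // not once per pixel.
        let src_row_base = tile_y * tiles_per_row * tile_size;
        let dst_row_base = tile_y * TILE_H * width;

        for tile_x in 0..tiles_per_row {
            let src_tile = src_row_base + tile_x * tile_size;
            let dst_tile = dst_row_base + tile_x * TILE_W;

            for row in 0..TILE_H {
                let s = src_tile + row * TILE_W;
                let d = dst_tile + row * width;
                // One batched copy of TILE_W bytes instead of TILE_W
                // single-byte reads and writes.
                dst[d..d + TILE_W].copy_from_slice(&src[s..s + TILE_W]);
            }
        }
    }
}
```

The innermost loop is left with two additions and one slice copy, which the compiler can typically lower to a small memcpy or vector move.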

The P010->NV12 downsample is reimplemented using AVX2 so we can
hopefully process 32 pixels at a time.
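
For reference, a scalar sketch of the per-sample operation is shown below. It is not the AVX2 code in this CL. It relies on the P010 convention that each 16-bit word holds its 10 significant bits in the upper bits, so the 8-bit value is the high byte. The function name, the `BATCH` constant, and the chunking are assumptions made for illustration.

```rust
// Illustrative batch size matching the 32 pixels mentioned above.
const BATCH: usize = 32;

/// Narrows P010 samples to 8-bit samples. Because P010 and NV12 share
/// the same plane layout, the same routine applies to the Y plane and
/// to the interleaved UV plane.
fn p010_to_nv12_plane(src: &[u16], dst: &mut [u8]) {
    assert_eq!(src.len(), dst.len());

    let mut src_chunks = src.chunks_exact(BATCH);
    let mut dst_chunks = dst.chunks_exact_mut(BATCH);

    // Fixed-size batches give the compiler a chance to vectorize.
    for (s, d) in (&mut src_chunks).zip(&mut dst_chunks) {
        for (out, sample) in d.iter_mut().zip(s) {
            *out = (*sample >> 8) as u8;
        }
    }

    // Handle the samples that do not fill a whole batch.
    let tail_src = src_chunks.remainder();
    let tail_dst = dst_chunks.into_remainder();
    for (out, sample) in tail_dst.iter_mut().zip(tail_src) {
        *out = (*sample >> 8) as u8;
    }
}
```

This version truncates the two least significant bits. Whether the actual implementation rounds or truncates is not stated here.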

Bug: 427215721
Test: Manually watch VP9 profile 2 video

Change-Id: I50f130d37e449c30175351d2b05ca69b377fe28a
1 file changed
tree: a9f44f4544d1f0802b5fabb957dcea049e8ac6b9
README.md

Cros-codecs

A lightweight, simple, low-dependency, and hopefully safe crate for hardware-accelerated video decoding and encoding on Linux.

It is developed for use in ChromeOS (particularly crosvm), but has no dependency on ChromeOS and should be usable anywhere.

Current features

  • Simple decoder API,
  • VAAPI decoder support (using cros-libva) for H.264, H.265, VP8, VP9 and AV1,
  • VAAPI encoder support for H.264, VP9 and AV1,
  • Stateful V4L2 encoder support.

Planned features

  • Stateful V4L2 decoder support,
  • Stateless V4L2 decoder support,
  • Support for more encoder codecs,
  • C API to be used in non-Rust projects.

Non-goals

  • Support for systems other than Linux.

Example programs

The ccdec example program can decode an encoded stream and write the decoded frames to a file. As such it can be used for testing purposes.

$ cargo build --examples
$ ./target/debug/examples/ccdec --help
Usage: ccdec <input> [--output <output>] --input-format <input-format> [--output-format <output-format>] [--compute-md5 <compute-md5>]

Simple player using cros-codecs

Positional Arguments:
  input             input file

Options:
  --output          output file to write the decoded frames to
  --input-format    input format to decode from.
  --output-format   pixel format to decode into. Default: i420
  --compute-md5     whether to display the MD5 of the decoded stream, and at
                    which granularity (stream or frame)
  --help            display usage information

Testing

Fluster can be used for testing, using the ccdec example program described above. This branch contains support for cros-codecs testing. Just make sure the ccdec binary is in your PATH, and run Fluster using one of the ccdec decoders, for example:

python fluster.py run -d ccdec-H.264 -ts JVT-AVC_V1

Credits

The majority of the code in the initial commit was written by Daniel Almeida as a VAAPI backend for crosvm, before being split into this crate.