commit | b25cca8c2edba5fbc18448007da2624a25113f4d |
---|---|
author | Jonathan Wright <jonathan.wright@arm.com>, Sat Feb 25 00:43:46 2023 +0000 |
committer | Jonathan Wright <jonathan.wright@arm.com>, Mon Feb 27 09:49:02 2023 +0000 |
tree | 6b4a8cb4adbcdabb409c13954db1c6e18b869e06 |
parent | 45dc0d34d2fa1a848c282d8fc992206fa69f01b8 |

Optimize transpose_neon.h helper functions

1) Use vtrn[12]q_[su]64 in vpx_vtrnq_[su]64* helpers on AArch64 targets. This produces half as many TRN1/2 instructions compared to the number of MOVs that result from vcombine.

2) Use vpx_vtrnq_[su]64* helpers wherever applicable.

3) Refactor transpose_4x8_s16 to operate on 128-bit vectors.

Change-Id: I9a8b1c1fe2a98a429e0c5f39def5eb2f65759127
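
For illustration only, the lane movement behind item 1 can be sketched as follows. This is not the code from the change: `sketch_trn64` is a hypothetical helper, and each 128-bit vector is modelled as an `int64_t[2]` array so the sketch also builds on targets without NEON. On AArch64 it uses `vtrn1q_s64`/`vtrn2q_s64`; on 32-bit Arm with NEON it falls back to the `vcombine` form; elsewhere it uses a scalar model of the same layout.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical helper for illustration; not the libvpx function itself.
 * Interleaves the 64-bit halves of two 128-bit vectors:
 *   out0 = { a[0], b[0] }   (low halves of a and b)
 *   out1 = { a[1], b[1] }   (high halves of a and b)
 */
static void sketch_trn64(const int64_t a[2], const int64_t b[2],
                         int64_t out0[2], int64_t out1[2]) {
#if defined(__aarch64__)
  /* AArch64: vtrn1q_s64 / vtrn2q_s64 express each output directly. */
  const int64x2_t va = vld1q_s64(a);
  const int64x2_t vb = vld1q_s64(b);
  vst1q_s64(out0, vtrn1q_s64(va, vb));
  vst1q_s64(out1, vtrn2q_s64(va, vb));
#elif defined(__ARM_NEON)
  /* 32-bit Arm: vtrn1q/vtrn2q are unavailable, so split each vector
   * into halves and recombine them. */
  const int64x2_t va = vld1q_s64(a);
  const int64x2_t vb = vld1q_s64(b);
  vst1q_s64(out0, vcombine_s64(vget_low_s64(va), vget_low_s64(vb)));
  vst1q_s64(out1, vcombine_s64(vget_high_s64(va), vget_high_s64(vb)));
#else
  /* Scalar model of the same lane layout for non-Arm builds. */
  out0[0] = a[0];
  out0[1] = b[0];
  out1[0] = a[1];
  out1[1] = b[1];
#endif
}
```

All three branches produce the same result, so with `a = {1, 2}` and `b = {3, 4}` the outputs are `{1, 3}` and `{2, 4}`. The exact instructions generated for each branch depend on the compiler and its optimization level.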