tree 76ebdb27bce5ab57139b3e805f2f9119eda068f2
parent 816b0da3ef7a2fffeda087917353646b3d48fd62
author Usama Arif <usama.arif@linaro.org> 1573815209 +0000
committer Joel Goddard <joel.goddard@linaro.org> 1632147237 +0100

ARM64: FP16 min and max intrinsics for ARMv8.2

This CL implements intrinsics for the FP16 min and max methods
using ARMv8.2 FP16 instructions.
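
For reference, the semantics being intrinsified can be sketched in plain
Java as below. This is an illustrative sketch, not the libcore source and
not the ART code generator: the class name Fp16MinMaxSketch and its helper
names are hypothetical, and the NaN and signed-zero handling is assumed to
follow the usual Math.min/Math.max conventions (NaN propagates, -0 orders
below +0).

```java
// Hypothetical sketch of half-precision min/max on raw 16-bit encodings.
// Assumption: semantics mirror Math.min/Math.max (NaN wins, -0 < +0).
final class Fp16MinMaxSketch {
    static final int SIGN_MASK = 0x8000;
    static final int EXP_SIGNIFICAND_MASK = 0x7fff;
    static final int POSITIVE_INFINITY = 0x7c00;
    static final short NAN = (short) 0x7e00;  // canonical quiet NaN

    // A half is NaN when its exponent is all ones and its significand is non-zero.
    static boolean isNaN(short h) {
        return (h & EXP_SIGNIFICAND_MASK) > POSITIVE_INFINITY;
    }

    // Maps the sign-magnitude encoding to an int that orders like the value.
    // Both zeros map to 0, so they are handled separately by the callers.
    static int toCompareValue(short h) {
        int bits = h & 0xffff;
        return (bits & SIGN_MASK) != 0 ? 0x8000 - bits : bits;
    }

    static boolean bothZero(short x, short y) {
        return (x & EXP_SIGNIFICAND_MASK) == 0 && (y & EXP_SIGNIFICAND_MASK) == 0;
    }

    static short min(short x, short y) {
        if (isNaN(x) || isNaN(y)) return NAN;
        if (bothZero(x, y)) {
            return (x & SIGN_MASK) != 0 ? x : y;  // prefer -0
        }
        return toCompareValue(x) < toCompareValue(y) ? x : y;
    }

    static short max(short x, short y) {
        if (isNaN(x) || isNaN(y)) return NAN;
        if (bothZero(x, y)) {
            return (x & SIGN_MASK) != 0 ? y : x;  // prefer +0
        }
        return toCompareValue(x) > toCompareValue(y) ? x : y;
    }

    public static void main(String[] args) {
        short one = (short) 0x3c00;  // 1.0 in half precision
        short two = (short) 0x4000;  // 2.0 in half precision
        System.out.printf("min=0x%04x max=0x%04x%n",
                min(one, two) & 0xffff, max(one, two) & 0xffff);
    }
}
```

The branches for NaN and signed zero are the part a hardware min/max
instruction would be expected to absorb, which is one plausible source of
the speedups reported below.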

Also refactors the location builders for FP16 compare
operations to use the new helper FP16ComparisonLocations.

Performance of the timeMinFP16 FP16Intrinsic micro benchmark
on a Pixel 4 (lower is better):
- Java implementation libcore.util.FP16.min():
    - big cluster only: 935
    - little cluster only: 2373
- arm64 min intrinsic implementation:
    - big cluster only: 495 (~47% faster)
    - little cluster only: 1521 (~36% faster)

Performance of the timeMaxFP16 FP16Intrinsic micro benchmark
on a Pixel 4 (lower is better):
- Java implementation libcore.util.FP16.max():
    - big cluster only: 1067
    - little cluster only: 2383
- arm64 max intrinsic implementation:
    - big cluster only: 496 (~53% faster)
    - little cluster only: 1508 (~37% faster)

Test: 580-checker-fp16
Test: art/test/testrunner/run_build_test_target.py -j80 art-test-javac
Change-Id: I6ecbc96ef7fa7fcb67f5855de3a6f551c247566e
