39e2979b92 - platform/art

commit	39e2979b92c25fc825944bda346216395d326395
author	Usama Arif <usama.arif@linaro.org>	Fri Nov 15 10:53:29 2019 +0000
committer	Joel Goddard <joel.goddard@linaro.org>	Mon Sep 20 15:13:57 2021 +0100
tree	76ebdb27bce5ab57139b3e805f2f9119eda068f2
parent	816b0da3ef7a2fffeda087917353646b3d48fd62

ARM64: FP16 min and max intrinsic for ARMv8

This CL implements intrinsics for the FP16 min and max methods
using ARMv8.2 FP16 instructions.

It also refactors the location builders for FP16 compare
operations to use a new helper, FP16ComparisonLocations.
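
For context, here is a minimal sketch of the kind of software min/max
that such an intrinsic replaces, working on raw IEEE 754 binary16 bit
patterns stored in a short. The class and helper names
(Fp16MinMaxSketch, toOrderedInt, isNaN, isZero) are hypothetical and
this is not the libcore.util.FP16 source. It only illustrates the NaN,
signed-zero, and ordering checks a Java fallback has to perform, which
the ARMv8.2 FP16 min/max instructions handle in hardware.

```java
// Illustrative sketch only: names are hypothetical, not taken from libcore.
final class Fp16MinMaxSketch {
    static final int SIGN_MASK = 0x8000;
    static final int EXP_SIGNIFICAND_MASK = 0x7fff;
    static final int POSITIVE_INFINITY = 0x7c00;
    static final short NAN = (short) 0x7e00; // canonical quiet NaN

    // A value is NaN when exponent is all ones and significand is non-zero.
    static boolean isNaN(short h) {
        return (h & EXP_SIGNIFICAND_MASK) > POSITIVE_INFINITY;
    }

    // True for both +0.0 and -0.0.
    static boolean isZero(short h) {
        return (h & EXP_SIGNIFICAND_MASK) == 0;
    }

    // Map sign-magnitude binary16 bits to an int that sorts numerically.
    static int toOrderedInt(short h) {
        int bits = h & 0xffff;
        return (bits & SIGN_MASK) != 0 ? 0x8000 - bits : bits;
    }

    static short min(short x, short y) {
        if (isNaN(x) || isNaN(y)) return NAN;
        // -0.0 is treated as smaller than +0.0.
        if (isZero(x) && isZero(y)) return (x & SIGN_MASK) != 0 ? x : y;
        return toOrderedInt(x) < toOrderedInt(y) ? x : y;
    }

    static short max(short x, short y) {
        if (isNaN(x) || isNaN(y)) return NAN;
        // +0.0 is treated as larger than -0.0.
        if (isZero(x) && isZero(y)) return (x & SIGN_MASK) != 0 ? y : x;
        return toOrderedInt(x) > toOrderedInt(y) ? x : y;
    }

    public static void main(String[] args) {
        short one = (short) 0x3c00; // 1.0 in binary16
        short two = (short) 0x4000; // 2.0 in binary16
        System.out.printf("min = 0x%04x%n", min(one, two) & 0xffff);
        System.out.printf("max = 0x%04x%n", max(one, two) & 0xffff);
    }
}
```

Every call to the Java version pays for these branches and masks, which
is a plausible reason the benchmark figures below drop once the
operation is intrinsified.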

Performance improvement measured with the timeMinFP16 FP16Intrinsic
micro benchmark on Pixel 4:
- Java implementation libcore.util.FP16.min():
    - big cluster only: 935
    - little cluster only: 2373
- arm64 min intrinsic implementation:
    - big cluster only: 495 (~47% faster)
    - little cluster only: 1521 (~36% faster)

Performance improvement measured with the timeMaxFP16 FP16Intrinsic
micro benchmark on Pixel 4:
- Java implementation libcore.util.FP16.max():
    - big cluster only: 1067
    - little cluster only: 2383
- arm64 max intrinsic implementation:
    - big cluster only: 496 (~53% faster)
    - little cluster only: 1508 (~37% faster)
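
The "faster" percentages above appear to be the relative reduction in
the benchmark score, assuming the scores are time-like figures where
lower is better (the message does not state units). A small sketch of
that arithmetic, with a hypothetical helper name:

```java
// Illustrative sketch only: reproduces the percentage arithmetic.
final class ScoreReductionSketch {
    // Relative reduction of `measured` versus `baseline`, in percent.
    static double percentReduction(double baseline, double measured) {
        return 100.0 * (baseline - measured) / baseline;
    }

    public static void main(String[] args) {
        System.out.printf("min big:    %.1f%%%n", percentReduction(935, 495));
        System.out.printf("min little: %.1f%%%n", percentReduction(2373, 1521));
        System.out.printf("max big:    %.1f%%%n", percentReduction(1067, 496));
        System.out.printf("max little: %.1f%%%n", percentReduction(2383, 1508));
    }
}
```

Under that assumption the reductions come out at roughly 47.1%, 35.9%,
53.5%, and 36.7%, in line with the approximate figures quoted.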

Test: 580-checker-fp16
Test: art/test/testrunner/run_build_test_target.py -j80 art-test-javac
Change-Id: I6ecbc96ef7fa7fcb67f5855de3a6f551c247566e

9 files changed