ARM64: FP16 min and max intrinsics for ARMv8.2

This CL implements intrinsics for the FP16 min and max methods
using ARMv8.2 FP16 instructions.
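
For reference, the semantics the intrinsics have to preserve can be
sketched in plain Java. This is an illustrative sketch only, not the
libcore or ART source: the class Fp16MinMaxSketch and its helpers are
hypothetical, and it assumes the documented FP16 behaviour that a NaN
input yields NaN and that -0 orders below +0.

```java
// Hypothetical sketch; not the libcore.util.FP16 or ART implementation.
final class Fp16MinMaxSketch {
    static final int SIGN_MASK = 0x8000;
    static final int EXP_SIG_MASK = 0x7fff;      // exponent + significand bits
    static final int POS_INFINITY_BITS = 0x7c00;
    static final short NAN = (short) 0x7e00;     // canonical quiet NaN

    static boolean isNaN(short h) {
        return (h & EXP_SIG_MASK) > POS_INFINITY_BITS;
    }

    // Maps raw half-float bits to an int that orders like the float value.
    // Both +0 and -0 map to 0, so zeros need separate handling.
    static int toOrdered(short h) {
        int bits = h & 0xffff;
        return (bits & SIGN_MASK) != 0 ? SIGN_MASK - bits : bits;
    }

    static short min(short x, short y) {
        if (isNaN(x) || isNaN(y)) return NAN;
        int ox = toOrdered(x);
        int oy = toOrdered(y);
        if (ox == oy) {
            // Only +0/-0 pairs or identical inputs reach here; prefer -0.
            return (x & SIGN_MASK) != 0 ? x : y;
        }
        return ox < oy ? x : y;
    }

    static short max(short x, short y) {
        if (isNaN(x) || isNaN(y)) return NAN;
        int ox = toOrdered(x);
        int oy = toOrdered(y);
        if (ox == oy) {
            // Only +0/-0 pairs or identical inputs reach here; prefer +0.
            return (x & SIGN_MASK) != 0 ? y : x;
        }
        return ox > oy ? x : y;
    }

    public static void main(String[] args) {
        short one = (short) 0x3c00;  // 1.0 in half precision
        short two = (short) 0x4000;  // 2.0 in half precision
        System.out.println(Integer.toHexString(min(one, two) & 0xffff));
        System.out.println(Integer.toHexString(max(one, two) & 0xffff));
    }
}
```

The intrinsic path replaces this kind of branchy bit manipulation with
the hardware FP16 min/max instructions.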

Also refactors the location builders for FP16 compare
operations to use the new helper FP16ComparisonLocations.

Results of the timeMinFP16 case of the FP16Intrinsic micro
benchmark on Pixel 4 (lower is better):
- Java implementation libcore.util.FP16.min():
    - big cluster only: 935
    - little cluster only: 2373
- ARM64 min intrinsic implementation:
    - big cluster only: 495 (~47% faster)
    - little cluster only: 1521 (~36% faster)

Results of the timeMaxFP16 case of the FP16Intrinsic micro
benchmark on Pixel 4 (lower is better):
- Java implementation libcore.util.FP16.max():
    - big cluster only: 1067
    - little cluster only: 2383
- ARM64 max intrinsic implementation:
    - big cluster only: 496 (~53% faster)
    - little cluster only: 1508 (~37% faster)
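
The percentages above appear to be the relative reduction of the
measured value, (before - after) / before. A minimal sketch of that
arithmetic follows; the class BenchmarkDeltaSketch is hypothetical and
it assumes the reported numbers are costs where lower is better.

```java
// Hypothetical helper for re-deriving the quoted percentages.
final class BenchmarkDeltaSketch {
    // Percentage by which `after` is lower than `before`.
    static double reductionPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        // Figures taken from the benchmark lists in this message.
        System.out.printf("min big:    %.1f%%%n", reductionPercent(935, 495));
        System.out.printf("min little: %.1f%%%n", reductionPercent(2373, 1521));
        System.out.printf("max big:    %.1f%%%n", reductionPercent(1067, 496));
        System.out.printf("max little: %.1f%%%n", reductionPercent(2383, 1508));
    }
}
```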

Test: 580-checker-fp16
Test: art/test/testrunner/run_build_test_target.py -j80 art-test-javac
Change-Id: I6ecbc96ef7fa7fcb67f5855de3a6f551c247566e
9 files changed