Add Math.fma intrinsics (double and float versions) for arm64

The intrinsic implementation is ~500 times faster than the Java
implementation, which uses BigInteger.
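
For reference, a minimal sketch of the semantics the intrinsic has to
preserve: Math.fma(a, b, c) computes a * b + c with a single rounding,
whereas the plain expression rounds the product first. The class name
FmaDemo and the helper unfused() are hypothetical and not part of this
change; the input values are chosen only to make the difference visible.

```java
public class FmaDemo {
    // Hypothetical helpers: two separately rounded operations
    // (the product is rounded before the addition).
    static double unfused(double a, double b, double c) {
        return a * b + c;
    }

    static float unfused(float a, float b, float c) {
        return a * b + c;
    }

    public static void main(String[] args) {
        // double: a = 1 + 2^-28, so a * a = 1 + 2^-27 + 2^-56 exactly.
        // The 2^-56 term is lost when the product is rounded to double.
        double a = 0x1.0000001p0;
        double c = -0x1.0000002p0; // -(1 + 2^-27)
        System.out.println(unfused(a, a, c) == 0.0);        // product rounded first
        System.out.println(Math.fma(a, a, c) == 0x1.0p-56); // single rounding

        // float: x = 1 + 2^-13, so x * x = 1 + 2^-12 + 2^-26 exactly.
        float x = 0x1.0008p0f;
        float z = -0x1.001p0f; // -(1 + 2^-12)
        System.out.println(unfused(x, x, z) == 0.0f);
        System.out.println(Math.fma(x, x, z) == 0x1.0p-26f);
    }
}
```

All four comparisons are expected to hold with or without the
intrinsic, since the intrinsic should only change speed, not results.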

Bug: 199373643
Test: ./art/test/testrunner/testrunner.py --target \
      --optimizing --64 -t 082-inline-execute
Change-Id: I50eae88b332ba9338b0a59fecad7d2158a97ffbb
2 files changed