ART: Improve JitProfile perf in arm/arm64 mterp
ART currently requires two profiling-related things from the
interpreters: hotness updates and OSR switch checks. The hotness
updates previously used the existing instrumentation framework, which
is flexible but quite heavyweight. For most things, the
instrumentation framework overhead is acceptable, but because we do a
hotness update on every backward branch, the overhead there is
unacceptable. Prior to this CL, branch profiling dominated
interpreter cost.
Here, we bypass the instrumentation framework for hotness updates
and deliver a significant performance improvement. Running
interpreter-only (dalvikvm -Xint) on a Nexus 6, we see the logic
subtest of CaffeineMark improve from 2600 to 9200, and the
overall score go from 1979 to over 3000. Compared to the
C++ switch interpreter, we see a 6x improvement on the branchy logic
subtest and a 2.6x improvement overall.
Compared with the previous mterp, which did not have support for
JIT profiling, we see a small (1% to 5%) performance loss on the
standard command-line benchmarks. I consider this acceptable
(we could create an alternate non-profiling mterp which would
have no penalty, but I don't consider this overhead big enough to
44 files changed