ART: Improve JitProfile perf in arm/arm64 mterp

ART currently requires two profiling-related things from the
interpreters: hotness updates and OSR switch checks.  The hotness
updates previously used the existing instrumentation framework - which
is flexible, but quite heavyweight.  For most things, the
instrumentation framework overhead is acceptable, but because we do a
hotness update on every backwards branch the overhead is unacceptable.
Prior to this CL, branch profiling dominates interpreter cost.

Here, we bypass the instrumentation framework for hotness updates
and deliver a significant performance improvement.  Running
interpreter-only (dalvikvm -Xint) on a Nexus 6, we see the logic
subtest of Caffeinemark improving from 2600 to 9200, and the
overall score going from 1979 to over 3000.  Compared to the
C++ switch interpreter, we see a 6x improvement on the branchy logic
subtest and a 2.6x improvement overall.

Compared with the previous mterp which did not have support for
jit profiling, we see a few (1% to 5%) performance loss on the
standard command-line benchmarks.  I consider this acceptable
(we could create an alternate non-profiling mterp which would
have no penalty, but I don't consider this overhead big enough to
justify that).

Change-Id: I50b5b8c5ed8ebda3c8b4e65d27ba7393c3feae04
44 files changed