JNI: Faster mutator locking during transition.

Add a mutator lock pointer to `Thread`. This makes retrieving
the pointer faster on ARM and ARM64 and makes it accessible
to JNI stubs if we decide to inline `JniMethodStart()` and
`JniMethodEnd()`.
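A minimal C++ sketch of the idea under a heavily simplified
model. `MutatorLockSketch`, `g_mutator_lock`, `ThreadSketch`
and `kMutatorLockOffset` are hypothetical names, not ART's
actual declarations. Each thread object caches the global lock
pointer when it is created. Code that already holds the thread
pointer (ART keeps it in a dedicated register on ARM and ARM64)
can then reach the lock with one load at a fixed offset instead
of first materializing the global's address. That is presumably
where the ARM speedup comes from.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical stand-in for the mutator lock (ART's real one is a
// reader-writer lock with far more machinery).
struct MutatorLockSketch {
  std::mutex mu;
};

// Hypothetical process-wide pointer, analogous to a global lock pointer.
MutatorLockSketch* g_mutator_lock = nullptr;

// Hypothetical, heavily simplified thread object.
struct ThreadSketch {
  // Copy of the global pointer, filled in once when the thread is set up.
  MutatorLockSketch* mutator_lock;

  ThreadSketch() : mutator_lock(g_mutator_lock) {}

  // Hot path: a single load relative to `this`, no global lookup.
  MutatorLockSketch* GetMutatorLock() const { return mutator_lock; }
};

// Fixed offset that generated code (e.g. an inlined JNI stub) could use
// to load the pointer directly from the thread object.
constexpr std::size_t kMutatorLockOffset = offsetof(ThreadSketch, mutator_lock);
```

The global must be initialized before any thread object is
created, since the cached copy is never refreshed afterwards.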

Pass the lock level `kMutatorLock` explicitly from the
`MutatorMutex` functions so that the compiler can evaluate
many of the conditions statically and avoid emitting
unnecessary code.
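A minimal sketch of why passing the level explicitly can help.
It assumes hypothetical classes `BaseMutexSketch` and
`MutatorMutexSketch`, a hypothetical `HeldLocks` record and a
made-up two-entry `LockLevel` enum; ART's real mutex classes
and lock levels are considerably more involved. The shared
helper takes the level as a parameter. The generic path passes
the runtime member `level_`, while the mutator-lock path passes
the constant `kMutatorLock`, so an optimizing compiler that
inlines the helper can resolve the level checks at compile time.

```cpp
#include <cassert>

// Hypothetical subset of lock levels.
enum LockLevel : int {
  kThreadSuspendCountLock = 0,
  kMutatorLock = 1,
  kLockLevelCount = 2
};

class BaseMutexSketch;

// Hypothetical per-thread record of which lock is held at each level.
struct HeldLocks {
  const BaseMutexSketch* at_level[kLockLevelCount] = {};
  int special_case_hits = 0;
};

class BaseMutexSketch {
 public:
  explicit BaseMutexSketch(LockLevel level) : level_(level) {}

  // Generic entry point: the level is only known at run time.
  void RegisterAsLocked(HeldLocks* self) { RegisterAsLockedImpl(self, level_); }

 protected:
  // Shared body. The level is a parameter rather than a member read,
  // so a caller passing a constant lets the optimizer fold the branch.
  void RegisterAsLockedImpl(HeldLocks* self, LockLevel level) {
    if (level == kThreadSuspendCountLock) {
      // Hypothetical extra bookkeeping the mutator lock never needs.
      ++self->special_case_hits;
    }
    self->at_level[level] = this;
  }

 private:
  LockLevel level_;
};

class MutatorMutexSketch : public BaseMutexSketch {
 public:
  MutatorMutexSketch() : BaseMutexSketch(kMutatorLock) {}

  // Hides the base version and passes the level as a constant; once
  // inlined, `level == kThreadSuspendCountLock` is known to be false.
  void RegisterAsLocked(HeldLocks* self) {
    RegisterAsLockedImpl(self, kMutatorLock);
  }
};
```

Both paths share one body, so the only cost of the specialized
entry point is the duplicated one-line wrapper.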

Golem results for art-opt-cc (higher is better):
linux-armv7                      before after
NativeDowncallStaticNormal       6.3694 7.2394 (+13.66%)
NativeDowncallStaticNormal6      6.0663 6.8527 (+12.96%)
NativeDowncallStaticNormalRefs6  5.7061 6.3945 (+12.06%)
NativeDowncallVirtualNormal      5.7088 7.2081 (+26.26%)
NativeDowncallVirtualNormal6     5.4563 6.7929 (+24.49%)
NativeDowncallVirtualNormalRefs6 5.1595 6.3415 (+22.91%)
linux-armv8                      before after
NativeDowncallStaticNormal       6.4229 7.0423 (+9.642%)
NativeDowncallStaticNormal6      6.2651 6.8527 (+9.379%)
NativeDowncallStaticNormalRefs6  5.8824 6.3976 (+8.760%)
NativeDowncallVirtualNormal      6.2651 6.8527 (+9.379%)
NativeDowncallVirtualNormal6     6.0663 6.6163 (+9.066%)
NativeDowncallVirtualNormalRefs6 5.6630 6.1408 (+8.436%)
There does not seem to be a measurable difference for x86
and x86-64.

Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Bug: 172332525
Change-Id: I2ad511a2fe7bac250549c43789cf3fb5e2de9e25