Add thread-shared interpreter cache

The thread-local interpreter cache handles around 75% of method/field
lookups from the interpreter.

Add a thread-shared interpreter cache, which can be much bigger (since
we pay the memory cost only once rather than per thread).  This
increases the cache hit rate to 90%.
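
The lookup order described above can be sketched roughly as follows.
This is an illustration under stated assumptions, not the code in this
change: the names (DirectMappedCache, Lookup, tls_cache, shared_cache),
the entry counts, and the mutex are all hypothetical, and the real
shared cache may well use a different synchronization scheme.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <optional>

// Illustrative sketch only: names and sizes are hypothetical, not from ART.
template <size_t N>
class DirectMappedCache {
  static_assert(N != 0 && (N & (N - 1)) == 0, "N must be a power of two");

 public:
  // Returns the cached value for `key`, or nullopt on a miss.
  // Key 0 is reserved to mark an empty slot.
  std::optional<uintptr_t> Get(uintptr_t key) const {
    const Entry& e = entries_[IndexOf(key)];
    if (key != 0 && e.key == key) return e.value;
    return std::nullopt;
  }

  // Stores the mapping, evicting whatever previously occupied the slot.
  void Set(uintptr_t key, uintptr_t value) {
    entries_[IndexOf(key)] = Entry{key, value};
  }

 private:
  struct Entry {
    uintptr_t key = 0;
    uintptr_t value = 0;
  };
  static size_t IndexOf(uintptr_t key) { return (key >> 2) & (N - 1); }
  std::array<Entry, N> entries_{};
};

thread_local DirectMappedCache<256> tls_cache;  // small, one copy per thread
DirectMappedCache<4096> shared_cache;           // larger, one copy per process
std::mutex shared_cache_lock;                   // assumption: simple locking

using SlowPath = uintptr_t (*)(uintptr_t);

// Tries the thread-local cache, then the shared cache, then the slow path.
uintptr_t Lookup(uintptr_t key, SlowPath resolve) {
  if (auto hit = tls_cache.Get(key)) return *hit;
  {
    std::lock_guard<std::mutex> guard(shared_cache_lock);
    if (auto hit = shared_cache.Get(key)) {
      tls_cache.Set(key, *hit);  // promote so the next lookup stays local
      return *hit;
    }
  }
  uintptr_t value = resolve(key);  // stands in for the expensive resolution
  {
    std::lock_guard<std::mutex> guard(shared_cache_lock);
    shared_cache.Set(key, value);
  }
  tls_cache.Set(key, value);
  return value;
}
```

In this sketch, an entry evicted from the small per-thread table can
still be served from the larger shared table, so the slow path runs
less often.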

This effectively halves the amount of time we spend in
'NterpGetMethod' (including DexCache lookups), which is the single
most expensive method during startup.

It also reduces the amount of time we spend resolving methods by
around 25%, since DexCache entries get evicted less often.

The shared cache increases memory use by 256k per process, so also
shrink the fixed-size DexCache fields array to offset the increase.

Test: test.py --host
Change-Id: I3cd369613d47de117ab69d5bee00d4cf89b87913
16 files changed