arm64/nterp: Refactor n2n entrypoint check and call.
Move the check from the `common_invoke_*` functions to the
callers. Except for string-init, replace the jump to the
"done" label with a fetch-and-dispatch immediately after
the `nterp_to_nterp_{static,instance}_{non_,}range` call.
Move the invoke-static cache-miss slow path out of the
opcode handler to make space for other code. Move the
nterp-to-nterp entrypoint check and call to that space.
Move the string-init argument adjustments to the helpers.
Local benchmarking on Pixel 3 shows minor changes for
nterp-to-non-nterp on big cores (from 2.5% improvement to
1.5% regression) and little cores (-0.5% to +0.5%, except
for two odd outliers: a 5.5% regression and a 3% improvement),
and for nterp-to-nterp on big cores (-1.5% to 1.5%). For
nterp-to-nterp on little cores, there is a consistent ~0.5%
improvement for static invokes while instance calls are
essentially unaffected.
Test: testrunner.py --target --64 --interpreter
Flag: EXEMPT PURE_REFACTOR
Change-Id: Ic7e73a335a48740a0ce21ed810aaa227a51454f1
2 files changed