Improve libcore_util_CharsetUtils performance.

Use ScopedFastNativeObjectAccess for @FastNative methods.
Avoid expensive JNI calls by using mirror::ByteArray.
For compressed strings, just copy the ASCII data.
For uncompressed strings, pre-calculate the UTF-8 length
to avoid unnecessary reallocations. Also access 16-bit
characters directly instead of using String::CharAt() to
avoid unnecessary string compression checks that clang++
is unable to optimize away.
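
As a rough illustration of the pre-calculation idea for
uncompressed strings, the two-pass approach could look like
the sketch below. This is not the code from this change: the
names Utf8Length and EncodeUtf8 and the surrogate helpers are
hypothetical, std::vector stands in for mirror::ByteArray, and
replacing unpaired surrogates with '?' is an assumption made
for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helpers for illustration only.
inline bool IsHighSurrogate(char16_t c) { return (c & 0xFC00) == 0xD800; }
inline bool IsLowSurrogate(char16_t c) { return (c & 0xFC00) == 0xDC00; }

// Pass 1: count the UTF-8 bytes needed for chars[0..length).
// Assumption: an unpaired surrogate is encoded as a single '?'.
size_t Utf8Length(const char16_t* chars, size_t length) {
  size_t result = 0;
  for (size_t i = 0; i < length; ++i) {
    const char16_t c = chars[i];
    if (c < 0x80) {
      result += 1;
    } else if (c < 0x800) {
      result += 2;
    } else if (IsHighSurrogate(c) && i + 1 < length && IsLowSurrogate(chars[i + 1])) {
      result += 4;
      ++i;  // Consume the low surrogate as well.
    } else if (IsHighSurrogate(c) || IsLowSurrogate(c)) {
      result += 1;  // Unpaired surrogate becomes '?'.
    } else {
      result += 3;
    }
  }
  return result;
}

// Pass 2: allocate the output once at its final size, then fill it
// by reading the 16-bit characters directly.
std::vector<uint8_t> EncodeUtf8(const std::u16string& s) {
  const char16_t* chars = s.data();
  const size_t length = s.size();
  const size_t utf8_length = Utf8Length(chars, length);
  std::vector<uint8_t> out(utf8_length);
  uint8_t* p = out.data();
  for (size_t i = 0; i < length; ++i) {
    const char16_t c = chars[i];
    if (c < 0x80) {
      *p++ = static_cast<uint8_t>(c);
    } else if (c < 0x800) {
      *p++ = static_cast<uint8_t>(0xC0 | (c >> 6));
      *p++ = static_cast<uint8_t>(0x80 | (c & 0x3F));
    } else if (IsHighSurrogate(c) && i + 1 < length && IsLowSurrogate(chars[i + 1])) {
      const uint32_t cp = 0x10000u + ((static_cast<uint32_t>(c) - 0xD800u) << 10) +
                          (static_cast<uint32_t>(chars[i + 1]) - 0xDC00u);
      ++i;  // Consume the low surrogate as well.
      *p++ = static_cast<uint8_t>(0xF0 | (cp >> 18));
      *p++ = static_cast<uint8_t>(0x80 | ((cp >> 12) & 0x3F));
      *p++ = static_cast<uint8_t>(0x80 | ((cp >> 6) & 0x3F));
      *p++ = static_cast<uint8_t>(0x80 | (cp & 0x3F));
    } else if (IsHighSurrogate(c) || IsLowSurrogate(c)) {
      *p++ = static_cast<uint8_t>('?');
    } else {
      *p++ = static_cast<uint8_t>(0xE0 | (c >> 12));
      *p++ = static_cast<uint8_t>(0x80 | ((c >> 6) & 0x3F));
      *p++ = static_cast<uint8_t>(0x80 | (c & 0x3F));
    }
  }
  // Both passes must agree on the length, so the buffer is exactly full.
  assert(p == out.data() + utf8_length);
  return out;
}
```

The counting pass costs one extra walk over the characters,
but the output is then allocated exactly once and never
resized or copied.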

StringToBytesBenchmark results on blueline little cores
running at a fixed frequency of 1420800 (approximate medians
from 3 runs, lower is better):
                             before   after
timeGetBytesAscii EMPTY      1599.86   519.36
timeGetBytesAscii L_16       1849.31   535.59
timeGetBytesAscii L_64       2582.72   646.07
timeGetBytesAscii L_256      5566.70  1132.11
timeGetBytesAscii L_512      9585.88  1649.34
timeGetBytesAscii A_16       1840.06   540.05
timeGetBytesAscii A_64       2550.41   614.85
timeGetBytesAscii A_256      5382.15   919.59
timeGetBytesAscii A_512      9181.93  1226.82
timeGetBytesIso88591 EMPTY   1589.57   515.62
timeGetBytesIso88591 L_16    1835.09   535.58
timeGetBytesIso88591 L_64    2588.90   650.84
timeGetBytesIso88591 L_256   5585.69  1118.37
timeGetBytesIso88591 L_512   9635.12  1625.92
timeGetBytesIso88591 A_16    1827.21   529.83
timeGetBytesIso88591 A_64    2548.83   603.32
timeGetBytesIso88591 A_256   5356.75   916.76
timeGetBytesIso88591 A_512   9172.74  1224.04
timeGetBytesUtf8 EMPTY       1599.00   510.61
timeGetBytesUtf8 L_16        1876.05   632.55
timeGetBytesUtf8 L_64        2781.85  1054.06
timeGetBytesUtf8 L_256      12136.15  3708.94
timeGetBytesUtf8 L_512      21357.30  7811.28
timeGetBytesUtf8 A_16        1888.64   531.15
timeGetBytesUtf8 A_64        2785.70   598.75
timeGetBytesUtf8 A_256       6300.25   906.34
timeGetBytesUtf8 A_512      11074.56  1231.62

Test: run-libcore-tests.sh --mode=host
Bug: 170281727
Change-Id: I03d2420b2e1eefc1fa5232deddba593aebd51941