Optimizing String.equals as an intrinsic (ARM)

This is the second implementation of String.equals. It adds an ARM
intrinsic that is similar to the original Java implementation of
String.equals: an instanceof check, null check, length check, and
reference equality check, followed by a loop comparing the strings two
characters at a time. After extensive benchmarking, comparing the
strings forward appears to be faster and is worth the additional temp
register it requires. Additionally, placing the add and sub
instructions, which are needed anyway, between the branches in the
comparison loop improved the benchmark results.
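
For readers unfamiliar with the shape of the algorithm, the checks and
the forward two-characters-per-iteration loop described above can be
sketched in Java. This is only an illustration, not the ARM code in
this change: the class and method names are hypothetical, the order of
the checks is chosen for readability, and packing two chars into one
int stands in for whatever wider load the intrinsic actually uses.

```java
// Illustrative sketch only; not the intrinsic itself.
// StringEqualsSketch and equalsSketch are hypothetical names.
final class StringEqualsSketch {
    private StringEqualsSketch() {}

    public static boolean equalsSketch(String self, Object other) {
        // Null check: a null argument is never equal.
        if (other == null) {
            return false;
        }
        // Reference equality check: the same object is trivially equal.
        if (self == other) {
            return true;
        }
        // instanceof check: only another String can be equal.
        if (!(other instanceof String)) {
            return false;
        }
        String that = (String) other;
        // Length check: different lengths cannot be equal.
        int length = self.length();
        if (length != that.length()) {
            return false;
        }
        // Forward loop, two characters per iteration. Each pair is
        // packed into one int so a single comparison covers both.
        int i = 0;
        for (; i + 1 < length; i += 2) {
            int a = (self.charAt(i) << 16) | self.charAt(i + 1);
            int b = (that.charAt(i) << 16) | that.charAt(i + 1);
            if (a != b) {
                return false;
            }
        }
        // Odd-length strings leave one trailing character to compare.
        if (i < length && self.charAt(i) != that.charAt(i)) {
            return false;
        }
        return true;
    }
}
```

The early exits keep the short-string and null-argument cases cheap,
which is where the benchmark numbers below show the largest relative
gains.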

Selected benchmark results:

Optimizing Compiler on Nexus 7
    Intrinsic 1-5 Character Strings = 91.6 ns
    Original 1-5 Character Strings = 285.84 ns
    Intrinsic 15-30 Character Strings = 176 ns
    Original 15-30 Character Strings = 367.67 ns
    Intrinsic 100-1000 Character Strings = 2992.1 ns
    Original 100-1000 Character Strings = 9098.5 ns
    Intrinsic Null Argument = 70.9 ns
    Original Null Argument = 189 ns

Code Expansion:

    Average size of 116 apps without intrinsic = 6880521.11 bytes
    Average size of 116 apps with intrinsic = 6901107.97 bytes
    Overall 0.299% increase in code size
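
As a quick check of the arithmetic, the percentage can be recomputed
from the two averages above. This is a minimal sketch; the class and
method names are hypothetical and only the two byte counts come from
the measurements in this message.

```java
import java.util.Locale;

// Hypothetical helper for re-deriving the code size increase.
final class CodeSizeDelta {
    private CodeSizeDelta() {}

    // Relative growth from `before` to `after`, in percent.
    public static double percentIncrease(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        double withoutIntrinsic = 6880521.11; // average bytes, 116 apps
        double withIntrinsic = 6901107.97;    // average bytes, 116 apps
        System.out.println(String.format(Locale.ROOT, "%.3f%%",
                percentIncrease(withoutIntrinsic, withIntrinsic)));
    }
}
```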

Bug: 21481923
Change-Id: I48df2a74f2a92b56fb9479fbf14276d44e880aed
1 file changed