Add an assembly TLAB allocation fast path for arm64.

This is the arm64 version of CL 187537.

Speedup (GSS GC with TLAB on N9):
        BinaryTrees:   591 ->  493 ms (-17%)
        MemAllocTest:  792 ->  755 ms (-5%)

Bug: 9986565

Change-Id: Icdad28cab0fd835679c640b7eae59b33ac2d6654
3 files changed