ART: Implement full loop unrolling.

Fully unroll small loops that have a small, compile-time-known
trip count. This eliminates the loop check overhead and
exposes more opportunities for inter-iteration optimizations.

caffeinemark/FloatAtom: 1.2x speedup on arm64 (Cortex-A57).

Test: 530-checker-peel-unroll.
Test: test-art-host, test-art-target.
Change-Id: Idf3fe3cb611376935d176c60db8c49907222e28a