Implement ClassStatus::kVisiblyInitialized.

Previously, all class initialization checks involved a memory
barrier to ensure appropriate memory visibility. We change
that by introducing the kVisiblyInitialized status, which can
be checked without a memory barrier. Before we mark a class
as visibly initialized, we run a checkpoint on all threads
to ensure memory visibility. This is done in batches of up
to 32 classes to reduce the overhead.
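
The scheme described above can be illustrated with a minimal,
single-threaded sketch. It is not the code in this CL: `Klass`,
`VisiblyInitializedBatch`, `IsInitialized`, `IsVisiblyInitialized`
and the `run_checkpoint` callback are hypothetical stand-ins, and
the checkpoint is assumed to run synchronously for simplicity.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical, simplified stand-ins; not the actual ART declarations.
enum class ClassStatus : uint8_t { kInitializing, kInitialized, kVisiblyInitialized };

struct Klass {
  std::atomic<ClassStatus> status{ClassStatus::kInitializing};
};

// Fast path: a relaxed load is enough, because kVisiblyInitialized is only
// stored after every thread is assumed to have passed a checkpoint.
inline bool IsVisiblyInitialized(const Klass& k) {
  return k.status.load(std::memory_order_relaxed) == ClassStatus::kVisiblyInitialized;
}

// Slow path: the acquire load pairs with the release store in Add() below.
inline bool IsInitialized(const Klass& k) {
  return k.status.load(std::memory_order_acquire) >= ClassStatus::kInitialized;
}

// Collects initialized classes and upgrades them in batches so that one
// checkpoint is shared by up to kMaxClasses classes.
// Assumption: callers serialize access; this sketch does no locking.
class VisiblyInitializedBatch {
 public:
  static constexpr size_t kMaxClasses = 32;

  // run_checkpoint stands in for "run a checkpoint on all threads".
  explicit VisiblyInitializedBatch(std::function<void()> run_checkpoint)
      : run_checkpoint_(std::move(run_checkpoint)) {}

  // Marks the class initialized and queues it for the next batch.
  void Add(Klass* k) {
    k->status.store(ClassStatus::kInitialized, std::memory_order_release);
    classes_[count_++] = k;
    if (count_ == kMaxClasses) Flush();
  }

  // Runs one checkpoint, then marks every queued class visibly initialized.
  void Flush() {
    if (count_ == 0) return;
    run_checkpoint_();
    for (size_t i = 0; i < count_; ++i) {
      classes_[i]->status.store(ClassStatus::kVisiblyInitialized,
                                std::memory_order_relaxed);
    }
    count_ = 0;
  }

 private:
  std::function<void()> run_checkpoint_;
  std::array<Klass*, kMaxClasses> classes_{};
  size_t count_ = 0;
};
```

In this sketch, adding 32 classes triggers exactly one checkpoint;
a class added afterwards stays at kInitialized (barrier still
needed) until the next flush.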

Avoiding memory barriers in the compiled code reduces code
size and improves performance. This is also the first step
toward fixing the long-standing synchronization bug 18161648.

Prebuilt sizes for aosp_taimen-userdebug:
 - before:
   arm/boot*.oat: 19150696
   arm64/boot*.oat: 22574336
   oat/arm64/services.odex: 21929800
 - after:
   arm/boot*.oat: 19134508 (-16KiB)
   arm64/boot*.oat: 22553664 (-20KiB)
   oat/arm64/services.odex: 21888760 (-40KiB)

Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Test: aosp_taimen-userdebug boots
Test: run-gtests.sh -j4
Test: testrunner.py --target --optimizing
Test: Manually diff `m dump-oat-boot` output from before
      with output after this CL without codegen changes,
      with `sed` replacements for class status. Check that
      only checksums and the oatdump runtime values of
      DexCache.dexFile differ.
Bug: 18161648
Bug: 36692143
Change-Id: Ida10439d347e680a0abf4674546923374ffaa957
17 files changed