Relax some CASes for the CC collector.

That is, remove some unnecessary memory fences.

We can use the relaxed CAS for the mark bitmap and reference field/GC
root updates because only the atomicity of the updated word matters
there.
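
As a rough illustration of the relaxed case, here is a minimal sketch of a
mark-bit test-and-set. MarkWord and AtomicTestAndSet are hypothetical names
used for illustration only, not the collector's actual code, and the sketch
assumes callers need nothing beyond atomicity of the updated word.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for one word of a mark bitmap (an assumption,
// not the real bitmap class).
struct MarkWord {
  std::atomic<uintptr_t> bits{0};
};

// Atomically sets bit `index` in `word`. Returns true if the bit was
// already set. Relaxed ordering is used because only the atomicity of
// the read-modify-write on this word is assumed to matter.
inline bool AtomicTestAndSet(MarkWord& word, size_t index) {
  const uintptr_t mask = static_cast<uintptr_t>(1) << index;
  uintptr_t old_bits = word.bits.load(std::memory_order_relaxed);
  do {
    if ((old_bits & mask) != 0) {
      return true;  // Already marked, nothing to update.
    }
    // On failure, compare_exchange_weak reloads old_bits and we retry.
  } while (!word.bits.compare_exchange_weak(old_bits, old_bits | mask,
                                            std::memory_order_relaxed,
                                            std::memory_order_relaxed));
  return false;
}
```

No fence is implied on either side of the CAS here, which is the saving the
relaxed variant is after.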

We can use the release CAS for the read barrier bits in the lock word
because it needs to make sure the reference field updates are visible
when the object changes from gray to black (the field update stores
won't be reordered after the CAS).
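
The gray-to-black transition can be sketched along the same lines. The
Object layout, kGrayBit, and UpdateFieldAndBlacken below are hypothetical and
only illustrate the ordering argument: the field store precedes a CAS with
release ordering, so it cannot be reordered after the CAS.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical encoding: the low bit of the lock word is the "gray"
// read barrier bit (an assumption for this sketch).
constexpr uint32_t kGrayBit = 1u;

// Hypothetical object with a lock word and one reference field.
struct Object {
  std::atomic<uint32_t> lock_word{kGrayBit};
  std::atomic<Object*> field{nullptr};
};

// Updates the reference field, then clears the gray bit with a release
// CAS. Returns true if this call turned the object black, false if it
// was already black.
inline bool UpdateFieldAndBlacken(Object& obj, Object* to_space_ref) {
  // Plain (relaxed) field update; the release CAS below orders it.
  obj.field.store(to_space_ref, std::memory_order_relaxed);
  uint32_t expected = obj.lock_word.load(std::memory_order_relaxed);
  while ((expected & kGrayBit) != 0) {
    // Release on success: earlier stores become visible to any thread
    // that acquire-loads the lock word and sees the black state.
    if (obj.lock_word.compare_exchange_weak(expected, expected & ~kGrayBit,
                                            std::memory_order_release,
                                            std::memory_order_relaxed)) {
      return true;
    }
  }
  return false;
}
```

A reader that needs to see the field updates would have to pair this with an
acquire load (or equivalent ordering) of the lock word.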

The CC collector's Ritz EAAC GC time decreases from 34.7s to
29.1s (-16%) on N5.

Bug: 12687968

Change-Id: If082d5911a25fac695df66263a8f55ce8149b199
5 files changed