sync: Align structs to cache lines

Updating an atomic value invalidates the entire cache line to which it
belongs, which can make the next access to that cache line slower on
other CPU cores.  This can lead to "destructive interference", also
known as "false sharing", where atomic operations on two or more
unrelated values that happen to share a cache line interfere with each
other at the hardware level, reducing overall performance.

Deal with this by aligning the synchronization primitives to the cache
line width so that no two primitives end up on the same cache line.
This also has the benefit of causing *constructive* interference
between the atomic value and the data it protects.  Since the user of
the primitive typically accesses the protected data right after
acquiring it, keeping both on the same cache line makes that subsequent
access faster.
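
A minimal sketch of the idea (the struct and field names are
illustrative; the real structs are in the diff below):

    use std::cell::UnsafeCell;
    use std::sync::atomic::AtomicUsize;

    // Aligning the whole struct to 128 bytes guarantees that no unrelated
    // atomic shares its cache line (avoiding destructive interference),
    // while the protected value stays on the same line as the state word
    // that guards it (constructive interference).
    #[repr(align(128))]
    pub struct AlignedLock<T> {
        state: AtomicUsize,
        value: UnsafeCell<T>,
    }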

A common pattern for synchronization primitives is to put them inside
an Arc.  However, if the primitive does not specify cache line
alignment, then the atomic reference count and the atomic state could
end up on the same cache line, and changing the reference count of the
primitive would then destructively interfere with its operation.  With
proper alignment, the atomic state and the reference count end up on
different cache lines, so there is no interference between them.
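
As a rough usage sketch (the import path is assumed for illustration;
`Mutex` here is the one modified below):

    use std::sync::Arc;
    use cros_async::sync::Mutex; // path assumed for illustration

    fn shared_state() -> Arc<Mutex<u32>> {
        // The Arc allocation stores its strong/weak reference counts right
        // before the contained value.  With `Mutex` aligned to 128 bytes,
        // the counts and the mutex state cannot land on the same cache
        // line, so `Arc::clone`/`drop` traffic no longer interferes with
        // lock traffic.
        Arc::new(Mutex::new(0))
    }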

Since we can't query the cache line width of the target machine at
build time, we pessimistically use an alignment of 128 bytes, based on
the following observations (a compile-time sanity check is sketched
after the list):

* On x86, the cache line is usually 64 bytes. However, on Intel CPUs
  the spatial prefetcher "strives to complete every cache line fetched
  to the L2 cache with the pair line that completes it to a 128-byte
  aligned chunk" (section 2.3.5.4 of [1]). So to avoid destructive
  interference we need to align to a pair of cache lines (128 bytes).
* On ARM, both the Cortex-A15 (armv7 [2]) and the Cortex-A77 (aarch64
  [3]) have 64-byte data cache lines.  However, Qualcomm Snapdragon
  CPUs can have 128-byte data cache lines [4].  Since Chrome OS code
  compiled for armv7 can still run on aarch64 CPUs with 128-byte cache
  lines, assume we need 128-byte alignment there as well.
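
If desired, the assumption can be documented with a compile-time check
along these lines (a sketch, not part of this change; it assumes
`SpinLock` is in scope and `assert!` in const context requires a
sufficiently recent Rust toolchain):

    use std::mem::align_of;

    // Fails to compile if the alignment ever drops below the assumed
    // 128 bytes.
    const _: () = assert!(align_of::<SpinLock<u32>>() >= 128);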

[1]: https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf
[2]: https://developer.arm.com/documentation/ddi0438/d/Level-2-Memory-System/About-the-L2-memory-system
[3]: https://developer.arm.com/documentation/101111/0101/functional-description/level-2-memory-system/about-the-l2-memory-system
[4]: https://www.7-cpu.com/cpu/Snapdragon.html

BUG=none
TEST=unit tests

Change-Id: Iaf6a29ad0d35411c70fd0e833cc6a49eda029bbc
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/2804869
Reviewed-by: Daniel Verkamp <dverkamp@chromium.org>
Tested-by: kokoro <noreply+kokoro@google.com>
Commit-Queue: Chirantan Ekbote <chirantan@chromium.org>
diff --git a/cros_async/src/sync/cv.rs b/cros_async/src/sync/cv.rs
index 96d18aa..be38a2e 100644
--- a/cros_async/src/sync/cv.rs
+++ b/cros_async/src/sync/cv.rs
@@ -52,6 +52,7 @@
 ///     val = block_on(cv.wait(val));
 /// }
 /// ```
+#[repr(align(128))]
 pub struct Condvar {
     state: AtomicUsize,
     waiters: UnsafeCell<WaiterList>,
diff --git a/cros_async/src/sync/mu.rs b/cros_async/src/sync/mu.rs
index 0758737..bb6ea02 100644
--- a/cros_async/src/sync/mu.rs
+++ b/cros_async/src/sync/mu.rs
@@ -679,6 +679,7 @@
 ///
 /// rx.recv().unwrap();
 /// ```
+#[repr(align(128))]
 pub struct Mutex<T: ?Sized> {
     raw: RawMutex,
     value: UnsafeCell<T>,
diff --git a/cros_async/src/sync/spin.rs b/cros_async/src/sync/spin.rs
index 0687bb7..4fb6172 100644
--- a/cros_async/src/sync/spin.rs
+++ b/cros_async/src/sync/spin.rs
@@ -23,6 +23,7 @@
 /// poisoned data if a thread panics while holding the lock. If lock poisoning is needed, it can be
 /// implemented by wrapping the `SpinLock` in a new type that implements poisoning. See the
 /// implementation of `std::sync::Mutex` for an example of how to do this.
+#[repr(align(128))]
 pub struct SpinLock<T: ?Sized> {
     lock: AtomicBool,
     value: UnsafeCell<T>,
diff --git a/cros_async/src/sync/waiter.rs b/cros_async/src/sync/waiter.rs
index 98e26f3..072a0f5 100644
--- a/cros_async/src/sync/waiter.rs
+++ b/cros_async/src/sync/waiter.rs
@@ -18,6 +18,7 @@
 
 // An atomic version of a LinkedListLink. See https://github.com/Amanieu/intrusive-rs/issues/47 for
 // more details.
+#[repr(align(128))]
 pub struct AtomicLink {
     prev: UnsafeCell<Option<NonNull<AtomicLink>>>,
     next: UnsafeCell<Option<NonNull<AtomicLink>>>,