sync: Align structs to cache lines

Updating an atomic value invalidates the entire cache line to which it
belongs, which can make the next access to that cache line slower on
other CPU cores.  This can lead to "destructive interference", also
known as "false sharing", where atomic operations on two or more
unrelated values that happen to share a cache line interfere with each
other at the hardware level, reducing overall performance.

Deal with this by aligning the synchronization primitives to the cache
line width so that no two primitives end up on the same cache line.
This also has the benefit of causing *constructive* interference
between the atomic value and the data it protects.  Since the user of
the primitive typically accesses the protected data right after
acquiring it, keeping both on the same cache line makes that subsequent
access faster.
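
A minimal sketch of the idea (the struct and field names are
illustrative; the real structs are in the diff below):

    use std::cell::UnsafeCell;
    use std::sync::atomic::AtomicUsize;

    // Aligning the whole struct to 128 bytes guarantees that no unrelated
    // atomic shares its cache line (avoiding destructive interference),
    // while the protected value stays on the same line as the state word
    // that guards it (constructive interference).
    #[repr(align(128))]
    pub struct AlignedLock<T> {
        state: AtomicUsize,
        value: UnsafeCell<T>,
    }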

A common pattern for synchronization primitives is to put them inside
an Arc.  However, if the primitive does not specify cache line
alignment, then the atomic reference count and the atomic state could
end up on the same cache line, and changing the reference count of the
primitive would then destructively interfere with its operation.  With
proper alignment, the atomic state and the reference count end up on
different cache lines, so there is no interference between them.
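
As a rough usage sketch (the import path is assumed for illustration;
`Mutex` here is the one modified below):

    use std::sync::Arc;
    use cros_async::sync::Mutex; // path assumed for illustration

    fn shared_state() -> Arc<Mutex<u32>> {
        // The Arc allocation stores its strong/weak reference counts right
        // before the contained value.  With `Mutex` aligned to 128 bytes,
        // the counts and the mutex state cannot land on the same cache
        // line, so `Arc::clone`/`drop` traffic no longer interferes with
        // lock traffic.
        Arc::new(Mutex::new(0))
    }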

Since we can't query the cache line width of the target machine at
build time, we pessimistically use an alignment of 128 bytes, based on
the following observations (a compile-time sanity check is sketched
after the list):

* On x86, the cache line is usually 64 bytes. However, on Intel CPUs
  the spatial prefetcher "strives to complete every cache line fetched
  to the L2 cache with the pair line that completes it to a 128-byte
  aligned chunk" (section 2.3.5.4 of [1]). So to avoid destructive
  interference we need to align to a pair of cache lines (128 bytes).
* On ARM, both the Cortex-A15 (armv7 [2]) and the Cortex-A77 (aarch64
  [3]) have 64-byte data cache lines.  However, Qualcomm Snapdragon
  CPUs can have 128-byte data cache lines [4].  Since Chrome OS code
  compiled for armv7 can still run on aarch64 CPUs with 128-byte cache
  lines, assume we need 128-byte alignment there as well.
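
If desired, the assumption can be documented with a compile-time check
along these lines (a sketch, not part of this change; it assumes
`SpinLock` is in scope and `assert!` in const context requires a
sufficiently recent Rust toolchain):

    use std::mem::align_of;

    // Fails to compile if the alignment ever drops below the assumed
    // 128 bytes.
    const _: () = assert!(align_of::<SpinLock<u32>>() >= 128);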

[1]: https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf
[2]: https://developer.arm.com/documentation/ddi0438/d/Level-2-Memory-System/About-the-L2-memory-system
[3]: https://developer.arm.com/documentation/101111/0101/functional-description/level-2-memory-system/about-the-l2-memory-system
[4]: https://www.7-cpu.com/cpu/Snapdragon.html

BUG=none
TEST=unit tests

Change-Id: Iaf6a29ad0d35411c70fd0e833cc6a49eda029bbc
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/2804869
Reviewed-by: Daniel Verkamp <dverkamp@chromium.org>
Tested-by: kokoro <noreply+kokoro@google.com>
Commit-Queue: Chirantan Ekbote <chirantan@chromium.org>
diff --git a/cros_async/src/sync/cv.rs b/cros_async/src/sync/cv.rs
index 96d18aa..be38a2e 100644
--- a/cros_async/src/sync/cv.rs
+++ b/cros_async/src/sync/cv.rs
@@ -52,6 +52,7 @@
 ///     val = block_on(cv.wait(val));
 /// }
 /// ```
+#[repr(align(128))]
 pub struct Condvar {
     state: AtomicUsize,
     waiters: UnsafeCell<WaiterList>,
diff --git a/cros_async/src/sync/mu.rs b/cros_async/src/sync/mu.rs
index 0758737..bb6ea02 100644
--- a/cros_async/src/sync/mu.rs
+++ b/cros_async/src/sync/mu.rs
@@ -679,6 +679,7 @@
 ///
 /// rx.recv().unwrap();
 /// ```
+#[repr(align(128))]
 pub struct Mutex<T: ?Sized> {
     raw: RawMutex,
     value: UnsafeCell<T>,
diff --git a/cros_async/src/sync/spin.rs b/cros_async/src/sync/spin.rs
index 0687bb7..4fb6172 100644
--- a/cros_async/src/sync/spin.rs
+++ b/cros_async/src/sync/spin.rs
@@ -23,6 +23,7 @@
 /// poisoned data if a thread panics while holding the lock. If lock poisoning is needed, it can be
 /// implemented by wrapping the `SpinLock` in a new type that implements poisoning. See the
 /// implementation of `std::sync::Mutex` for an example of how to do this.
+#[repr(align(128))]
 pub struct SpinLock<T: ?Sized> {
     lock: AtomicBool,
     value: UnsafeCell<T>,
diff --git a/cros_async/src/sync/waiter.rs b/cros_async/src/sync/waiter.rs
index 98e26f3..072a0f5 100644
--- a/cros_async/src/sync/waiter.rs
+++ b/cros_async/src/sync/waiter.rs
@@ -18,6 +18,7 @@
 
 // An atomic version of a LinkedListLink. See https://github.com/Amanieu/intrusive-rs/issues/47 for
 // more details.
+#[repr(align(128))]
 pub struct AtomicLink {
     prev: UnsafeCell<Option<NonNull<AtomicLink>>>,
     next: UnsafeCell<Option<NonNull<AtomicLink>>>,