X86: Use locked add rather than mfence
Java semantics for memory ordering can be satisfied using
lock addl $0,0(SP)
rather than mfence. The locked add synchronizes the memory caches, but
doesn't affect device memory.
Timing on a micro benchmark with a mfence or lock add $0,0(sp) in a loop
with 600000000 iterations:
Implement this as an instruction-set-feature lock_add. This is off by
default (uses mfence), and enabled for atom & silvermont variants.
Generation of mfence can be forced by a parameter to MemoryFence.
Signed-off-by: Mark Mendell <email@example.com>
14 files changed