Optimizing: Replace x86 xchg use with xor sequence
On some x86 processors, xchg is serializing even when exchanging two
registers. Replace the xchgl use with the 3 xor sequence to swap to
registers. This is generally faster and doesn't serialize the machine.
Change-Id: Iea2cd993d3b70a103bbdd1dbf7818e26ae29387c
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
diff --git a/compiler/optimizing/code_generator_x86.cc b/compiler/optimizing/code_generator_x86.cc
index e15eff9..676b842 100644
--- a/compiler/optimizing/code_generator_x86.cc
+++ b/compiler/optimizing/code_generator_x86.cc
@@ -4535,7 +4535,11 @@
Location destination = move->GetDestination();
if (source.IsRegister() && destination.IsRegister()) {
- __ xchgl(destination.AsRegister<Register>(), source.AsRegister<Register>());
+ // Use XOR swap algorithm to avoid serializing XCHG instruction or using a temporary.
+ DCHECK_NE(destination.AsRegister<Register>(), source.AsRegister<Register>());
+ __ xorl(destination.AsRegister<Register>(), source.AsRegister<Register>());
+ __ xorl(source.AsRegister<Register>(), destination.AsRegister<Register>());
+ __ xorl(destination.AsRegister<Register>(), source.AsRegister<Register>());
} else if (source.IsRegister() && destination.IsStackSlot()) {
Exchange(source.AsRegister<Register>(), destination.GetStackIndex());
} else if (source.IsStackSlot() && destination.IsRegister()) {