Optimizing: Replace x86 xchg use with xor sequence

On some x86 processors, xchg is serializing even when exchanging two
registers.  Replace the xchgl use with the 3 xor sequence to swap to
registers.  This is generally faster and doesn't serialize the machine.

Change-Id: Iea2cd993d3b70a103bbdd1dbf7818e26ae29387c
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
diff --git a/compiler/optimizing/code_generator_x86.cc b/compiler/optimizing/code_generator_x86.cc
index e15eff9..676b842 100644
--- a/compiler/optimizing/code_generator_x86.cc
+++ b/compiler/optimizing/code_generator_x86.cc
@@ -4535,7 +4535,11 @@
   Location destination = move->GetDestination();
 
   if (source.IsRegister() && destination.IsRegister()) {
-    __ xchgl(destination.AsRegister<Register>(), source.AsRegister<Register>());
+    // Use XOR swap algorithm to avoid serializing XCHG instruction or using a temporary.
+    DCHECK_NE(destination.AsRegister<Register>(), source.AsRegister<Register>());
+    __ xorl(destination.AsRegister<Register>(), source.AsRegister<Register>());
+    __ xorl(source.AsRegister<Register>(), destination.AsRegister<Register>());
+    __ xorl(destination.AsRegister<Register>(), source.AsRegister<Register>());
   } else if (source.IsRegister() && destination.IsStackSlot()) {
     Exchange(source.AsRegister<Register>(), destination.GetStackIndex());
   } else if (source.IsStackSlot() && destination.IsRegister()) {