libc: provide atomic operations will full barriers for NDK apps.

__atomic_cmpxchg and other related atomic operations did not
provide memory barriers, which can be a problem for non-platform
code that links against them when it runs on multi-core devices.

This patch does two things to fix this:

- It modifies the existing implementation of the functions
  that are exported by the C library to always provide
  full memory barriers. We need to keep them exported by
  the C library to prevent breaking existing application
  machine code.

- It also modifies <sys/atomics.h> to only export
  always-inlined versions of the functions, to ensure that
  any application code compiled against the new header will
  not rely on the platform version of the functions.

  This ensure that said machine code will run properly on
  all multi-core devices.

This is based on the GCC built-in sync primitives.

The end result should be only slightly slower than the
previous implementation.

Note that the platform code does not use these functions
at all. A previous patch completely removed their usage in
the pthread and libstdc++ code.

+ rename arch-arm/bionic/atomics_arm.S to futex_arm.S
+ rename arch-x86/bionic/atomics_x86.S to futex_x86.S
+ remove arch-x86/include/sys/atomics.h which already
  provided inlined functions to the x86 platform.

Change-Id: I752a594475090cf37fa926bb38209c2175dda539
7 files changed