Revert "Revert "Lose the hand-written futex assembler.""

The problem with the original patch was that using syscall(3) means that
errno can be set, but pthread_create(3) was abusing the TLS errno slot as
a pthread_mutex_t for the thread startup handshake.

There was also a mistake in the check for syscall failures --- it should
have checked against -1 instead of 0 (not just because that's the default
idiom, but also here because futex(2) can legitimately return values > 0).

This patch stops abusing the TLS errno slot and adds a pthread_mutex_t to
pthread_internal_t instead. (Note that for LP64 sizeof(pthread_mutex_t) >
sizeof(uintptr_t), so we could potentially clobber other TLS slots too.)

I've also rewritten the LP32 compatibility stubs to directly reuse the
code from the .h file.

This reverts commit 75c55ff84ebfa686c7ae2cc8ee431c6a33bd46b4.

Bug: 15195455
Change-Id: I6ffb13e5cf6a35d8f59f692d94192aae9ab4593d
24 files changed