fix incorrect __hwcap seen in dynamic-linked __set_thread_area

the bug fixed in commit b82cd6c78d812d38c31febba5a9e57dbaa7919c4 was
mostly masked on arm because __hwcap was zero at the point of the call
from the dynamic linker to __set_thread_area, causing the access to
libc.auxv to be skipped and kuser_helper versions of TLS access and
atomics to be used instead of the armv6 or v7 versions. however, on
kernels with kuser_helper removed for hardening it would crash.

since __set_thread_area potentially uses __hwcap, it must be
initialized before the function is called. move the AT_HWCAP lookup
from stage 3 to stage 2b.
diff --git a/ldso/dynlink.c b/ldso/dynlink.c
index a18461e..afec985 100644
--- a/ldso/dynlink.c
+++ b/ldso/dynlink.c
@@ -1670,6 +1670,7 @@
 	/* Setup early thread pointer in builtin_tls for ldso/libc itself to
 	 * use during dynamic linking. If possible it will also serve as the
 	 * thread pointer at runtime. */
+	search_vec(auxv, &__hwcap, AT_HWCAP);
 	libc.auxv = auxv;
 	libc.tls_size = sizeof builtin_tls;
 	libc.tls_align = tls_align;
@@ -1704,7 +1705,6 @@
 	 * global data that may be needed before we can make syscalls. */
 	__environ = envp;
 	decode_vec(auxv, aux, AUX_CNT);
-	__hwcap = aux[AT_HWCAP];
 	search_vec(auxv, &__sysinfo, AT_SYSINFO);
 	__pthread_self()->sysinfo = __sysinfo;
 	libc.page_size = aux[AT_PAGESZ];