Fix Neon routines for big-endian

Where vectors are used in table lookups, lane-wise intrinsics or unzips,
they must be loaded with explicit ld1 intrinsics rather than implicit
loads.
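
For illustration only, a minimal sketch of the explicit-load pattern. It
is not taken from the changed files: example_tbl and lookup16 are
hypothetical names, and the scalar fallback exists only so the sketch
builds on non-AArch64 hosts.

```c
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical 16-entry byte table (entry i holds 10 * i). */
static const uint8_t example_tbl[16] = {
  0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150
};

/* Sets out[i] = example_tbl[idx[i]], or 0 when idx[i] is out of range.  */
static void
lookup16 (const uint8_t idx[16], uint8_t out[16])
{
#if defined(__aarch64__)
  /* Explicit ld1 loads: lane i is taken from byte i in memory on either
     endianness.  The implicit form this avoids would look like
       uint8x16_t t = *(const uint8x16_t *) example_tbl;
     which, as an assumption about typical code generation, may place
     bytes in different lanes on big-endian targets.  */
  uint8x16_t t = vld1q_u8 (example_tbl);
  uint8x16_t i = vld1q_u8 (idx);
  /* TBL yields 0 for indices outside the 16-byte table.  */
  vst1q_u8 (out, vqtbl1q_u8 (t, i));
#else
  /* Scalar fallback with the same semantics as the TBL path.  */
  for (int k = 0; k < 16; k++)
    out[k] = idx[k] < 16 ? example_tbl[idx[k]] : 0;
#endif
}
```

The result is stored with vst1q_u8 as well, so the lane-to-address
mapping is the same on the way in and on the way out.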

Also rename PL exp table to avoid conflicts with libc.
20 files changed