arm64: implement FSQRT 2d_2d, 4s_4s, 2s_2s
AFAICS this completes the AArch64 SIMD implementation, except for the
crypto instructions.

This changes the type of Iop_Sqrt64x2 and Iop_Sqrt32x4 so that they take a
rounding-mode argument.  This will (temporarily, of course) break all
of the other targets that implement vector fsqrt.
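
Why the rounding mode matters: square root is inexact for most inputs, so
a correctly rounded result depends on the rounding mode in force.  The C
sketch below is illustrative only and is not VEX code.  RoundingMode,
V128s, sqrt32_rm and sqrt32x4_rm are hypothetical names, and the
rounding-mode encoding is made up for the sketch.  It models a 4-lane
single-precision sqrt that takes the rounding mode as an explicit operand.
Each lane uses an integer square root plus a remainder test, chosen so the
example does not depend on the host FP environment.  Only positive normal
inputs are handled.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical rounding-mode encoding for this sketch only
   (not VEX's IRRoundingMode). */
typedef enum { RM_NEAREST, RM_NEG_INF, RM_POS_INF, RM_ZERO } RoundingMode;

/* Hypothetical 128-bit vector viewed as four float lanes. */
typedef struct { float lane[4]; } V128s;

/* Largest s with s*s <= n (bit-by-bit integer square root). */
static uint64_t isqrt64(uint64_t n) {
    uint64_t s = 0;
    for (int bit = 31; bit >= 0; bit--) {
        uint64_t t = s | ((uint64_t)1 << bit);
        if (t * t <= n) s = t;
    }
    return s;
}

/* Square root of one float under an explicit rounding mode.
   Assumes x is positive and normal: no zero, subnormal, inf or NaN handling. */
static float sqrt32_rm(RoundingMode rm, float x) {
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    int e = (int)((u >> 23) & 0xFF);
    uint64_t m = (u & 0x7FFFFFu) | 0x800000u;   /* x = m * 2^(e - 150) */
    int k = e - 150;
    /* Scale m into [2^46, 2^48) by a shift that leaves k even. */
    int s = (k % 2 != 0) ? 23 : 24;
    uint64_t n = m << s;
    k -= s;                                      /* now x = n * 2^k, k even */
    uint64_t r = isqrt64(n);                     /* r in [2^23, 2^24) */
    uint64_t rem = n - r * r;                    /* zero iff the root is exact */
    int up = 0;
    switch (rm) {
    case RM_NEAREST: up = rem > r; break;        /* sqrt(n) > r + 1/2; ties impossible */
    case RM_POS_INF: up = rem != 0; break;       /* any inexact result rounds up */
    case RM_NEG_INF:
    case RM_ZERO:    up = 0; break;              /* positive result: truncate */
    }
    int q = k / 2;                               /* sqrt(x) ~= r * 2^q */
    r += (uint64_t)up;
    if (r == ((uint64_t)1 << 24)) {              /* rounding carried out of 24 bits */
        r >>= 1;
        q += 1;
    }
    uint32_t bits = ((uint32_t)(q + 150) << 23) | ((uint32_t)r & 0x7FFFFFu);
    float out;
    memcpy(&out, &bits, sizeof out);
    return out;
}

/* Lane-wise square root with the rounding mode as an explicit operand,
   mirroring a binary (rounding mode, vector) -> vector operation. */
static V128s sqrt32x4_rm(RoundingMode rm, V128s v) {
    V128s res;
    for (int i = 0; i < 4; i++)
        res.lane[i] = sqrt32_rm(rm, v.lane[i]);
    return res;
}
```

With these definitions, sqrt32x4_rm(RM_NEG_INF, v) and
sqrt32x4_rm(RM_POS_INF, v) agree on lanes holding 4.0f or 9.0f (exact
roots) but differ by one ulp on a lane holding 2.0f.  That difference is
what an Iop without a rounding-mode operand cannot express.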

git-svn-id: svn:// 8f6e269a-dfd6-0310-a8e1-e2731360e62c
6 files changed