code_gen_lib: Improve floating-point code gen

This change replaces the manual NaN boxing of float32 arguments, which
required two memory moves, with the optimized version from
MacroAssembler.  The optimized version NaN-boxes the value in place in
the XMM register and then stores it to memory once.
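
For reference, a minimal sketch of the bit pattern involved, assuming
"NaN boxing" here means the RISC-V convention of filling the upper 32
bits of a 64-bit floating-point slot with ones.  NanBoxFloat32 and
UnboxFloat32 are hypothetical names used only for illustration; they
are not the MacroAssembler API, and the sketch shows the resulting
value rather than the in-register XMM instruction sequence.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of berberis): returns the 64-bit pattern of
// a NaN-boxed float32, assuming the RISC-V convention where the upper
// 32 bits are all ones and the lower 32 bits hold the float's raw bits.
inline uint64_t NanBoxFloat32(float value) {
  uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));  // Bit-exact copy, no conversion.
  return UINT64_C(0xffffffff00000000) | bits;
}

// Hypothetical inverse: recovers the float32 from the lower 32 bits.
// It does not check that the upper bits are actually all ones.
inline float UnboxFloat32(uint64_t boxed) {
  uint32_t bits = static_cast<uint32_t>(boxed);
  float value;
  std::memcpy(&value, &bits, sizeof(value));
  return value;
}
```

Doing the equivalent OR directly in the register is what lets the boxed
value reach memory with a single store.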

This change also uses vmov instructions to copy floating-point arguments
and results when the host supports AVX.

Test: berberis_all
Bug: 282063730
Change-Id: I90cf2d0f90d00027dd2ac0db54e16a4c7b0a53f3
2 files changed