Implement fp to bits methods as intrinsics.

Rationale:
As intrinsics, these calls are visible to the optimizer, which gives
better optimization and better performance.

Results on libcore benchmark:

Most of the gain comes from hoisting the loop-invariant call out of
the loop once we detect that every call in the loop is a
side-effect-free intrinsic. The generated code in the general case is
much cleaner too.
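
For illustration only, here is a rough Java sketch of the two ideas
above. It is not the code in this change: the class and method names
are hypothetical, and the loop shape is only an assumption about what
such a benchmark might look like. The first part shows how the non-raw
methods differ from the raw ones (every NaN is collapsed to one
canonical bit pattern), and the second part shows the source-level
equivalent of hoisting a side-effect-free, loop-invariant call.

```java
// Hypothetical sketch; names are illustrative and not part of this change.
public final class FpBitsSketch {
    private FpBitsSketch() {}

    // Float.floatToIntBits semantics expressed via the raw conversion:
    // any NaN input maps to the canonical NaN pattern 0x7fc00000.
    static int floatToIntBitsSketch(float value) {
        if (value != value) {          // true only when value is NaN
            return 0x7fc00000;
        }
        return Float.floatToRawIntBits(value);
    }

    // Double.doubleToLongBits semantics expressed the same way:
    // any NaN input maps to 0x7ff8000000000000L.
    static long doubleToLongBitsSketch(double value) {
        if (value != value) {          // true only when value is NaN
            return 0x7ff8000000000000L;
        }
        return Double.doubleToRawLongBits(value);
    }

    // Assumed benchmark-like loop: the conversion does not depend on i.
    static int sumInLoop(float f, int reps) {
        int acc = 0;
        for (int i = 0; i < reps; i++) {
            acc += floatToIntBitsSketch(f);
        }
        return acc;
    }

    // What hoisting amounts to at source level: because the call has no
    // side effects and its argument is invariant, it is evaluated once.
    static int sumHoisted(float f, int reps) {
        int bits = floatToIntBitsSketch(f);
        int acc = 0;
        for (int i = 0; i < reps; i++) {
            acc += bits;
        }
        return acc;
    }
}
```

Both loop variants return the same value; the hoisted form simply does
the conversion once instead of on every iteration.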

Before:
timeFloatToIntBits() in 181 ms.
timeFloatToRawIntBits() in 35 ms.
timeDoubleToLongBits() in 208 ms.
timeDoubleToRawLongBits() in 35 ms.

After:
timeFloatToIntBits() in 36 ms.
timeFloatToRawIntBits() in 35 ms.
timeDoubleToLongBits() in 35 ms.
timeDoubleToRawLongBits() in 34 ms.

bug=11548336

Change-Id: I6e001bd3708e800bd75a82b8950fb3a0fc01766e
15 files changed