MIPS32: Improve method entry/exit code

Improvements:
- the stack frame is (de)allocated in one step instead of two
- callee-saved FPU registers are 8-byte aligned within the frame,
  allowing a single ldc1/sdc1 instruction to load/store an FPU
  register without causing exceptions due to misaligned accesses
- the return address register, RA, is restored early in the exit
  sequence for better instruction scheduling
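
For illustration only, the 8-byte alignment of the FPU spill slots can
be sketched as below. This is not the code from this change: the names
(ComputeLayout, FrameLayout, kStackAlignment) are hypothetical, and a
16-byte frame alignment with core registers spilled at the top of the
frame is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// All names and constants below are assumptions made for this sketch.
constexpr std::size_t kWordSize = 4;         // MIPS32 core register size
constexpr std::size_t kDoubleSize = 8;       // bytes moved by ldc1/sdc1
constexpr std::size_t kStackAlignment = 16;  // assumed frame alignment

constexpr std::size_t RoundUp(std::size_t x, std::size_t n) {
  return (x + n - 1) / n * n;
}

struct FrameLayout {
  std::size_t frame_size;                 // single SP adjustment, in bytes
  std::vector<std::size_t> core_offsets;  // SP-relative, 4-byte slots
  std::vector<std::size_t> fpu_offsets;   // SP-relative, 8-byte slots
};

// body_size covers everything below the spill area (locals, outgoing
// arguments). Core registers are spilled at the top of the frame and
// FPU registers directly below them, after padding to 8 bytes.
FrameLayout ComputeLayout(std::size_t body_size,
                          std::size_t num_core,
                          std::size_t num_fpu) {
  FrameLayout layout;
  const std::size_t core_area = RoundUp(num_core * kWordSize, kDoubleSize);
  const std::size_t fpu_area = num_fpu * kDoubleSize;
  layout.frame_size =
      RoundUp(body_size + core_area + fpu_area, kStackAlignment);

  std::size_t offset = layout.frame_size;
  for (std::size_t i = 0; i < num_core; ++i) {
    offset -= kWordSize;
    layout.core_offsets.push_back(offset);
  }

  // Skip the padding word (if any) so that FPU slots start 8-byte aligned.
  offset = layout.frame_size - core_area;
  for (std::size_t i = 0; i < num_fpu; ++i) {
    offset -= kDoubleSize;
    assert(offset % kDoubleSize == 0);  // safe for a single ldc1/sdc1
    layout.fpu_offsets.push_back(offset);
  }
  return layout;
}
```

With an odd number of core registers, one padding word separates the
two spill areas. The offsets are only aligned in memory if SP itself is
at least 8-byte aligned on entry, which this sketch takes for granted.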

Change-Id: I556b139c62839490a9fdbce8c5e6e3e2d1cc7bb7
3 files changed