Each primitive kind now spills to different locations.

Having different slots depending on the type greatly simplifies
the parallel move resolver. It also avoids FPU <-> Core register
swaps, so backends are not forced to implement such a swap.
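
For illustration only, a minimal sketch of per-kind spill slot pools.
The names Kind, TypedSpillSlots, Allocate, Free and PoolSize are
hypothetical and are not taken from this change. The point it shows is
that a slot index only has meaning within its own kind, so a slot never
holds values of two different kinds.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical primitive kinds; illustrative names, not from this change.
enum class Kind { kInt, kLong, kFloat, kDouble };
constexpr std::size_t kNumKinds = 4;

// Assumption: one independent pool of spill slots per kind.
class TypedSpillSlots {
 public:
  // Returns the lowest free slot index in the pool for `kind`,
  // growing that pool by one slot when all of its slots are in use.
  std::size_t Allocate(Kind kind) {
    std::vector<bool>& pool = pools_[static_cast<std::size_t>(kind)];
    for (std::size_t i = 0; i < pool.size(); ++i) {
      if (!pool[i]) {
        pool[i] = true;
        return i;
      }
    }
    pool.push_back(true);
    return pool.size() - 1;
  }

  // Marks `slot` in the pool for `kind` as free again.
  void Free(Kind kind, std::size_t slot) {
    pools_[static_cast<std::size_t>(kind)][slot] = false;
  }

  // Number of slots ever created for `kind`.
  std::size_t PoolSize(Kind kind) const {
    return pools_[static_cast<std::size_t>(kind)].size();
  }

 private:
  // pools_[k][i] is true while slot i of kind k is in use.
  std::array<std::vector<bool>, kNumKinds> pools_;
};
```

In this sketch an int and a float can both be given slot index 0, but
they live in separate pools, so a move between spill slots never has to
change register class.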

Change-Id: Ide9f0452e7ccf9efb8adddbcc246d44b937b253c
5 files changed