Optimizations on the loopfilters.

- Scheduling for Atom processors
- Combining of macros to allow for better interleaving
- Change from multiplies to adds for main filter
- Use of movhps/movlps to fill xmm registers without
  shifting and orring

Change-Id: I0b3500a5f58abf7085253ec92d64c8a96723040b
1 file changed