Rebase gemmlowp to e96c3a9

  - Better multi-threaded performance
  - API change to match standard GEMM: C=A*B rather than C=B*A

Change-Id: I74159fcb246d2a1fc246015e221306bbe11ea8e3
52 files changed