Improved the matrix multiplication blocking for the case where mr is not a power of 2 (e.g. on Haswell CPUs).
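
For illustration only, here is a minimal sketch of why a non-power-of-2 mr needs separate handling when a block size is rounded to a multiple of mr. This is not the code from the changed file: the helper name `round_down_to_mr`, its parameters, and the value mr = 12 are assumptions made for the example.

```cpp
#include <cstddef>

// Hypothetical helper (not from the commit): round a block size mc down
// to a multiple of the micro-kernel row count mr. Assumes mr > 0.
inline std::size_t round_down_to_mr(std::size_t mc, std::size_t mr) {
    // The bit-mask shortcut is only correct when mr is a power of 2.
    if ((mr & (mr - 1)) == 0) {
        return mc & ~(mr - 1);
    }
    // General case (e.g. an assumed mr = 12): use integer division.
    return (mc / mr) * mr;
}
```

With mc = 130 and mr = 12 the helper returns 120, whereas applying the mask `mc & ~(mr - 1)` directly would yield 128, which is not a multiple of 12.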
1 file changed