commit | a96184a887e02c6f0ff3cf4b0ec2faad831c1ba2 | |
---|---|---|
author | Benoit Jacob <benoitjacob@google.com> | Fri Sep 27 16:04:50 2019 -0400 |
committer | Benoit Jacob <benoitjacob@google.com> | Tue Mar 10 16:38:42 2020 -0400 |
tree | 78cffe347d733a2c6fbe54703eb18ef5a5c2d152 | |
parent | 02f886b973891648ee380a8ddf1b25bbb7c91e6e | |

Rewrite RUY_CHECK family of macros:
- Drop unwanted dependency on TFLite macros (prereq ahead of future move out of tflite).
- There doesn't seem to be a compelling flavor of Google logging macros that we could use here, without adding a large dependency and/or a large increase to binary size. We only need the most basic assertion functionality, this implementation achieves minimal binary size impact by using only snprintf, fprintf and abort.
- Still have decent logging of compared values, and support for C++11 enum classes for now by logging numerical values (will be possible to improve when C++20 reflection becomes available).

Also bump the threshold for the found_distinct_values check which was being flaky (could give false negatives based on pseudorandom values).

PiperOrigin-RevId: 271631262

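
The approach described in the commit message (assertions built only on snprintf, fprintf and abort, with enum classes logged as numbers) could look roughly like the sketch below. It is not ruy's actual code: the `check_sketch` namespace, `ExampleEnum`, `ToPrintable`, `FormatValue`, `Fail` and `SKETCH_CHECK_EQ` are all hypothetical names chosen for illustration. It also uses `std::string` for brevity, which an implementation that cares about binary size would probably avoid.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>
#include <type_traits>

namespace check_sketch {

// Hypothetical enum class, used only to illustrate numeric logging.
enum class ExampleEnum : int { kFirst = 0, kSecond = 1 };

// Enums (including C++11 enum classes) are logged via their underlying
// integer value, since no reflection is available to print their names.
template <typename T>
typename std::enable_if<std::is_enum<T>::value, long long>::type
ToPrintable(T value) {
  return static_cast<long long>(
      static_cast<typename std::underlying_type<T>::type>(value));
}

// Integral types are widened to long long. Values that do not fit in a
// long long are not handled by this sketch.
template <typename T>
typename std::enable_if<std::is_integral<T>::value, long long>::type
ToPrintable(T value) {
  return static_cast<long long>(value);
}

// Formats a value as decimal text using only snprintf.
template <typename T>
std::string FormatValue(T value) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), "%lld", ToPrintable(value));
  return std::string(buf);
}

// Reports the failed comparison on stderr, then aborts.
[[noreturn]] inline void Fail(const char* file, int line, const char* expr,
                              const std::string& lhs,
                              const std::string& rhs) {
  std::fprintf(stderr, "%s:%d: check failed: %s (%s vs. %s)\n", file, line,
               expr, lhs.c_str(), rhs.c_str());
  std::abort();
}

}  // namespace check_sketch

// Evaluates each operand exactly once and logs both values on failure.
#define SKETCH_CHECK_EQ(a, b)                                      \
  do {                                                             \
    const auto sketch_lhs = (a);                                   \
    const auto sketch_rhs = (b);                                   \
    if (!(sketch_lhs == sketch_rhs)) {                             \
      check_sketch::Fail(__FILE__, __LINE__, #a " == " #b,         \
                         check_sketch::FormatValue(sketch_lhs),    \
                         check_sketch::FormatValue(sketch_rhs));   \
    }                                                              \
  } while (false)
```

With these definitions, `SKETCH_CHECK_EQ(x, check_sketch::ExampleEnum::kSecond)` would, on failure, print the file, line, the compared expression and both values as integers before aborting.
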
ruy is a matrix multiplication library. Its focus is to cover the matrix multiplication needs of TensorFlow Lite.
ruy supports both floating-point (like Eigen) and quantized (like gemmlowp) matrix multiplication.
ruy is very new, immature code. It has quite good test coverage, but the code is in flux, lacks comments, needs more cleanup, and there are no design docs at the moment.
We hope to improve on all that and integrate ruy into TensorFlow Lite, at first as a non-default path for ARM A64 only, over the next few weeks [April 2019].
ruy is designed to achieve maximal performance not just on very large sizes, as is the focus of many established libraries, but on the matrix sizes and shapes that are actually most critical in current TensorFlow Lite applications. This often means quite small sizes, e.g. 100x100 or even 50x50, and all sorts of rectangular shapes.
ruy is currently only optimized for ARM A64; other architectures have only slow reference code at the moment.
ruy is currently optimized only for the following combination of storage orders: LHS = row-major, RHS = column-major, destination = column-major. All other combinations of storage orders fall back to slow reference code at the moment.
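
To make the storage-order terminology concrete, here is a minimal sketch of a reference (unoptimized) float matrix multiplication that accepts any combination of storage orders. It does not use ruy's API: the `layout_sketch` namespace, `Order`, `Offset` and `ReferenceMul` are hypothetical names, and matrices are assumed to be densely packed with no extra stride. Note that with a row-major LHS and a column-major RHS, both the LHS row and the RHS column traversed by the inner loop are contiguous in memory.

```cpp
#include <cstddef>
#include <vector>

namespace layout_sketch {

// Hypothetical storage-order tag; not ruy's own type.
enum class Order { kRowMajor, kColMajor };

// Offset of element (row, col) in a densely packed rows x cols matrix.
inline std::size_t Offset(Order order, int rows, int cols, int row, int col) {
  return order == Order::kRowMajor
             ? static_cast<std::size_t>(row) * cols + col
             : static_cast<std::size_t>(col) * rows + row;
}

// Computes dst (rows x cols) = lhs (rows x depth) * rhs (depth x cols),
// with each matrix stored in its own storage order.
inline std::vector<float> ReferenceMul(const std::vector<float>& lhs,
                                       Order lhs_order,
                                       const std::vector<float>& rhs,
                                       Order rhs_order, Order dst_order,
                                       int rows, int depth, int cols) {
  std::vector<float> dst(static_cast<std::size_t>(rows) * cols, 0.0f);
  for (int r = 0; r < rows; ++r) {
    for (int c = 0; c < cols; ++c) {
      float acc = 0.0f;
      // Dot product of LHS row r with RHS column c.
      for (int k = 0; k < depth; ++k) {
        acc += lhs[Offset(lhs_order, rows, depth, r, k)] *
               rhs[Offset(rhs_order, depth, cols, k, c)];
      }
      dst[Offset(dst_order, rows, cols, r, c)] = acc;
    }
  }
  return dst;
}

}  // namespace layout_sketch
```

For example, the 2x2 matrix with rows (1, 2) and (3, 4) is stored as `{1, 2, 3, 4}` in row-major order, while the matrix with rows (5, 6) and (7, 8) is stored as `{5, 7, 6, 8}` in column-major order.
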