Optimize TransposeOp using a strided access pattern, bulk memory transfers, and other profile-guided optimizations

Summary: Work in progress on improving the performance of TransposeOp on CPU. The operator is used extensively for inference in several neural machine translation (MT) systems, so optimizing it is worthwhile and should reduce request latency.
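
For illustration only, the strided-access and bulk-transfer ideas can be sketched roughly as follows. This is not the code from this revision: `TransposeSketch` and its parameters are hypothetical names, and the sketch assumes a dense row-major float tensor and a valid permutation in `axes`. It walks the output in order, computes each input offset from the input strides, and copies any run of trailing axes that the permutation leaves in place with a single `std::memcpy`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper, NOT the Caffe2 TransposeOp implementation.
// Output axis i corresponds to input axis axes[i]; data is row-major.
std::vector<float> TransposeSketch(const std::vector<float>& src,
                                   const std::vector<std::size_t>& dims,
                                   const std::vector<std::size_t>& axes) {
  assert(axes.size() == dims.size());
  const std::size_t n = dims.size();
  std::vector<float> dst(src.size());
  if (src.empty() || n == 0) {
    return src;  // nothing to permute
  }

  // Row-major strides of the input tensor.
  std::vector<std::size_t> in_strides(n, 1);
  for (std::size_t i = n - 1; i > 0; --i) {
    in_strides[i - 1] = in_strides[i] * dims[i];
  }

  // Trailing axes that stay in place form a block that is contiguous in
  // both input and output, so it can be moved with one bulk copy.
  std::size_t outer = n;
  std::size_t block = 1;
  while (outer > 0 && axes[outer - 1] == outer - 1) {
    block *= dims[outer - 1];
    --outer;
  }
  const std::size_t num_blocks = src.size() / block;

  // idx is the output multi-index over the permuted (outer) axes.
  std::vector<std::size_t> idx(outer, 0);
  for (std::size_t b = 0; b < num_blocks; ++b) {
    // Strided access: map the output index back to an input offset.
    std::size_t in_off = 0;
    for (std::size_t i = 0; i < outer; ++i) {
      in_off += idx[i] * in_strides[axes[i]];
    }
    std::memcpy(dst.data() + b * block, src.data() + in_off,
                block * sizeof(float));

    // Advance idx like an odometer over the output's outer dimensions.
    for (std::size_t i = outer; i-- > 0;) {
      if (++idx[i] < dims[axes[i]]) break;
      idx[i] = 0;
    }
  }
  return dst;
}
```

With shape {2, 2, 2} and `axes` = {1, 0, 2}, for example, the last axis is untouched, so the sketch copies four blocks of two floats each rather than eight individual elements. When the innermost axis is itself permuted, the block size falls to one element and the copy is purely strided.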

Differential Revision: D4913075

fbshipit-source-id: fa2742829291d91f3eba00fdfe7d6c0dae83e206
2 files changed
tree: aae8d8587c5144dae11453128291ee4000e5dc41
  1. .travis/
  2. caffe/
  3. caffe2/
  4. cmake/
  5. docs/
  6. scripts/
  7. third_party/
  8. .Doxyfile
  9. .Doxyfile-c
  10. .Doxyfile-python
  11. .gitignore
  12. .gitmodules
  13. .travis.yml
  14. appveyor.yml
  15. CMakeLists.txt
  16. LICENSE
  17. Makefile
  18. PATENTS
  19. README.md
  20. release-notes.md
README.md

Caffe2

Caffe2 is a lightweight, modular, and scalable deep learning framework. Building on the original Caffe, Caffe2 is designed with expression, speed, and modularity in mind.

Questions and Feedback

Please use GitHub issues (https://github.com/caffe2/caffe2/issues) to ask questions, report bugs, and request new features.

Please participate in our survey (https://www.surveymonkey.com/r/caffe2). We will send you information about new releases and special developer events/webinars.

License and Citation

Caffe2 is released under the BSD 2-Clause license.

Building Caffe2

Travis Build Status

Detailed Build Status

Target        Status
Linux         Build Linux
Android       Build Android
iOS           Build iOS
Linux + MKL   Build LinuxMKL

Further Resources on Caffe2.ai