Threaded RNN executor for CPU, multi-stream executor for CUDA

Summary:
Special executor for RNNs that can exploit parallelism over timesteps. On CPU we use multi-threading, achieving roughly a 3x improvement on 4-layer LSTMs.
With CUDA, perf improvements are more modest, but the structure allows for optimizing it further. For CUDA, we use multiple streams and events if there is parallelism
over timesteps. In my experiments, though, using more than 2 streams was not beneficial.
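
To illustrate where parallelism over timesteps can come from in a stacked RNN, here is a minimal Python sketch. It is not the executor's actual scheduling code: it assumes the usual dependency pattern in which the cell at (layer, t) needs only the outputs of (layer - 1, t) and (layer, t - 1), so all cells with the same layer + t can run concurrently. The names `cell`, `run_sequential`, and `run_wavefront` are hypothetical, and `cell` is an integer stand-in for real LSTM math.

```python
from concurrent.futures import ThreadPoolExecutor


def cell(layer, t, below, prev):
    """Hypothetical stand-in for one RNN cell step (a real cell runs LSTM math)."""
    return below + prev + (layer + 1) * (t + 1)


def run_sequential(inputs, num_layers):
    """Reference: run every layer over every timestep, one cell at a time."""
    num_steps = len(inputs)
    out = [[0] * num_steps for _ in range(num_layers)]
    for layer in range(num_layers):
        for t in range(num_steps):
            below = inputs[t] if layer == 0 else out[layer - 1][t]
            prev = out[layer][t - 1] if t > 0 else 0
            out[layer][t] = cell(layer, t, below, prev)
    return out


def run_wavefront(inputs, num_layers, num_threads=4):
    """Run cells wave by wave; cells within a wave are independent of each other."""
    num_steps = len(inputs)
    out = [[0] * num_steps for _ in range(num_layers)]

    def step(layer, t):
        below = inputs[t] if layer == 0 else out[layer - 1][t]
        prev = out[layer][t - 1] if t > 0 else 0
        out[layer][t] = cell(layer, t, below, prev)

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        for diag in range(num_layers + num_steps - 1):
            # All (layer, t) pairs on this anti-diagonal depend only on
            # the previous wave, so they can be submitted together.
            wave = [(layer, diag - layer)
                    for layer in range(num_layers)
                    if 0 <= diag - layer < num_steps]
            # list() waits for the whole wave before starting the next one.
            list(pool.map(lambda lt: step(*lt), wave))
    return out


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5]
    assert run_wavefront(xs, num_layers=4) == run_sequential(xs, num_layers=4)
```

With 4 layers, up to 4 cells are in flight per wave, which is where a multi-threaded CPU executor can gain over strictly sequential execution. In pure Python the global interpreter lock prevents a real speedup, so the sketch only demonstrates the dependency structure.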

Flag --caffe2_rnn_executor can be used to switch the executor off.

Reviewed By: salexspb

Differential Revision: D5749304

fbshipit-source-id: d6f76b3e16598be5b4e8188aff031671ebafaa4c
15 files changed
tree: a1073c4e2f447009bba10c75fa36d8f2a49e7450
README.md

Caffe2

Caffe2 is a lightweight, modular, and scalable deep learning framework. Building on the original Caffe, Caffe2 is designed with expression, speed, and modularity in mind.

News and Events

Caffe2 research award competition request for proposals

Questions and Feedback

Please use GitHub issues (https://github.com/caffe2/caffe2/issues) to ask questions, report bugs, and request new features.

Please participate in our survey (https://www.surveymonkey.com/r/caffe2). We will send you information about new releases and special developer events/webinars.

License and Citation

Caffe2 is released under the BSD 2-Clause license.

Further Resources on Caffe2.ai