improve style + a bit of perf for ScatterWeightedSum CUDA

Summary: For perf, it is better to check weight0 inside the kernel, which avoids the host synchronization incurred by copying it to a stack variable. Also improved style a bit (GitHub has no lint step, so contributed code may not conform to our style).
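As a rough illustration of the idea in the summary, the sketch below checks weight0 on the device instead of copying it into a host stack variable first. It is not the code from this change: the kernel name `ScatterAxpySketch`, its parameters, and the assumption that weight0 must equal 1 are all hypothetical, chosen only to show the pattern.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical simplified kernel (not the actual Caffe2 code): adds
// weight1 * X1[i] into row indices[i] of X0, assuming weight0 must be 1.
__global__ void ScatterAxpySketch(
    const float* weight0,  // device pointer to one float
    const float* weight1,  // device pointer to one float
    const float* X1,       // [num_indices, slice_size]
    const int* indices,    // [num_indices]
    float* X0,             // [num_rows, slice_size], updated in place
    int num_indices,
    int slice_size) {
  // Device-side check. The host-side alternative would be something like
  //   float w0; cudaMemcpy(&w0, weight0, sizeof(float), cudaMemcpyDeviceToHost);
  // which blocks the host until the copy has completed.
  assert(weight0[0] == 1.0f);
  const float w1 = weight1[0];
  const int total = num_indices * slice_size;
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < total;
       i += blockDim.x * gridDim.x) {
    const int row = indices[i / slice_size];
    const int col = i % slice_size;
    // atomicAdd keeps the update correct if indices contains duplicates.
    atomicAdd(&X0[row * slice_size + col], w1 * X1[i]);
  }
}

int main() {
  constexpr int num_rows = 3, slice_size = 2, num_indices = 2;
  float h_X0[num_rows * slice_size] = {0, 0, 0, 0, 0, 0};
  float h_X1[num_indices * slice_size] = {1, 2, 3, 4};
  int h_idx[num_indices] = {2, 0};
  float h_w0 = 1.0f, h_w1 = 0.5f;

  float *d_X0, *d_X1, *d_w0, *d_w1;
  int* d_idx;
  cudaMalloc(&d_X0, sizeof(h_X0));
  cudaMalloc(&d_X1, sizeof(h_X1));
  cudaMalloc(&d_idx, sizeof(h_idx));
  cudaMalloc(&d_w0, sizeof(float));
  cudaMalloc(&d_w1, sizeof(float));
  cudaMemcpy(d_X0, h_X0, sizeof(h_X0), cudaMemcpyHostToDevice);
  cudaMemcpy(d_X1, h_X1, sizeof(h_X1), cudaMemcpyHostToDevice);
  cudaMemcpy(d_idx, h_idx, sizeof(h_idx), cudaMemcpyHostToDevice);
  cudaMemcpy(d_w0, &h_w0, sizeof(float), cudaMemcpyHostToDevice);
  cudaMemcpy(d_w1, &h_w1, sizeof(float), cudaMemcpyHostToDevice);

  ScatterAxpySketch<<<1, 64>>>(d_w0, d_w1, d_X1, d_idx, d_X0,
                               num_indices, slice_size);

  // A failed device-side assert surfaces as an error from a later runtime call.
  cudaError_t err = cudaDeviceSynchronize();
  if (err != cudaSuccess) {
    std::printf("kernel failed: %s\n", cudaGetErrorString(err));
    return 1;
  }

  cudaMemcpy(h_X0, d_X0, sizeof(h_X0), cudaMemcpyDeviceToHost);
  for (int i = 0; i < num_rows * slice_size; ++i) {
    std::printf("%g ", h_X0[i]);
  }
  std::printf("\n");

  cudaFree(d_X0); cudaFree(d_X1); cudaFree(d_idx);
  cudaFree(d_w0); cudaFree(d_w1);
  return 0;
}
```

In this sketch the synchronization in `main` exists only so the demo can print the result. In an operator that simply enqueues work on a stream, the device-side check lets the launch proceed without the host waiting on weight0.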

Differential Revision: D5011668

fbshipit-source-id: 1eb85912f6f499acd3190cfcb59e7e39c2220d89
1 file changed
tree: fe17f95a9e64028b9ad978caa6eaae12b783ad78
  1. .travis/
  2. caffe/
  3. caffe2/
  4. cmake/
  5. docs/
  6. scripts/
  7. third_party/
  8. .Doxyfile
  9. .Doxyfile-c
  10. .Doxyfile-python
  11. .gitignore
  12. .gitmodules
  13. .travis.yml
  14. appveyor.yml
  15. CMakeLists.txt
  16. LICENSE
  17. Makefile
  18. PATENTS
  19. README.md
  20. release-notes.md
README.md

Caffe2

Caffe2 is a lightweight, modular, and scalable deep learning framework. Building on the original Caffe, Caffe2 is designed with expression, speed, and modularity in mind.

Questions and Feedback

Please use GitHub issues (https://github.com/caffe2/caffe2/issues) to ask questions, report bugs, and request new features.

Please participate in our survey (https://www.surveymonkey.com/r/caffe2). We will send you information about new releases and special developer events/webinars.

License and Citation

Caffe2 is released under the BSD 2-Clause license.

Build Status

Travis: Build Status | Windows: Build status

Detailed build matrix (refresh the page if the status icons fail to load, which can happen because of Heroku):

Target | Status
Linux | Build Linux
Mac (CPU) | Build Mac
Android | Build Android
iOS | Build iOS
Linux + MKL | Build LinuxMKL
Windows | Build status

Further Resources on Caffe2.ai