pass learning rate scaling factor to parameter update builder function

Summary:
When the data parallel model was refactored, the division of the learning rate by the number of devices was dropped, so gradients were effectively multiplied by the number of devices. The LR therefore needs to be scaled by 1/numgpus, and this scaling factor is now passed to the parameter update builder function.
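A minimal sketch of the arithmetic (not the Caffe2 API; all names here are illustrative): when each device computes the mean gradient over its shard and allreduce sums the per-device gradients, the summed gradient is numgpus times the full-batch gradient, so scaling the LR by 1/numgpus restores the single-GPU update exactly.

```python
import numpy as np

num_gpus = 4
rng = np.random.default_rng(0)
batch = rng.normal(size=(32, 3))  # total batch, split across devices
w = np.zeros(3)
lr = 0.1

def grad(w, data):
    # Mean gradient of a toy least-squares loss ||data @ w - 1||^2 / n
    residual = data @ w - 1.0
    return 2.0 * data.T @ residual / len(data)

# Single-GPU reference step: gradient over the whole batch.
ref = w - lr * grad(w, batch)

# Data-parallel step: each device takes the mean gradient over its
# shard; summing the shards (allreduce) yields num_gpus times the
# full-batch mean gradient, so the LR must be scaled by 1/num_gpus.
shards = np.split(batch, num_gpus)
summed = sum(grad(w, s) for s in shards)
par = w - (lr / num_gpus) * summed

assert np.allclose(ref, par)
```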

Added a test to confirm that data_parallel_model produces exactly the same results for different numbers of GPUs, given the same total batch size.
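In the same spirit, a hypothetical version of that consistency check (reusing the toy grad() from the sketch above rather than the real data_parallel_model harness) would assert that one update step is identical for any device count that divides the batch:

```python
def parallel_step(w, batch, lr, num_gpus):
    # One scaled-LR data-parallel SGD step, as in the sketch above.
    shards = np.split(batch, num_gpus)
    summed = sum(grad(w, s) for s in shards)
    return w - (lr / num_gpus) * summed

single = parallel_step(np.zeros(3), batch, 0.1, num_gpus=1)
for n in (2, 4, 8):
    assert np.allclose(parallel_step(np.zeros(3), batch, 0.1, n), single)
```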

Reviewed By: prigoyal

Differential Revision: D4248907

fbshipit-source-id: af21ede113e6ac25f12c556de298cb18974548be