Scaled training and fetching from the PS

Summary:
Today, the PS's weirdly store the entire embedding and not just their
subsection of it. This was simply an oversight on the part of the original
author and this diff fixes that.

1. The sparse params are sharded to the PS's and the PS's just store their section
of the embedding. The trainer requests the ids as-is from the PS, but the PS
divides each id by the num_of_shards before looking it up in the embedding table
blob. This happens on both the forward and the backward pass. However, during the
model download part, the PS multiplies the embeddings by the num_of_shards
before returning them to the trainer. The upshot is that the trainer does not
know anything about how the embeddings are scaled on the PS. The PS adds extra
divide and multiply steps to achieve that.

2. During estimation, we allocate just one PS. To make all of the embeddings
fit on that single PS, we additionally scale the hash table sizes
(proportionally and equally for all the sparse params) so that they fit. This
scaling is handled analogously to (1).
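
The id handling described in (1) can be illustrated with a minimal sketch. It is
not the code from this diff: the class ShardedPS and its methods are
hypothetical, and it assumes that an id is routed to shard id % num_of_shards
and that the download step maps a local row back to a global id as
local_row * num_of_shards + shard_index. Neither detail is stated above.

```python
class ShardedPS:
    """Hypothetical parameter server holding one shard of an embedding table."""

    def __init__(self, shard_index, num_shards, global_rows, dim):
        self.shard_index = shard_index
        self.num_shards = num_shards
        # Number of global ids in [0, global_rows) that land on this shard
        # (assumed routing rule: id % num_shards == shard_index).
        local_rows = (global_rows - shard_index + num_shards - 1) // num_shards
        self.table = [[0.0] * dim for _ in range(local_rows)]

    def _local_row(self, global_id):
        # The trainer sends the id as-is; the PS divides it by num_shards.
        if global_id % self.num_shards != self.shard_index:
            raise KeyError("id %d does not belong to this shard" % global_id)
        return global_id // self.num_shards

    def lookup(self, global_id):
        # Forward pass: return a copy of the embedding row.
        return list(self.table[self._local_row(global_id)])

    def apply_gradient(self, global_id, grad, lr=0.1):
        # Backward pass: plain SGD update on the locally stored row.
        row = self.table[self._local_row(global_id)]
        for i, g in enumerate(grad):
            row[i] -= lr * g

    def download(self):
        # Model download: scale local rows back up to global ids
        # (assumed inverse mapping, see the note above the block).
        return {
            local * self.num_shards + self.shard_index: list(row)
            for local, row in enumerate(self.table)
        }


# The trainer only ever uses global ids and never sees the scaling.
shards = [ShardedPS(s, 3, global_rows=10, dim=2) for s in range(3)]
shards[7 % 3].apply_gradient(7, [1.0, 2.0])

model = {}
for ps in shards:
    model.update(ps.download())
```

After the loop, model is keyed by the original global ids, so the trainer can
treat it as one unsharded table even though each PS stored only its own rows.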

Reviewed By: boryiingsu

Differential Revision: D5664093

fbshipit-source-id: 92f501f61566f939c41ce0b614a1b499669f978a
1 file changed
tree: 24fff90dad37345d3807266cb6f82445893bc1c7
  1. .travis/
  2. caffe/
  3. caffe2/
  4. cmake/
  5. conda/
  6. docker/
  7. docs/
  8. scripts/
  9. third_party/
  10. .Doxyfile
  11. .Doxyfile-c
  12. .Doxyfile-python
  13. .gitignore
  14. .gitmodules
  15. .travis.yml
  16. appveyor.yml
  17. cmake_uninstall.cmake.in
  18. CMakeLists.txt
  19. LICENSE
  20. Makefile
  21. PATENTS
  22. README.md
  23. release-notes.md
README.md

Caffe2

Caffe2 is a lightweight, modular, and scalable deep learning framework. Building on the original Caffe, Caffe2 is designed with expression, speed, and modularity in mind.

News and Events

Caffe2 research award competition request for proposals

Questions and Feedback

Please use GitHub issues (https://github.com/caffe2/caffe2/issues) to ask questions, report bugs, and request new features.

Please participate in our survey (https://www.surveymonkey.com/r/caffe2). We will send you information about new releases and special developer events/webinars.

License and Citation

Caffe2 is released under the BSD 2-Clause license.

Further Resources on Caffe2.ai