Allow querying tensor device; add tool to validate that all ops get tensors from the correct devices (GPUs)

Summary:
A common, hard-to-debug performance bug in multi-GPU training is that operators are passed tensors that reside on a different GPU than the one the op runs on. Since we have peer access enabled, this works, but it is much slower. With the data parallel model (DPM) this problem rarely arises, as DPM does static analysis of the operators, but if someone bypasses DPM or uses FeedBlob with incorrect device options, the problem can happen.

To make debugging easier, I added a device field to the tensor that records which device allocated the memory. In addition, I added a function that goes through operator inputs and outputs and compares each tensor's device to the operator's device. This check runs after the first iteration, and only with prof_dag.
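As a rough illustration of the check described above (not the code in this diff), here is a minimal standalone Python sketch. The `Op` class, the `find_device_mismatches` function, and the `tensor_device` mapping are hypothetical stand-ins for the operator definitions and the new per-tensor device field; they are not Caffe2 APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Op:
    """Hypothetical stand-in for an operator: a type, a device, and blob names."""
    op_type: str
    device: int  # GPU id the operator runs on
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)


def find_device_mismatches(
    ops: List[Op], tensor_device: Dict[str, int]
) -> List[Tuple[str, str, int, int]]:
    """Return (op_type, blob, tensor_gpu, op_gpu) for every input or output
    blob whose recorded device differs from the device of the op using it."""
    mismatches = []
    for op in ops:
        for blob in op.inputs + op.outputs:
            dev = tensor_device.get(blob)
            # Blobs with no recorded device (e.g. not yet allocated) are skipped.
            if dev is not None and dev != op.device:
                mismatches.append((op.op_type, blob, dev, op.device))
    return mismatches


# Assumed example: "fc1" was allocated on GPU 0 but is consumed by an op on GPU 1.
ops = [
    Op("FC", device=0, inputs=["data", "w"], outputs=["fc1"]),
    Op("Relu", device=1, inputs=["fc1"], outputs=["relu1"]),
]
tensor_device = {"data": 0, "w": 0, "fc1": 0, "relu1": 1}

for op_type, blob, dev, op_dev in find_device_mismatches(ops, tensor_device):
    print(f"{op_type}: blob '{blob}' is on GPU {dev}, but the op runs on GPU {op_dev}")
```

The sketch only reports mismatches rather than failing, since cross-device access still works with peer access enabled and the goal is to surface the slowdown.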

Also renamed ShapeCall to TensorInfoFun, as it now returns much more than just the shape.

I think this is a pretty safe diff, but do you find it problematic to add a new field to the tensor?

Reviewed By: dzhulgakov

Differential Revision: D5335505

fbshipit-source-id: 511b6c122dff9a205f43951984868ffd40f7ac30
12 files changed
tree: eeb9769acdddf973220eec87436fbd116e630c62
  1. .travis/
  2. caffe/
  3. caffe2/
  4. cmake/
  5. docs/
  6. scripts/
  7. third_party/
  8. .Doxyfile
  9. .Doxyfile-c
  10. .Doxyfile-python
  11. .gitignore
  12. .gitmodules
  13. .travis.yml
  14. appveyor.yml
  15. CMakeLists.txt
  16. LICENSE
  17. Makefile
  18. PATENTS
  19. README.md
  20. release-notes.md
README.md

Caffe2

Caffe2 is a lightweight, modular, and scalable deep learning framework. Building on the original Caffe, Caffe2 is designed with expression, speed, and modularity in mind.

News and Events

Caffe2 research award competition request for proposals

Questions and Feedback

Please use GitHub issues (https://github.com/caffe2/caffe2/issues) to ask questions, report bugs, and request new features.

Please participate in our survey (https://www.surveymonkey.com/r/caffe2). We will send you information about new releases and special developer events/webinars.

License and Citation

Caffe2 is released under the BSD 2-Clause license.

Further Resources on Caffe2.ai