commit | d38499f727898ed7df75972d7cfc2e24263a0f86 | |
---|---|---|
author | Aapo Kyrola <akyrola@fb.com> | Tue Dec 13 11:22:11 2016 -0800 |
committer | Bram Wasti <bwasti@dev11999.prn1.facebook.com> | Thu Dec 15 12:01:30 2016 -0800 |
tree | dcd150837c830f9d9aa44515a44538e8865c4ab4 | |
parent | 11a6f48fe715b60efa9b26c29e557f6d9c1a60bb | |
Optimize BlobIsDefined() + benchmark --> net construction 95 secs to 8.2 secs!

Summary: I have noticed that constructing the Xray model takes quite a while. To measure this, I wrote a benchmark script that creates a resnet-50 model on 8 gpus. This takes about 95 secs -- which is kind of annoying when you want to quickly debug stuff. Profiling with Python's cProfile showed that most of the time is spent in net.BlobIsDefined(), which does a linear search over external inputs and operator outputs. Thus it gets slower and slower with large nets.

This can be fully optimized by keeping a separate lookup table of operator inputs and outputs (and external inputs and outputs). It is a bit annoying to keep this separate data structure, but I set up the unit tests to ensure things behave correctly over Clones. After the optimization, the net construction drops from 95 secs to 8.2 secs!

Reviewed By: azzolini

Differential Revision: D4288307

fbshipit-source-id: 0bb82c8bde9d86a2702b298f4aa706cba509346e
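
The idea in the summary can be illustrated with a minimal sketch. This is not the actual Caffe2 change: `TinyNet` and all of its methods are hypothetical names invented for illustration, and the real net stores its operators in a protobuf rather than a Python list. The sketch only shows the general pattern of replacing a linear scan with a set that is kept in sync on every mutation and copied on clone.

```python
import copy


class TinyNet:
    """Hypothetical toy net: a list of ops plus a list of external inputs."""

    def __init__(self, name):
        self.name = name
        self.external_inputs = []
        self.ops = []  # each op is a tuple (op_type, inputs, outputs)
        # Lookup table that mirrors external inputs and operator outputs.
        self._defined = set()

    def add_external_input(self, blob):
        self.external_inputs.append(blob)
        self._defined.add(blob)

    def add_op(self, op_type, inputs, outputs):
        self.ops.append((op_type, list(inputs), list(outputs)))
        self._defined.update(outputs)

    def blob_is_defined_linear(self, blob):
        # Slow path: scans every external input and every operator output,
        # so each call costs time proportional to the size of the net.
        if blob in self.external_inputs:
            return True
        return any(blob in outputs for _, _, outputs in self.ops)

    def blob_is_defined(self, blob):
        # Fast path: a single average-case constant-time set lookup.
        return blob in self._defined

    def clone(self, name):
        # Deep copy so the clone gets its own lookup table, not a shared one.
        other = copy.deepcopy(self)
        other.name = name
        return other
```

The cost of this design is the one the summary mentions: every code path that adds a blob must also update the lookup table, and a clone must carry its own copy, which is why tests that compare the fast and slow paths after cloning are useful.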
Caffe2 is a deep learning framework made with expression, speed, and modularity in mind. It is an experimental refactoring of Caffe, and allows a more flexible way to organize computation.
See the installation instructions for setup details.
Caffe2 is released under the BSD 2-Clause license.