commit	d38499f727898ed7df75972d7cfc2e24263a0f86
author	Aapo Kyrola <akyrola@fb.com>	Tue Dec 13 11:22:11 2016 -0800
committer	Bram Wasti <bwasti@dev11999.prn1.facebook.com>	Thu Dec 15 12:01:30 2016 -0800
tree	dcd150837c830f9d9aa44515a44538e8865c4ab4
parent	11a6f48fe715b60efa9b26c29e557f6d9c1a60bb

Optimize BlobIsDefined() + benchmark --> net construction 95 secs to 8.2 secs!

Summary:
I noticed that constructing the Xray model takes quite a while. To measure this, I wrote a benchmark script that creates a resnet-50 model on 8 gpus. This takes about 95 secs, which is annoying when you want to debug quickly.

Profiling with Python's cProfile showed that most of the time is spent in net.BlobIsDefined(), which does a linear search over external inputs and operator outputs, so it gets slower and slower as the net grows.

This can be fully optimized by keeping a separate lookup table of operator inputs and outputs (and external inputs and outputs). It is a bit annoying to maintain this separate data structure, but I set up unit tests to ensure it stays correct across Clones.

After the optimization, net construction drops from 95 secs to 8.2 secs!

Reviewed By: azzolini

Differential Revision: D4288307

fbshipit-source-id: 0bb82c8bde9d86a2702b298f4aa706cba509346e
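The idea in the summary can be illustrated with a minimal Python sketch. This is not the Caffe2 code from this commit: `SketchNet`, `add_op`, `blob_is_defined_linear`, `blob_is_defined`, and `clone` are hypothetical names invented for illustration, and the sketch tracks only external inputs and operator outputs rather than every table the commit mentions. It shows a linear scan next to a set-based lookup that is updated as operators are added, plus a check that a clone rebuilds the same table.

```python
class SketchNet:
    """Toy stand-in for a net (hypothetical, not the Caffe2 Net class)."""

    def __init__(self, external_inputs=()):
        self.external_inputs = list(external_inputs)
        self.ops = []  # each op is stored as (inputs, outputs)
        # Lookup table mirroring every blob defined so far.
        self._defined = set(self.external_inputs)

    def add_op(self, inputs, outputs):
        self.ops.append((list(inputs), list(outputs)))
        # Keep the lookup table in sync with the operator list.
        self._defined.update(outputs)

    def blob_is_defined_linear(self, name):
        # Linear search over external inputs and operator outputs:
        # cost grows with the number of operators in the net.
        if name in self.external_inputs:
            return True
        return any(name in outputs for _, outputs in self.ops)

    def blob_is_defined(self, name):
        # Set membership: average constant time, independent of net size.
        return name in self._defined

    def clone(self):
        # Rebuild through add_op so the copy gets its own consistent table.
        other = SketchNet(self.external_inputs)
        for inputs, outputs in self.ops:
            other.add_op(inputs, outputs)
        return other


# Build a chain of 1000 operators, each consuming the previous output.
net = SketchNet(external_inputs=["data"])
prev = "data"
for i in range(1000):
    out = "blob_%d" % i
    net.add_op([prev], [out])
    prev = out

# Both lookups must agree, on the original net and on its clone.
copy = net.clone()
for name in ("data", "blob_0", "blob_999", "missing"):
    assert net.blob_is_defined(name) == net.blob_is_defined_linear(name)
    assert copy.blob_is_defined(name) == net.blob_is_defined_linear(name)
```

The trade-off is the one the summary points out: the table is a second source of truth, so every path that changes the operator list (including cloning) has to update it, which is why agreement checks like the loop above are worth keeping as unit tests.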


README.md

Caffe2

Caffe2 is a deep learning framework made with expression, speed, and modularity in mind. It is an experimental refactoring of Caffe, and allows a more flexible way to organize computation.

See the installation instructions for details.

License and Citation

Caffe2 is released under the BSD 2-Clause license.