| commit | ab42a95b6fe5aaf8be92df75c0a608d06e5fc6ba | [log] [tgz] | 
|---|---|---|
| author | Aapo Kyrola <akyrola@fb.com> | Wed Aug 02 10:54:54 2017 -0700 | 
| committer | Facebook Github Bot <facebook-github-bot@users.noreply.github.com> | Wed Aug 02 11:10:10 2017 -0700 | 
| tree | 073a849a1e627dbaa798db610ef5ad1363cb4991 | |
| parent | 0fc2bf26b45d25030d696ae8701c1dfbec2cc9d3 [diff] | 
fast path for CUDNN global average pooling

Summary: KaimingHe debugged a slow model and found that global average pooling was hideously slow, even with CUDNN. It turns out the CUDNN pooling op (especially the backward pass) is not optimized for global pooling. This adds a fast path for global average pooling with the NCHW layout. It is about 30x faster than CUDNN with 56 x 56 pooling; compared to an equivalent ReduceBackSum, it is about 3x faster. Max pooling will be handled in a bootcamp task.

Reviewed By: asaadaldien

Differential Revision: D5533059

fbshipit-source-id: 2d590693d737fa92184603663031d96f6145f304
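The reason a dedicated path can beat the generic pooling kernel is that global average pooling degenerates into a single reduction: the forward pass is a mean over the spatial dimensions, and the backward pass broadcasts a uniform scale, with no sliding-window accumulation at all. The actual commit implements this as a CUDA kernel inside Caffe2; the following NumPy sketch (function names are illustrative, not Caffe2 API) only shows the math the fast path exploits.

```python
import numpy as np

def global_avg_pool_nchw(x):
    # Forward: average over the spatial dims (H, W) for each (n, c).
    # Output shape (N, C, 1, 1), as if the pooling kernel covered the
    # whole feature map.
    return x.mean(axis=(2, 3), keepdims=True)

def global_avg_pool_nchw_grad(dy, x_shape):
    # Backward: every input element receives dy / (H * W) -- one
    # broadcasted scale per (n, c), which is why this path avoids the
    # windowed bookkeeping of a general pooling backward kernel.
    n, c, h, w = x_shape
    return np.broadcast_to(dy / (h * w), x_shape).copy()
```

For example, pooling a `(2, 3, 56, 56)` input produces a `(2, 3, 1, 1)` output, and the gradient of each input element is the corresponding output gradient divided by 56 * 56.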
Caffe2 is a lightweight, modular, and scalable deep learning framework. Building on the original Caffe, Caffe2 is designed with expression, speed, and modularity in mind.
Please use GitHub issues (https://github.com/caffe2/caffe2/issues) to ask questions, report bugs, and request new features.
Please participate in our survey (https://www.surveymonkey.com/r/caffe2). We will send you information about new releases and special developer events/webinars.
Caffe2 is released under the BSD 2-Clause license.