commit | 21dc425e077accd5cae5c6345cd4b7a11e1beecc | |
---|---|---|
author | James Reed <jamesreed@fb.com> | Fri Jun 16 16:53:48 2017 -0700 |
committer | Facebook Github Bot <facebook-github-bot@users.noreply.github.com> | Fri Jun 16 17:03:38 2017 -0700 |
tree | b059af9ba0756fa757dc1a49aa95904020629167 | |
parent | 12094b5114002d39b189c355aa68fefcc8eb4c62 | |
Optimize SumSqrElementsOp for CUDA

Summary: The old version launched a single block of 128 threads. Its throughput was too low for the NMT use case (calculating squared gradient norms for every parameter), so this change increases throughput. It shaves 7% off CNN model training time per step.

Reviewed By: wickedfoo

Differential Revision: D5263748

fbshipit-source-id: adc3bacd11e49ea00c60381d613d993050e899be
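The following CPU-side Python model shows why a single 128-thread block limits throughput. It is a sketch, not the code from this commit. The summary does not say how the new kernel is launched, so the multi-block version assumes a common pattern instead:

- a grid-stride loop spread over many blocks,
- one partial sum per block,
- then a final reduction of those partials.

The names `sum_sqr_one_block`, `sum_sqr_multi_block`, and `elements_per_thread` are hypothetical, and so is the default of 64 blocks.

```python
import math


def sum_sqr_one_block(x, threads=128):
    """Model of a single-block launch: `threads` threads stride over all of x."""
    partial = [0.0] * threads
    for tid in range(threads):
        # Each thread handles elements tid, tid + threads, tid + 2*threads, ...
        for i in range(tid, len(x), threads):
            partial[tid] += x[i] * x[i]
    # On the GPU this would be an in-block tree reduction; modeled as a sum.
    return sum(partial)


def sum_sqr_multi_block(x, blocks=64, threads=128):
    """Model of a multi-block launch using a grid-stride loop (assumed pattern).

    Stage 1: every (block, thread) pair accumulates a strided slice of x,
    and each block reduces its threads' values to one partial sum.
    Stage 2: the per-block partials are reduced to the final scalar
    (on a GPU, e.g. by a second small kernel or an atomic add).
    """
    stride = blocks * threads  # total number of threads in the grid
    block_partials = []
    for b in range(blocks):
        thread_sums = [0.0] * threads
        for t in range(threads):
            gid = b * threads + t  # global thread index
            for i in range(gid, len(x), stride):
                thread_sums[t] += x[i] * x[i]
        block_partials.append(sum(thread_sums))  # stage 1: per-block result
    return sum(block_partials)  # stage 2: reduce the partials


def elements_per_thread(n, blocks, threads):
    """Worst-case number of elements a single thread processes serially."""
    return math.ceil(n / (blocks * threads))
```

Both functions compute the same sum of squares. They differ only in how the elements are divided among threads. With one 128-thread block, each thread processes about N/128 elements serially. Spreading the work over 64 blocks cuts that to about N/8192. A CUDA block runs on a single streaming multiprocessor, so a one-block launch leaves the rest of the GPU idle no matter how large the tensor is.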
Caffe2 is a lightweight, modular, and scalable deep learning framework. Building on the original Caffe, Caffe2 is designed with expression, speed, and modularity in mind.
Caffe2 Bay Area Meetup at NVIDIA, May 31, 6-8:30, Santa Clara, CA: https://www.meetup.com/Caffe2-Bay-Area/events/239836290/
Caffe2 Community Facebook Group: join to ask questions, talk to other users, and keep informed of important Caffe2 updates.
Please use GitHub issues (https://github.com/caffe2/caffe2/issues) to ask questions, report bugs, and request new features.
Please participate in our survey (https://www.surveymonkey.com/r/caffe2). We will send you information about new releases and special developer events and webinars.
Caffe2 is released under the BSD 2-Clause license.