[caffe2/FC DNNLOWP] Shrink Y_int32_ vector capacity when appropriate

Summary:
The FullyConnectedDNNLowPOp::Y_int32_ vectors consume between 1GB and 2GB on one of FB's larger applications. By adding tracing, I noticed that the number of elements in each instance oscillates wildly over time. Because resize() can grow the buffer backing a vector but never releases it, the capacity stays at its high-water mark and memory is wasted. As a simple optimization, I added code to right-size the buffer whenever the number of elements is less than half the vector's capacity at that point; this doesn't affect the existing elements.

There is of course a memory/CPU tradeoff here: with the change we do more mallocs and frees. I added tracing to measure how often we grow or shrink: about 100 times per second on average, which is not a great deal.

Test Plan:
Memory growth impact: over 24 hours and after the startup period, the memory consumed by this code grows from 0.85GB to 1.20GB, vs 0.95GB to 1.75GB in the baseline.
[ source: https://fburl.com/scuba/heap_profiles/wm47kpfe ]
https://pxl.cl/1pHlJ

Reviewed By: jspark1105

Differential Revision: D24592098

fbshipit-source-id: 7892b35f24e42403653a74a1a9d06cbc7ee866b9

commit: dd95bf65b6a6c97455efdcfce0aca46e493db916
author: Blaise Sanouillet <blez@fb.com> Thu Oct 29 11:16:13 2020 -0700
committer: Facebook GitHub Bot <facebook-github-bot@users.noreply.github.com> Thu Oct 29 11:19:45 2020 -0700
tree: 7122c0e5456820552c539d6f95e29a3681d743bf
parent: 38265acfbece4c9c3fce8a8ff031532115343c8a
diff --git a/caffe2/quantization/server/fully_connected_dnnlowp_op.cc b/caffe2/quantization/server/fully_connected_dnnlowp_op.cc
index c7e6804..4a5a6e6 100644
--- a/caffe2/quantization/server/fully_connected_dnnlowp_op.cc
+++ b/caffe2/quantization/server/fully_connected_dnnlowp_op.cc

@@ -190,6 +190,9 @@
 
     if (!dequantize_output_) {
       Y_int32_.resize(Y->size());
+      if (Y_int32_.size() < Y_int32_.capacity() / 2) {
+        Y_int32_.shrink_to_fit();
+      }
       DoNothing<> doNothingObj{};
 
       if (quantize_channelwise_ || filter_qparams_[0].zero_point) {
@@ -443,6 +446,9 @@
 #endif
 
     Y_int32_.resize(Y->size());
+    if (Y_int32_.size() < Y_int32_.capacity() / 2) {
+      Y_int32_.shrink_to_fit();
+    }
     for (int i = 0; i < M; ++i) {
       for (int j = 0; j < N; ++j) {
         int32_t sum = 0;
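
The resize-then-shrink pattern applied in both hunks above can be sketched in isolation as follows. This is a minimal illustration, not code from caffe2: the helper name `ResizeAndMaybeShrink` is hypothetical, and the commit itself inlines the check at each call site instead of factoring it out. Note that `std::vector::shrink_to_fit` is a non-binding request, so how much capacity is actually released depends on the standard library implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of caffe2): resize `v` to `n` elements, then
// ask the vector to release excess capacity if fewer than half of the
// allocated slots are in use.
inline void ResizeAndMaybeShrink(std::vector<std::int32_t>& v, std::size_t n) {
  // resize() may grow the backing buffer but does not reduce capacity().
  v.resize(n);
  if (v.size() < v.capacity() / 2) {
    // Non-binding request to reduce capacity() to size(). Existing elements
    // are preserved; the cost is typically one allocation, a copy or move of
    // the elements, and one deallocation.
    v.shrink_to_fit();
  }
  assert(v.size() == n);
}
```

The factor of two gives some hysteresis: small fluctuations in size do not trigger a reallocation on every call, which keeps the extra malloc/free traffic bounded.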