Fixed increasing CPU overhead of `RemovableHandle.__init__` (#122847)

For some reason, if we construct `class Handle(RemovableHandle)` inside `register_multi_grad_hook`, then over time, calls to `RemovableHandle.__init__` become slower and slower (with GC disabled). Perhaps this is related to the class attribute `next_id: int = 0`. Python experts: please let me know if you have thoughts 😅

I am open to any suggestions on how we should deal with this `Handle` class. For now, I changed it to a private module-level `_MultiHandle`.

<details>
<summary> Experiment Script </summary>

```python
import gc
import time

import torch

NUM_TENSORS = int(5e4)
ts = [torch.empty(1, requires_grad=True) for _ in range(NUM_TENSORS)]

def hook(grad) -> None:
    return

gc.disable()
times = []
for i, t in enumerate(ts):
    start_time = time.time()

    torch.autograd.graph.register_multi_grad_hook([t], hook)

    end_time = time.time()
    times.append(end_time - start_time)

print([f"{t * 1e6:.3f} us" for t in times[1:6]])  # print first few times
print([f"{t * 1e6:.3f} us" for t in times[-5:]])  # print last few times

times = []
for i, t in enumerate(ts):
    start_time = time.time()

    t.register_hook(hook)

    end_time = time.time()
    times.append(end_time - start_time)

print([f"{t * 1e6:.3f} us" for t in times[1:6]])  # print first few times
print([f"{t * 1e6:.3f} us" for t in times[-5:]])  # print last few times
```
</details>

<details>
<summary> Results </summary>

Before fix:
```
['23.603 us', '19.550 us', '15.497 us', '12.875 us', '13.828 us']
['327.110 us', '341.177 us', '329.733 us', '332.832 us', '341.177 us']
['318.050 us', '315.189 us', '319.719 us', '311.613 us', '308.990 us']
['374.317 us', '394.821 us', '350.714 us', '337.362 us', '331.402 us']
```
Calling `register_multi_grad_hook` slows down both itself and `register_hook` (in fact, any subsequent call to `RemovableHandle.__init__`).

After fix:
```
['13.590 us', '9.060 us', '12.875 us', '7.153 us', '8.583 us']
['4.530 us', '5.245 us', '6.437 us', '4.768 us', '5.007 us']
['2.623 us', '1.907 us', '1.431 us', '1.669 us', '1.192 us']
['1.431 us', '1.431 us', '1.192 us', '1.192 us', '1.431 us']
```
</details>

Update from @soulitzer:

> Your suspicion about `next_id` is right. I think what is happening is that whenever a class attribute is set, it needs to invalidate some cached data for the subclasses one-by-one: https://github.com/python/cpython/blob/eefff682f09394fe4f18b7d7c6ac4c635caadd02/Objects/typeobject.c#L845
>
> And this PR fixes the issue by avoiding creating many subclasses dynamically. Changing `next_id` to something like `List[int]` or incrementing a global instead also fixes this.
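
A minimal standalone sketch of the mechanism described above (hypothetical `Base` classes, not the actual PyTorch code): assigning a class attribute on a base class triggers CPython's `PyType_Modified`, which walks every live subclass, so each assignment becomes O(number of subclasses). Mutating a container held in a class attribute (as in the `List[int]` suggestion) avoids the class-attribute store entirely:

```python
import time


class Base:
    next_id: int = 0

    def __init__(self) -> None:
        self.id = Base.next_id
        # Class-attribute store: invalidates cached data for every subclass.
        Base.next_id += 1


class BaseWithList:
    next_id = [0]  # mutated in place; no class-attribute store happens

    def __init__(self) -> None:
        self.id = BaseWithList.next_id[0]
        BaseWithList.next_id[0] += 1


def make_subclasses(base: type, n: int) -> list:
    # Mimics defining `class Handle(RemovableHandle)` inside a function:
    # each call creates a fresh subclass object that stays registered
    # in the base class's tp_subclasses list while referenced.
    return [type(f"Sub{i}", (base,), {}) for i in range(n)]


subs = make_subclasses(Base, 20_000)  # keep subclasses alive
t0 = time.perf_counter()
for _ in range(1000):
    Base()
slow = time.perf_counter() - t0

subs2 = make_subclasses(BaseWithList, 20_000)  # same subclass count, for fairness
t0 = time.perf_counter()
for _ in range(1000):
    BaseWithList()
fast = time.perf_counter() - t0

print(f"class-attr store: {slow:.4f}s, in-place list mutation: {fast:.4f}s")
```

With many subclasses alive, the `Base` loop is dramatically slower than the `BaseWithList` loop even though both run the same number of `__init__` calls; the fix in this PR takes the complementary route of not creating subclasses dynamically in the first place.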

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122847
Approved by: https://github.com/soulitzer
ghstack dependencies: #122726
diff --git a/torch/autograd/graph.py b/torch/autograd/graph.py
index 2aca5b3..19938c1 100644
--- a/torch/autograd/graph.py
+++ b/torch/autograd/graph.py
@@ -382,6 +382,23 @@
             torch._C._autograd._saved_tensors_hooks_disable(maybe_prev_message)
 
 
+class _MultiHandle(RemovableHandle):
+    handles: Tuple[RemovableHandle, ...]
+
+    def __init__(self, handles: Tuple[RemovableHandle, ...]):
+        self.handles = handles
+
+    def remove(self):
+        for handle in self.handles:
+            handle.remove()
+
+    def __getstate__(self):
+        return self.handles
+
+    def __setstate__(self, state):
+        self.handles = state
+
+
 def register_multi_grad_hook(
     tensors: Sequence[torch.Tensor],
     fn: Union[
@@ -442,22 +459,6 @@
     if mode not in supported_modes:
         raise ValueError(f"Expects mode to be one of {supported_modes} but got {mode}")
 
-    class Handle(RemovableHandle):
-        handles: Tuple[RemovableHandle, ...]
-
-        def __init__(self, handles: Tuple[RemovableHandle, ...]):
-            self.handles = handles
-
-        def remove(self):
-            for handle in self.handles:
-                handle.remove()
-
-        def __getstate__(self):
-            return self.handles
-
-        def __setstate__(self, state):
-            self.handles = state
-
     if mode == "all":
         count: Dict[int, int] = dict()
         nb_calls = None
@@ -516,7 +517,7 @@
             if tensor.requires_grad
         )
 
-    return Handle(handles)  # type: ignore[possibly-undefined]
+    return _MultiHandle(handles)  # type: ignore[possibly-undefined]
 
 
 # NOTE [Allow mutation on tensors saved for backward]