Improved DDP checkpoint documentation (#106985)
Amended the DistributedDataParallel documentation to distinguish gradient checkpointing with use_reentrant=False (works without limitations) from use_reentrant=True (the default, which keeps the previously documented restrictions).
Fixes #84589
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106985
Approved by: https://github.com/wanchaol, https://github.com/fduwjj
diff --git a/torch/nn/parallel/distributed.py b/torch/nn/parallel/distributed.py
index 24f63e1..5a6facf 100644
--- a/torch/nn/parallel/distributed.py
+++ b/torch/nn/parallel/distributed.py
@@ -456,9 +456,12 @@
.. note::
DistributedDataParallel currently offers limited support for gradient
- checkpointing with :meth:`torch.utils.checkpoint`. DDP will work as
- expected when there are no unused parameters in the model and each layer
- is checkpointed at most once (make sure you are not passing
+ checkpointing with :meth:`torch.utils.checkpoint`.
+ If the checkpoint is done with use_reentrant=False (recommended), DDP
+ will work as expected without any limitations.
+ If, however, the checkpoint is done with use_reentrant=True (the default),
+ DDP will work as expected when there are no unused parameters in the model
+ and each layer is checkpointed at most once (make sure you are not passing
`find_unused_parameters=True` to DDP). We currently do not support the
 case where a layer is checkpointed multiple times, or when there are unused
parameters in the checkpointed model.
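
For reference, a minimal sketch (not part of this PR) of the recommended combination described in the amended note: wrapping a checkpointed module in DDP with use_reentrant=False. The single-process "gloo" process group, the MASTER_ADDR/MASTER_PORT values, and the Block module are illustrative assumptions, not code from the PR.

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.checkpoint import checkpoint

# Single-process process group purely for illustration (hypothetical address/port).
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

class Block(nn.Module):
    """Toy block whose forward activations are recomputed in backward."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())

    def forward(self, x):
        # Non-reentrant checkpointing: per the amended note, DDP works as
        # expected here without the restrictions that apply when
        # use_reentrant=True (the default).
        return checkpoint(self.net, x, use_reentrant=False)

model = DDP(nn.Sequential(Block(16), Block(16)))
out = model(torch.randn(8, 16))
out.sum().backward()

dist.destroy_process_group()
```

With use_reentrant=True instead, the note's original caveats would apply: no unused parameters, each layer checkpointed at most once, and no `find_unused_parameters=True`.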