.. role:: hidden
    :class: hidden-section

Automatic differentiation package - torch.autograd
==================================================

.. automodule:: torch.autograd
.. currentmodule:: torch.autograd

.. autosummary::
    :toctree: generated
    :nosignatures:

    backward
    grad
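
As a quick sketch of how the two entry points differ, ``backward`` accumulates
gradients into the ``.grad`` attribute of the leaves, while ``grad`` returns
them directly::

    >>> x = torch.randn(3, requires_grad=True)
    >>> y = (x * x).sum()
    >>> torch.autograd.backward(y)                    # accumulates dy/dx into x.grad
    >>> torch.allclose(x.grad, 2 * x)
    True
    >>> gx, = torch.autograd.grad((x * x).sum(), x)   # returns the gradient instead of accumulating it
    >>> torch.allclose(gx, 2 * x)
    True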

.. _forward-mode-ad:

Forward-mode Automatic Differentiation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. warning::
    This API is in beta. Even though the function signatures are very unlikely to change, improved
    operator coverage is planned before we consider this stable.

Please see the `forward-mode AD tutorial <https://pytorch.org/tutorials/intermediate/forward_ad_usage.html>`__
for detailed steps on how to use this API.
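
As a brief sketch of the dual-number workflow (the tutorial above covers the details),
tangents are attached to primals inside a ``dual_level`` context and the resulting
Jacobian-vector product is read back with ``unpack_dual``::

    >>> import torch.autograd.forward_ad as fwAD
    >>> primal = torch.randn(3)
    >>> tangent = torch.randn(3)                      # direction of the directional derivative
    >>> with fwAD.dual_level():
    ...     dual = fwAD.make_dual(primal, tangent)
    ...     out = dual.sin()
    ...     jvp = fwAD.unpack_dual(out).tangent       # equals cos(primal) * tangent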

.. autosummary::
    :toctree: generated
    :nosignatures:

    forward_ad.dual_level
    forward_ad.make_dual
    forward_ad.unpack_dual

.. _functional-api:

Functional higher level API
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. warning::
    This API is in beta. Even though the function signatures are very unlikely to change, major
    improvements to performance are planned before we consider this stable.

This section contains the higher level autograd API that builds on the basic API above
and allows you to compute jacobians, hessians, etc.

This API works with user-provided functions that take only Tensors as input and return
only Tensors.
If your function takes other arguments that are not Tensors, or Tensors that don't have ``requires_grad`` set,
you can use a lambda to capture them.
For example, given a function ``f`` that takes three inputs (a Tensor for which we want the jacobian, another
Tensor that should be considered constant, and a boolean flag) and is called as ``f(input, constant, flag=flag)``,
you can use it as ``functional.jacobian(lambda x: f(x, constant, flag=flag), input)``.
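
As a small sketch of that pattern (the function ``f`` below is a made-up example)::

    >>> from torch.autograd.functional import jacobian
    >>> def f(x, constant, flag=False):
    ...     y = x * constant
    ...     return y.exp() if flag else y
    >>> inputs = torch.randn(3)
    >>> constant = torch.randn(3)
    >>> # capture the constant and non-Tensor arguments with a lambda
    >>> J = jacobian(lambda x: f(x, constant, flag=True), inputs)
    >>> J.shape
    torch.Size([3, 3])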

.. autosummary::
    :toctree: generated
    :nosignatures:

    functional.jacobian
    functional.hessian
    functional.vjp
    functional.jvp
    functional.vhp
    functional.hvp

.. _locally-disable-grad:

Locally disabling gradient computation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

See :ref:`locally-disable-grad-doc` for more information on the differences
between no-grad and inference mode as well as other related mechanisms that
may be confused with the two. Also see :ref:`torch-rst-local-disable-grad`
for a list of functions that can be used to locally disable gradients.
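
As a quick sketch, both context managers prevent operations from being recorded
by autograd (see the pages referenced above for the differences between them)::

    >>> x = torch.randn(3, requires_grad=True)
    >>> with torch.no_grad():
    ...     y = x * 2
    >>> y.requires_grad
    False
    >>> with torch.inference_mode():
    ...     z = x * 2
    >>> z.requires_grad
    False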

.. _default-grad-layouts:

Default gradient layouts
^^^^^^^^^^^^^^^^^^^^^^^^

When a non-sparse ``param`` receives a non-sparse gradient during
:func:`torch.autograd.backward` or :func:`torch.Tensor.backward`,
``param.grad`` is accumulated as follows.

If ``param.grad`` is initially ``None``:

1. If ``param``'s memory is non-overlapping and dense, ``.grad`` is
   created with strides matching ``param`` (thus matching ``param``'s
   layout).
2. Otherwise, ``.grad`` is created with rowmajor-contiguous strides.

If ``param`` already has a non-sparse ``.grad`` attribute:

3. If ``create_graph=False``, ``backward()`` accumulates into ``.grad``
   in-place, which preserves its strides.
4. If ``create_graph=True``, ``backward()`` replaces ``.grad`` with a
   new tensor ``.grad + new grad``, which attempts (but does not guarantee)
   to match the preexisting ``.grad``'s strides.

The default behavior (letting ``.grad``\ s be ``None`` before the first
``backward()``, such that their layout is created according to 1 or 2,
and retained over time according to 3 or 4) is recommended for best performance.
Calls to ``model.zero_grad()`` or ``optimizer.zero_grad()`` will not affect ``.grad``
layouts.

In fact, resetting all ``.grad``\ s to ``None`` before each
accumulation phase, e.g.::

    for iterations...
        ...
        for param in model.parameters():
            param.grad = None
        loss.backward()

such that they're recreated according to 1 or 2 every time,
is a valid alternative to ``model.zero_grad()`` or ``optimizer.zero_grad()``
that may improve performance for some networks.

Manual gradient layouts
-----------------------

If you need manual control over ``.grad``'s strides,
assign ``param.grad =`` a zeroed tensor with the desired strides
before the first ``backward()``, and never reset it to ``None``.
Rule 3 guarantees your layout is preserved as long as ``create_graph=False``.
Rule 4 indicates your layout is *likely* preserved even if ``create_graph=True``.
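
A minimal sketch of that recipe (the transposed strides here are just an arbitrary
non-default layout chosen for illustration)::

    >>> param = torch.nn.Parameter(torch.randn(4, 5))
    >>> # pre-assign a zeroed gradient with column-major strides before the first backward()
    >>> param.grad = torch.zeros(5, 4).t()
    >>> (param * 2).sum().backward()
    >>> param.grad.stride()   # in-place accumulation (rule 3) preserves the chosen layout
    (1, 4)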

In-place operations on Tensors
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Supporting in-place operations in autograd is a hard matter, and we discourage
their use in most cases. Autograd's aggressive buffer freeing and reuse makes
it very efficient, and there are very few occasions when in-place operations
actually lower memory usage by any significant amount. Unless you're operating
under heavy memory pressure, you might never need to use them.

In-place correctness checks
---------------------------

All :class:`Tensor`\ s keep track of in-place operations applied to them, and
if the implementation detects that a tensor was saved for backward in one of
the functions, but was modified in-place afterwards, an error will be raised
once the backward pass is started. This ensures that if you're using in-place
functions and not seeing any errors, you can be sure that the computed
gradients are correct.
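
For example, a sketch of the kind of error this check catches (``exp`` saves its
output to compute its backward, so modifying that output in place invalidates it)::

    >>> a = torch.randn(3, requires_grad=True)
    >>> b = a.exp()        # exp saves its result for the backward pass
    >>> b.add_(1)          # in-place modification of the saved tensor
    >>> b.sum().backward() # raises a RuntimeError about the in-place modification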

Variable (deprecated)
^^^^^^^^^^^^^^^^^^^^^

.. warning::
    The Variable API has been deprecated: Variables are no longer necessary to
    use autograd with tensors. Autograd automatically supports Tensors with
    ``requires_grad`` set to ``True``. Below please find a quick guide on what
    has changed:

    - ``Variable(tensor)`` and ``Variable(tensor, requires_grad)`` still work as expected,
      but they return Tensors instead of Variables.
    - ``var.data`` is the same thing as ``tensor.data``.
    - Methods such as ``var.backward(), var.detach(), var.register_hook()`` now work on tensors
      with the same method names.

    In addition, one can now create tensors with ``requires_grad=True`` using factory
    methods such as :func:`torch.randn`, :func:`torch.zeros`, :func:`torch.ones`, and others
    like the following:

    ``autograd_tensor = torch.randn((2, 3, 4), requires_grad=True)``

Tensor autograd functions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. autosummary::
    :nosignatures:

    torch.Tensor.grad
    torch.Tensor.requires_grad
    torch.Tensor.is_leaf
    torch.Tensor.backward
    torch.Tensor.detach
    torch.Tensor.detach_
    torch.Tensor.register_hook
    torch.Tensor.retain_grad
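
As a short sketch of the hook and ``retain_grad`` entry points listed above::

    >>> x = torch.randn(3, requires_grad=True)
    >>> handle = x.register_hook(lambda grad: grad * 2)   # rescale x's gradient when it is computed
    >>> y = x.exp()
    >>> y.retain_grad()                                   # keep .grad on a non-leaf tensor
    >>> y.sum().backward()
    >>> y.grad is not None
    True
    >>> handle.remove()                                   # detach the hook when done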

:hidden:`Function`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. autoclass:: Function

.. autosummary::
    :toctree: generated
    :nosignatures:

    Function.forward
    Function.backward
    Function.jvp
    Function.vmap
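
A minimal sketch of a custom :class:`Function` (``ctx`` is the context object
described in the next section)::

    >>> class Exp(torch.autograd.Function):
    ...     @staticmethod
    ...     def forward(ctx, i):
    ...         result = i.exp()
    ...         ctx.save_for_backward(result)   # stash what backward needs
    ...         return result
    ...     @staticmethod
    ...     def backward(ctx, grad_output):
    ...         result, = ctx.saved_tensors
    ...         return grad_output * result     # d(exp(x))/dx = exp(x)
    >>> out = Exp.apply(torch.randn(3, requires_grad=True))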

Context method mixins
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When creating a new :class:`Function`, the following methods are available to `ctx`.

.. autosummary::
    :toctree: generated
    :nosignatures:

    function.FunctionCtx.mark_dirty
    function.FunctionCtx.mark_non_differentiable
    function.FunctionCtx.save_for_backward
    function.FunctionCtx.set_materialize_grads
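
A sketch of how some of these fit into a custom :class:`Function` (``Clamp`` here
is a made-up example, not an existing operator)::

    >>> class Clamp(torch.autograd.Function):
    ...     @staticmethod
    ...     def forward(ctx, x):
    ...         mask = x > 0
    ...         ctx.save_for_backward(mask)
    ...         ctx.mark_non_differentiable(mask)   # the boolean mask output has no gradient
    ...         ctx.set_materialize_grads(False)    # backward may receive None instead of zero grads
    ...         return x.clamp(min=0), mask
    ...     @staticmethod
    ...     def backward(ctx, grad_output, grad_mask):
    ...         if grad_output is None:
    ...             return None
    ...         mask, = ctx.saved_tensors
    ...         return grad_output * mask
    >>> y, mask = Clamp.apply(torch.randn(5, requires_grad=True))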

.. _grad-check:

Numerical gradient checking
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. autosummary::
    :toctree: generated
    :nosignatures:

    gradcheck
    gradgradcheck
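
As a small sketch, both checks compare analytical gradients against finite
differences and work best with double-precision inputs::

    >>> from torch.autograd import gradcheck, gradgradcheck
    >>> inp = torch.randn(4, dtype=torch.double, requires_grad=True)
    >>> gradcheck(torch.sigmoid, (inp,))        # checks first-order gradients
    True
    >>> gradgradcheck(torch.sigmoid, (inp,))    # checks second-order gradients
    True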

Profiler
^^^^^^^^

Autograd includes a profiler that lets you inspect the cost of different
operators inside your model - both on the CPU and GPU. There are three modes
implemented at the moment: CPU-only using :class:`~torch.autograd.profiler.profile`,
nvprof-based (registers both CPU and GPU activity) using
:class:`~torch.autograd.profiler.emit_nvtx`,
and VTune-based using
:class:`~torch.autograd.profiler.emit_itt`.

.. autoclass:: torch.autograd.profiler.profile

.. autosummary::
    :toctree: generated
    :nosignatures:

    profiler.profile.export_chrome_trace
    profiler.profile.key_averages
    profiler.profile.self_cpu_time_total
    profiler.profile.total_average
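
A brief sketch of the CPU-only mode (the workload here is arbitrary)::

    >>> from torch.autograd import profiler
    >>> x = torch.randn(128, 128)
    >>> with profiler.profile() as prof:
    ...     for _ in range(10):
    ...         y = x.mm(x)
    >>> print(prof.key_averages().table(sort_by="self_cpu_time_total"))
    >>> prof.export_chrome_trace("trace.json")   # viewable in chrome://tracing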

.. autoclass:: torch.autograd.profiler.emit_nvtx
.. autoclass:: torch.autograd.profiler.emit_itt


.. autosummary::
    :toctree: generated
    :nosignatures:

    profiler.load_nvprof

Anomaly detection
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. autoclass:: detect_anomaly

.. autoclass:: set_detect_anomaly
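
As a short sketch, anomaly mode records the forward-pass traceback of each operation and
checks backward results for NaN, so a failing backward op can be traced back to the
forward call that created it (the ``where``/``log`` combination below is contrived to
produce a NaN gradient)::

    >>> with torch.autograd.detect_anomaly():
    ...     x = torch.zeros(3, requires_grad=True)
    ...     y = torch.where(x > 0, x.log(), torch.zeros_like(x))
    ...     y.sum().backward()   # raises and points at the forward call whose backward produced NaN

``set_detect_anomaly(True)`` enables the same checks globally, without a ``with`` block.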

Autograd graph
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Autograd exposes methods that allow one to inspect the graph and interpose behavior during
the backward pass.

The ``grad_fn`` attribute of a :class:`torch.Tensor` holds a :class:`torch.autograd.graph.Node`
if the tensor is the output of an operation that was recorded by autograd (i.e., grad mode was
enabled and at least one of the inputs required gradients), or ``None`` otherwise.

.. autosummary::
    :toctree: generated
    :nosignatures:

    graph.Node.name
    graph.Node.metadata
    graph.Node.next_functions
    graph.Node.register_hook
    graph.Node.register_prehook

Some operations need intermediary results to be saved during the forward pass
in order to execute the backward pass.
These intermediary results are saved as attributes on the ``grad_fn`` and can be accessed.
For example::

    >>> a = torch.tensor([0., 0., 0.], requires_grad=True)
    >>> b = a.exp()
    >>> print(isinstance(b.grad_fn, torch.autograd.graph.Node))
    True
    >>> print(dir(b.grad_fn))
    ['__call__', '__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '_raw_saved_result', '_register_hook_dict', '_saved_result', 'metadata', 'name', 'next_functions', 'register_hook', 'register_prehook', 'requires_grad']
    >>> print(torch.allclose(b.grad_fn._saved_result, b))
    True

You can also define how these saved tensors should be packed / unpacked using hooks.
A common application is to trade compute for memory by saving those intermediary results
to disk or to CPU instead of leaving them on the GPU. This is especially useful if you
notice your model fits on GPU during evaluation, but not training.
Also see :ref:`saved-tensors-hooks-doc`.
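
For example, a pair of no-op hooks that simply logs when tensors are saved and
retrieved (a sketch; real hooks would move the tensor to CPU or disk and back,
as :class:`torch.autograd.graph.save_on_cpu` does for the CPU-offload case)::

    >>> def pack_hook(tensor):
    ...     print("packing", tensor.shape)
    ...     return tensor
    >>> def unpack_hook(tensor):
    ...     print("unpacking", tensor.shape)
    ...     return tensor
    >>> a = torch.ones(5, requires_grad=True)
    >>> with torch.autograd.graph.saved_tensors_hooks(pack_hook, unpack_hook):
    ...     y = a.exp()        # pack_hook runs here, when exp saves its result
    >>> y.sum().backward()     # unpack_hook runs here, when the saved result is read back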

.. autoclass:: torch.autograd.graph.saved_tensors_hooks

.. autoclass:: torch.autograd.graph.save_on_cpu

.. autoclass:: torch.autograd.graph.disable_saved_tensors_hooks