| .. role:: hidden | 
 |     :class: hidden-section | 
 |  | 
 | Automatic differentiation package - torch.autograd | 
 | ================================================== | 
 |  | 
 | .. automodule:: torch.autograd | 
 | .. currentmodule:: torch.autograd | 
 |  | 
 | .. autosummary:: | 
 |     :toctree: generated | 
 |     :nosignatures: | 
 |  | 
 |     backward | 
 |     grad | 
 |  | 
 | .. _functional-api: | 
 |  | 
Functional higher-level API
 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | 
 |  | 
 | .. warning:: | 
    This API is in beta. Even though the function signatures are very unlikely to change, major
    performance improvements are planned before we consider this stable.
 |  | 
This section contains the higher-level API for autograd that builds on the basic API above
and allows you to compute Jacobians, Hessians, etc.

This API works with user-provided functions that take only Tensors as input and return
only Tensors.
If your function takes other arguments that are not Tensors, or Tensors that don't have ``requires_grad`` set,
you can use a lambda to capture them.
For example, consider a function ``f`` that takes three inputs: a Tensor for which we want the Jacobian,
another Tensor that should be considered constant, and a boolean flag, called as ``f(input, constant, flag=flag)``.
You can wrap it as ``functional.jacobian(lambda x: f(x, constant, flag=flag), input)``.
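
A minimal sketch of this pattern (``f``, ``constant`` and ``flag`` here are hypothetical placeholders)::

    import torch
    from torch.autograd import functional

    def f(x, constant, flag=True):       # hypothetical three-argument function
        return x * constant if flag else x

    inputs = torch.randn(3, requires_grad=True)
    constant = torch.randn(3)            # treated as a constant by autograd

    # The lambda captures the extra arguments so that jacobian only sees a Tensor input:
    jac = functional.jacobian(lambda x: f(x, constant, flag=True), inputs)
    print(jac.shape)                     # torch.Size([3, 3])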
 |  | 
 | .. autosummary:: | 
 |     :toctree: generated | 
 |     :nosignatures: | 
 |  | 
 |     functional.jacobian | 
 |     functional.hessian | 
 |     functional.vjp | 
 |     functional.jvp | 
 |     functional.vhp | 
 |     functional.hvp | 
 |  | 
 | .. _locally-disable-grad: | 
 |  | 
 | Locally disabling gradient computation | 
 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | 
 |  | 
 | See :ref:`locally-disable-grad-doc` for more information on the differences | 
 | between no-grad and inference mode as well as other related mechanisms that | 
 | may be confused with the two. | 
 |  | 
 | .. autosummary:: | 
 |     :toctree: generated | 
 |     :nosignatures: | 
 |  | 
 |     no_grad | 
 |     enable_grad | 
 |     set_grad_enabled | 
 |     inference_mode | 
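
A minimal sketch of the context-manager usage::

    import torch

    x = torch.randn(3, requires_grad=True)

    with torch.no_grad():            # operations here are not recorded by autograd
        y = x * 2
    assert not y.requires_grad

    with torch.inference_mode():     # stricter, faster variant for pure inference
        z = x * 2
    assert not z.requires_grad

    torch.set_grad_enabled(False)    # also usable as a plain function
    torch.set_grad_enabled(True)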
 |  | 
 | .. _default-grad-layouts: | 
 |  | 
 | Default gradient layouts | 
 | ^^^^^^^^^^^^^^^^^^^^^^^^ | 
 |  | 
When a non-sparse ``param`` receives a non-sparse gradient during
:func:`torch.autograd.backward` or :func:`torch.Tensor.backward`,
``param.grad`` is accumulated as follows.
 |  | 
 | If ``param.grad`` is initially ``None``: | 
 |  | 
 | 1. If ``param``'s memory is non-overlapping and dense, ``.grad`` is | 
 |    created with strides matching ``param`` (thus matching ``param``'s | 
 |    layout). | 
2. Otherwise, ``.grad`` is created with row-major contiguous strides.
 |  | 
 | If ``param`` already has a non-sparse ``.grad`` attribute: | 
 |  | 
 | 3. If ``create_graph=False``, ``backward()`` accumulates into ``.grad`` | 
 |    in-place, which preserves its strides. | 
 | 4. If ``create_graph=True``, ``backward()`` replaces ``.grad`` with a | 
 |    new tensor ``.grad + new grad``, which attempts (but does not guarantee) | 
 |    matching the preexisting ``.grad``'s strides. | 
 |  | 
 | The default behavior (letting ``.grad``\ s be ``None`` before the first | 
 | ``backward()``, such that their layout is created according to 1 or 2, | 
 | and retained over time according to 3 or 4) is recommended for best performance. | 
 | Calls to ``model.zero_grad()`` or ``optimizer.zero_grad()`` will not affect ``.grad`` | 
 | layouts. | 
 |  | 
 | In fact, resetting all ``.grad``\ s to ``None`` before each | 
 | accumulation phase, e.g.:: | 
 |  | 
    for inputs, targets in dataloader:   # dataloader, loss_fn, optimizer are placeholder names
        for param in model.parameters():
            param.grad = None            # reset instead of zeroing
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()
 |  | 
 | such that they're recreated according to 1 or 2 every time, | 
 | is a valid alternative to ``model.zero_grad()`` or ``optimizer.zero_grad()`` | 
 | that may improve performance for some networks. | 
 |  | 
 | Manual gradient layouts | 
 | ----------------------- | 
 |  | 
If you need manual control over ``.grad``'s strides,
assign ``param.grad =`` a zeroed tensor with the desired strides
before the first ``backward()``, and never reset it to ``None``.
Rule 3 above guarantees your layout is preserved as long as ``create_graph=False``.
Rule 4 indicates your layout is *likely* preserved even if ``create_graph=True``.
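
A minimal sketch (``param`` here is a free-standing leaf Tensor standing in for a model parameter)::

    import torch

    param = torch.randn(4, 6, requires_grad=True)   # stand-in for a model parameter
    # Pre-create .grad with a column-major layout before the first backward():
    param.grad = torch.zeros(6, 4).t()
    loss = (param * 2).sum()
    loss.backward()                                  # accumulates in-place (rule 3),
    print(param.grad.stride())                       # so the (1, 4) strides are kept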
 |  | 
 | In-place operations on Tensors | 
 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | 
 |  | 
 | Supporting in-place operations in autograd is a hard matter, and we discourage | 
 | their use in most cases. Autograd's aggressive buffer freeing and reuse makes | 
 | it very efficient and there are very few occasions when in-place operations | 
 | actually lower memory usage by any significant amount. Unless you're operating | 
 | under heavy memory pressure, you might never need to use them. | 
 |  | 
 | In-place correctness checks | 
 | --------------------------- | 
 |  | 
All :class:`Tensor`\ s keep track of in-place operations applied to them, and
if the implementation detects that a tensor was saved for backward in one of
the functions but was modified in-place afterwards, an error will be raised
once the backward pass is started. This ensures that if you're using in-place
functions and not seeing any errors, you can be sure that the computed
gradients are correct.
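
A minimal sketch of the kind of pattern this check catches (the offending ``backward()`` call is
left commented out so the snippet runs as-is)::

    import torch

    x = torch.randn(3, requires_grad=True)
    y = x.sigmoid()       # sigmoid saves its output for the backward pass
    y.mul_(2)             # in-place modification of the saved tensor
    # y.sum().backward()  # would raise a RuntimeError because the saved tensor
    #                     # was modified after being recorded for backward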
 |  | 
 | Variable (deprecated) | 
 | ^^^^^^^^^^^^^^^^^^^^^ | 
 |  | 
 | .. warning:: | 
 |     The Variable API has been deprecated: Variables are no longer necessary to | 
 |     use autograd with tensors. Autograd automatically supports Tensors with | 
 |     ``requires_grad`` set to ``True``. Below please find a quick guide on what | 
 |     has changed: | 
 |  | 
 |     - ``Variable(tensor)`` and ``Variable(tensor, requires_grad)`` still work as expected, | 
 |       but they return Tensors instead of Variables. | 
 |     - ``var.data`` is the same thing as ``tensor.data``. | 
 |     - Methods such as ``var.backward(), var.detach(), var.register_hook()`` now work on tensors | 
 |       with the same method names. | 
 |  | 
 |     In addition, one can now create tensors with ``requires_grad=True`` using factory | 
 |     methods such as :func:`torch.randn`, :func:`torch.zeros`, :func:`torch.ones`, and others | 
 |     like the following: | 
 |  | 
 |     ``autograd_tensor = torch.randn((2, 3, 4), requires_grad=True)`` | 
 |  | 
 | Tensor autograd functions | 
 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | 

.. autosummary::
    :nosignatures:

    torch.Tensor.grad
    torch.Tensor.requires_grad
    torch.Tensor.is_leaf
    torch.Tensor.backward
    torch.Tensor.detach
    torch.Tensor.detach_
    torch.Tensor.register_hook
    torch.Tensor.retain_grad
 |  | 
 | :hidden:`Function` | 
 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | 
 |  | 
 | .. autoclass:: Function | 
 |  | 
 | .. autosummary:: | 
 |     :toctree: generated | 
 |     :nosignatures: | 
 |  | 
 |     Function.backward | 
 |     Function.forward | 
 |  | 
 | Context method mixins | 
 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | 

When creating a new :class:`Function`, the following methods are available to ``ctx``.
 |  | 
 | .. autosummary:: | 
 |     :toctree: generated | 
 |     :nosignatures: | 
 |  | 
 |     function.FunctionCtx.mark_dirty | 
 |     function.FunctionCtx.mark_non_differentiable | 
 |     function.FunctionCtx.save_for_backward | 
 |     function.FunctionCtx.set_materialize_grads | 
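
A minimal custom :class:`Function` sketch that uses ``ctx`` to stash a tensor for the backward pass::

    import torch

    class Exp(torch.autograd.Function):

        @staticmethod
        def forward(ctx, input):
            result = input.exp()
            ctx.save_for_backward(result)    # stash the output for the backward pass
            return result

        @staticmethod
        def backward(ctx, grad_output):
            result, = ctx.saved_tensors
            return grad_output * result      # d/dx exp(x) = exp(x)

    x = torch.randn(3, requires_grad=True)
    y = Exp.apply(x)
    y.sum().backward()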
 |  | 
 | .. _grad-check: | 
 |  | 
 | Numerical gradient checking | 
 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^ | 
 |  | 
 | .. autosummary:: | 
 |     :toctree: generated | 
 |     :nosignatures: | 
 |  | 
 |     gradcheck | 
 |     gradgradcheck | 
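
A small sketch of typical usage (double-precision inputs are recommended for numerical stability)::

    import torch
    from torch.autograd import gradcheck

    # gradcheck compares analytical gradients against finite differences.
    inp = torch.randn(4, 3, dtype=torch.double, requires_grad=True)
    assert gradcheck(torch.sigmoid, (inp,), eps=1e-6, atol=1e-4)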
 |  | 
 | Profiler | 
 | ^^^^^^^^ | 
 |  | 
Autograd includes a profiler that lets you inspect the cost of different
operators inside your model, both on the CPU and GPU. There are two modes
implemented at the moment: CPU-only, using :class:`~torch.autograd.profiler.profile`,
and nvprof-based (registering both CPU and GPU activity), using
:class:`~torch.autograd.profiler.emit_nvtx`.
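
A minimal sketch of the CPU-only mode::

    import torch
    from torch.autograd import profiler

    x = torch.randn(100, 100)
    with profiler.profile() as prof:     # records operator-level CPU timings
        for _ in range(10):
            torch.mm(x, x)
    print(prof.key_averages().table(sort_by="self_cpu_time_total"))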
 |  | 
 | .. autoclass:: torch.autograd.profiler.profile | 
 |  | 
 | .. autosummary:: | 
 |     :toctree: generated | 
 |     :nosignatures: | 
 |  | 
 |     profiler.profile.export_chrome_trace | 
 |     profiler.profile.key_averages | 
 |     profiler.profile.self_cpu_time_total | 
 |     profiler.profile.total_average | 
 |  | 
 | .. autoclass:: torch.autograd.profiler.emit_nvtx | 
 |  | 
 |  | 
 | .. autosummary:: | 
 |     :toctree: generated | 
 |     :nosignatures: | 
 |  | 
 |     profiler.load_nvprof | 
 |  | 
 | Anomaly detection | 
 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | 
 |  | 
 | .. autoclass:: detect_anomaly | 
 |  | 
 | .. autoclass:: set_detect_anomaly | 
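
A minimal sketch of enabling anomaly detection around a backward pass (note that it slows execution
and is meant for debugging)::

    import torch

    with torch.autograd.detect_anomaly():
        x = torch.randn(3, requires_grad=True)
        y = (x.exp() + 1).log().sum()   # any NaN produced during backward would raise
        y.backward()                    # an error pointing at the offending forward op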
 |  | 
 |  | 
 | Saved tensors default hooks | 
 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^ | 
 |  | 
 | Some operations need intermediary results to be saved during the forward pass | 
 | in order to execute the backward pass. | 
 | You can define how these saved tensors should be packed / unpacked using hooks. | 
A common application is to save GPU memory by moving those intermediary results
to disk or to CPU RAM instead of leaving them on the GPU. This is especially useful if you
notice your model fits on the GPU during evaluation, but not during training.
 | Also see :ref:`saved-tensors-hooks-doc`. | 
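
A minimal sketch of a pack/unpack hook pair (the hooks here only log; a real use case would move
the saved tensor to CPU memory or to disk)::

    import torch

    def pack_hook(tensor):
        print("Packing", tensor.shape)    # runs when autograd saves a tensor in forward
        return tensor

    def unpack_hook(tensor):
        print("Unpacking", tensor.shape)  # runs when backward retrieves the saved tensor
        return tensor

    a = torch.randn(5, requires_grad=True)
    b = torch.randn(5, requires_grad=True)
    with torch.autograd.graph.saved_tensors_hooks(pack_hook, unpack_hook):
        y = a * b          # a and b are saved for the backward of mul
    y.sum().backward()     # unpack_hook runs here when the saved tensors are needed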
 |  | 
 | .. autoclass:: torch.autograd.graph.saved_tensors_hooks | 
 |  | 
 | .. autoclass:: torch.autograd.graph.save_on_cpu |