|  | torch.optim | 
|  | =================================== | 
|  |  | 
|  | .. automodule:: torch.optim | 
|  |  | 
|  | How to use an optimizer | 
|  | ----------------------- | 
|  |  | 
|  | To use :mod:`torch.optim`, construct an optimizer object that holds the current |
|  | state and updates the parameters based on the computed gradients. |
|  |  | 
|  | Constructing it | 
|  | ^^^^^^^^^^^^^^^ | 
|  |  | 
|  | To construct an :class:`Optimizer`, give it an iterable containing the parameters |
|  | to optimize (all should be :class:`~torch.autograd.Variable` s). You can then |
|  | specify optimizer-specific options such as the learning rate and weight decay. |
|  |  | 
|  | .. note:: | 
|  |  | 
|  |     If you need to move a model to GPU via ``.cuda()``, do so before |
|  |     constructing optimizers for it. Parameters of a model after ``.cuda()`` will |
|  |     be different objects from those before the call. |
|  |  |
|  |     In general, make sure that optimized parameters live in consistent |
|  |     locations when optimizers are constructed and used. |
|  |  | 
|  | Example:: | 
|  |  | 
|  |     optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9) |
|  |     optimizer = optim.Adam([var1, var2], lr=0.0001) |
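
The ordering advice in the note can be sketched as follows. This is a minimal illustration, not a prescribed recipe: the ``nn.Linear`` model is a hypothetical stand-in, and the ``device`` fallback to CPU is an assumption added so the snippet also runs on machines without a GPU.

```python
import torch
from torch import nn, optim

# Hypothetical toy model; any nn.Module is handled the same way.
model = nn.Linear(4, 2)

# Move the model first, so the optimizer is built from the parameters
# in their final location. Falling back to CPU is an assumption made
# here so the sketch runs without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Only now construct the optimizer.
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Collect the device type of every tensor the optimizer will update.
param_devices = {
    p.device.type
    for group in optimizer.param_groups
    for p in group["params"]
}
print(param_devices)
```

All optimized tensors report the device the model was moved to, which is the "consistent locations" condition the note asks for.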
|  |  | 
|  | Per-parameter options | 
|  | ^^^^^^^^^^^^^^^^^^^^^ | 
|  |  | 
|  | :class:`Optimizer` s also support specifying per-parameter options. To do this, instead | 
|  | of passing an iterable of :class:`~torch.autograd.Variable` s, pass in an iterable of | 
|  | :class:`dict` s. Each of them will define a separate parameter group, and should contain | 
|  | a ``params`` key, containing a list of parameters belonging to it. Other keys | 
|  | should match the keyword arguments accepted by the optimizers, and will be used | 
|  | as optimization options for this group. | 
|  |  | 
|  | .. note:: | 
|  |  | 
|  |     You can still pass options as keyword arguments. They will be used as |
|  |     defaults, in the groups that didn't override them. This is useful when you |
|  |     only want to vary a single option, while keeping all others consistent |
|  |     between parameter groups. |
|  |  | 
|  |  | 
|  | For example, this is very useful when one wants to specify per-layer learning rates:: | 
|  |  | 
|  |     optim.SGD([ |
|  |         {'params': model.base.parameters()}, |
|  |         {'params': model.classifier.parameters(), 'lr': 1e-3} |
|  |     ], lr=1e-2, momentum=0.9) |
|  |  | 
|  | This means that ``model.base``'s parameters will use the default learning rate of ``1e-2``, | 
|  | ``model.classifier``'s parameters will use a learning rate of ``1e-3``, and a momentum of | 
|  | ``0.9`` will be used for all parameters. | 
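
One way to check how the defaults were merged is to inspect ``optimizer.param_groups`` after construction. The sketch below assumes a hypothetical ``Net`` module whose ``base`` and ``classifier`` submodules mirror the names used in the text; the layer sizes are arbitrary.

```python
from torch import nn, optim


class Net(nn.Module):
    """Hypothetical model with the two submodules named in the text."""

    def __init__(self):
        super().__init__()
        self.base = nn.Linear(8, 4)
        self.classifier = nn.Linear(4, 2)

    def forward(self, x):
        return self.classifier(self.base(x))


model = Net()
optimizer = optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3},
], lr=1e-2, momentum=0.9)

# Each dict became one parameter group; missing keys were filled
# in from the keyword-argument defaults.
group_lrs = [group['lr'] for group in optimizer.param_groups]
group_momenta = [group['momentum'] for group in optimizer.param_groups]
print(group_lrs, group_momenta)
```

The first group reports the default learning rate, the second reports its override, and both share the default momentum.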
|  |  | 
|  | Taking an optimization step | 
|  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^ | 
|  |  | 
|  | All optimizers implement a :func:`~Optimizer.step` method that updates the |
|  | parameters. It can be used in two ways: |
|  |  | 
|  | ``optimizer.step()`` | 
|  | ~~~~~~~~~~~~~~~~~~~~ | 
|  |  | 
|  | This is a simplified version supported by most optimizers. The function can be | 
|  | called once the gradients are computed using e.g. | 
|  | :func:`~torch.autograd.Variable.backward`. | 
|  |  | 
|  | Example:: | 
|  |  | 
|  |     for input, target in dataset: |
|  |         optimizer.zero_grad() |
|  |         output = model(input) |
|  |         loss = loss_fn(output, target) |
|  |         loss.backward() |
|  |         optimizer.step() |
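
For readers who want to run the loop end to end, here is a self-contained sketch. The toy dataset (points on ``y = 2x + 1``), the ``nn.Linear`` model, the learning rate, the epoch count, and the ``mean_loss`` helper are all assumptions chosen for illustration; only the loop body follows the example above.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)

# Hypothetical toy dataset: four points on the line y = 2x + 1.
inputs = torch.tensor([[0.0], [1.0], [2.0], [3.0]])
targets = 2 * inputs + 1
dataset = list(zip(inputs, targets))

model = nn.Linear(1, 1)
loss_fn = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)


def mean_loss():
    """Loss over the whole dataset, computed without tracking gradients."""
    with torch.no_grad():
        return loss_fn(model(inputs), targets).item()


initial_loss = mean_loss()
for epoch in range(200):
    for input, target in dataset:
        optimizer.zero_grad()           # clear gradients from the last step
        output = model(input)
        loss = loss_fn(output, target)
        loss.backward()                 # compute gradients
        optimizer.step()                # update the parameters
final_loss = mean_loss()
print(initial_loss, final_loss)
```

Calling ``zero_grad()`` at the top of each iteration matters because gradients accumulate across ``backward()`` calls by default.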
|  |  | 
|  | ``optimizer.step(closure)`` | 
|  | ~~~~~~~~~~~~~~~~~~~~~~~~~~~ | 
|  |  | 
|  | Some optimization algorithms such as Conjugate Gradient and LBFGS need to | 
|  | reevaluate the function multiple times, so you have to pass in a closure that | 
|  | allows them to recompute your model. The closure should clear the gradients, | 
|  | compute the loss, and return it. | 
|  |  | 
|  | Example:: | 
|  |  | 
|  |     for input, target in dataset: |
|  |         def closure(): |
|  |             optimizer.zero_grad() |
|  |             output = model(input) |
|  |             loss = loss_fn(output, target) |
|  |             loss.backward() |
|  |             return loss |
|  |         optimizer.step(closure) |
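
A runnable variant of the closure pattern with :class:`LBFGS` might look like the sketch below. It departs from the example above in two ways that are assumptions of this sketch: it evaluates the whole toy dataset in one batch instead of looping over samples, and it passes ``line_search_fn="strong_wolfe"`` to make the step size choice more robust.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)

# Hypothetical toy data: four points on the line y = 2x + 1.
inputs = torch.tensor([[0.0], [1.0], [2.0], [3.0]])
targets = 2 * inputs + 1

model = nn.Linear(1, 1)
loss_fn = nn.MSELoss()
optimizer = optim.LBFGS(
    model.parameters(), lr=1.0, line_search_fn="strong_wolfe"
)


def closure():
    # LBFGS may call this several times per step, so it has to
    # clear the gradients, recompute the loss, and return it.
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    return loss


initial_loss = closure().item()
optimizer.step(closure)   # one call runs several inner iterations
final_loss = closure().item()
print(initial_loss, final_loss)
```

The closure is defined once here because the batch never changes; in a per-batch loop it would be redefined for each batch, as in the example above.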
|  |  | 
|  | .. _optimizer-algorithms: | 
|  |  | 
|  | Algorithms | 
|  | ---------- | 
|  |  | 
|  | .. autoclass:: Optimizer |
|  |     :members: |
|  | .. autoclass:: Adadelta |
|  |     :members: |
|  | .. autoclass:: Adagrad |
|  |     :members: |
|  | .. autoclass:: Adam |
|  |     :members: |
|  | .. autoclass:: AdamW |
|  |     :members: |
|  | .. autoclass:: SparseAdam |
|  |     :members: |
|  | .. autoclass:: Adamax |
|  |     :members: |
|  | .. autoclass:: ASGD |
|  |     :members: |
|  | .. autoclass:: LBFGS |
|  |     :members: |
|  | .. autoclass:: RMSprop |
|  |     :members: |
|  | .. autoclass:: Rprop |
|  |     :members: |
|  | .. autoclass:: SGD |
|  |     :members: |
|  |  | 
|  | How to adjust learning rate | 
|  | --------------------------- | 
|  |  | 
|  | :mod:`torch.optim.lr_scheduler` provides several methods to adjust the learning |
|  | rate based on the number of epochs. :class:`torch.optim.lr_scheduler.ReduceLROnPlateau` |
|  | instead lowers the learning rate dynamically, based on a validation measurement. |
|  |  | 
|  | Learning rate scheduling should be applied after the optimizer's update; e.g., you |
|  | should write your code this way: |
|  |  | 
|  | >>> scheduler = ... |
|  | >>> for epoch in range(100): |
|  | ...     train(...) |
|  | ...     validate(...) |
|  | ...     scheduler.step() |
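
The same loop shape applies to :class:`torch.optim.lr_scheduler.ReduceLROnPlateau`, except that ``step`` receives the monitored value. The following is a minimal sketch: the validation losses are made up, the bare ``optimizer.step()`` stands in for the training updates of an epoch, and the ``factor`` and ``patience`` values are arbitrary choices.

```python
from torch import nn, optim
from torch.optim.lr_scheduler import ReduceLROnPlateau

model = nn.Linear(1, 1)
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.1, patience=2)

# Made-up validation losses that stop improving after the first epoch.
val_losses = [1.0, 1.0, 1.0, 1.0]

lr_history = []
for val_loss in val_losses:
    # Placeholder for train(...): no gradients exist here, so this
    # call leaves the parameters unchanged.
    optimizer.step()
    # Pass the monitored value; the scheduler decides whether to decay.
    scheduler.step(val_loss)
    lr_history.append(optimizer.param_groups[0]["lr"])

print(lr_history)
```

With ``patience=2`` the rate stays at its initial value while the scheduler counts epochs without improvement, and it is multiplied by ``factor`` once that count exceeds the patience.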
|  |  | 
|  | .. warning:: | 
|  |     Prior to PyTorch 1.1.0, the learning rate scheduler was expected to be called before |
|  |     the optimizer's update; 1.1.0 changed this behavior in a backward-incompatible way. If you |
|  |     call the learning rate scheduler (``scheduler.step()``) before the optimizer's update |
|  |     (``optimizer.step()``), the first value of the learning rate schedule is skipped. |
|  |     If you are unable to reproduce results after upgrading to PyTorch 1.1.0, check |
|  |     whether you are calling ``scheduler.step()`` at the wrong time. |
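
To make the ordering concrete, the sketch below records the learning rate used in each epoch when ``optimizer.step()`` is called before ``scheduler.step()``. The model, the data, and the ``StepLR`` settings (``step_size=2``, ``gamma=0.1``) are assumptions picked so the schedule is easy to read.

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import StepLR

model = nn.Linear(1, 1)
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = StepLR(optimizer, step_size=2, gamma=0.1)

# Hypothetical fixed batch, only there to produce gradients.
x = torch.ones(4, 1)
y = torch.zeros(4, 1)

used_lrs = []
for epoch in range(6):
    # Learning rate that this epoch's update will use.
    used_lrs.append(optimizer.param_groups[0]["lr"])

    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()    # parameter update first
    scheduler.step()    # then advance the schedule

print(used_lrs)
```

The initial rate is used for the first two epochs and each later pair of epochs uses a rate ten times smaller, so no value of the schedule is skipped.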
|  |  | 
|  |  | 
|  | .. autoclass:: torch.optim.lr_scheduler.LambdaLR |
|  |     :members: |
|  | .. autoclass:: torch.optim.lr_scheduler.MultiplicativeLR |
|  |     :members: |
|  | .. autoclass:: torch.optim.lr_scheduler.StepLR |
|  |     :members: |
|  | .. autoclass:: torch.optim.lr_scheduler.MultiStepLR |
|  |     :members: |
|  | .. autoclass:: torch.optim.lr_scheduler.ExponentialLR |
|  |     :members: |
|  | .. autoclass:: torch.optim.lr_scheduler.CosineAnnealingLR |
|  |     :members: |
|  | .. autoclass:: torch.optim.lr_scheduler.ReduceLROnPlateau |
|  |     :members: |
|  | .. autoclass:: torch.optim.lr_scheduler.CyclicLR |
|  |     :members: |
|  | .. autoclass:: torch.optim.lr_scheduler.OneCycleLR |
|  |     :members: |
|  | .. autoclass:: torch.optim.lr_scheduler.CosineAnnealingWarmRestarts |
|  |     :members: |