| """ |
| :mod:`torch.optim` is a package for optimizing neural networks. |
| It provides a wide variety of optimization methods such as SGD, Adam etc. |
| |
| Currently, the following optimization methods are supported, typically with |
| options such as weight decay and other bells and whistles. |

- SGD
- AdaDelta
- Adagrad
- Adam
- AdaMax
- Averaged SGD
- RProp
- RMSProp


The usage of the package is as follows; a minimal sketch of the loop is shown
right after these steps.

1. Construct an optimizer.
2. Use ``optimizer.step(...)`` to optimize.

   - Call ``optimizer.zero_grad()`` to zero out the gradient buffers when
     appropriate.
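
As a minimal sketch of these two steps, the example below fits a single
parameter ``w`` to minimize ``(w - 3)^2``. It assumes the tensor API where a
tensor created with ``requires_grad=True`` can be passed to the optimizer
directly::

    import torch
    from torch import optim

    w = torch.zeros(1, requires_grad=True)    # the parameter to optimize
    optimizer = optim.SGD([w], lr=0.1)         # 1. construct an optimizer

    for _ in range(100):
        optimizer.zero_grad()                  # clear old gradients
        loss = ((w - 3.0) ** 2).sum()          # compute the objective
        loss.backward()                        # back-propagate
        optimizer.step()                       # 2. update the parameter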

Constructing the optimizer
--------------------------

One first constructs an ``Optimizer`` object by giving it a list of parameters
to optimize, as well as the optimizer options, such as the learning rate,
weight decay, etc.
| |
| Examples:: |
| |
| optimizer = optim.SGD(model.parameters(), lr = 0.01, momentum=0.9) |
| optimizer = optim.Adam([var1, var2], lr = 0.0001) |
| |
Per-parameter options
---------------------

In more advanced usage, one can specify per-layer options by passing each
parameter group along with its custom options.

**Any option that is not specified for a parameter group falls back to the
default value given to the optimizer.**

This is very useful when one wants to specify per-layer learning rates.

For example, this invocation::

    optim.SGD([
            {'params': model1.parameters()},
            {'params': model2.parameters(), 'lr': 1e-3}],
        lr=1e-2, momentum=0.9)

means that

* ``model1``'s parameters will use the default learning rate of ``1e-2`` and
  momentum of ``0.9``
* ``model2``'s parameters will use a learning rate of ``1e-3``, and the
  default momentum of ``0.9``
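
If you later need to adjust an option for every group, for example to decay
the learning rate during training, a small helper like the hypothetical one
below can be used. It assumes the optimizer exposes its groups as
``optimizer.param_groups``, a list of dicts holding each group's options::

    def decay_learning_rate(optimizer, factor=0.1):
        # Multiply every group's learning rate by ``factor``; groups that set
        # their own ``lr`` are scaled from their own value.
        for group in optimizer.param_groups:
            group['lr'] = group['lr'] * factor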

Then, you can use the optimizer by calling ``optimizer.zero_grad()`` and
``optimizer.step(...)``, as described in the next sections.

Taking an optimization step using ``step``
------------------------------------------

``optimizer.step()``
^^^^^^^^^^^^^^^^^^^^

This is the simplified form, supported by most optimizers; calling it performs
a single parameter update.

The function can be called after the gradients have been computed with
``backward()``.

Example 1 - training a neural network::

    net = MNISTNet()
    criterion = nn.NLLLoss()
    optimizer = optim.SGD(net.parameters(), lr=0.001)

    for data in data_batches:
        input, target = data
        optimizer.zero_grad()             # zero the gradient buffers
        output = net(input)
        loss = criterion(output, target)
        loss.backward()                   # compute gradients
        optimizer.step()                  # update the parameters

Besides this simple form, ``step`` can also take a closure, as described next.

``optimizer.step(closure)``
^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``step`` function can take a user-defined closure that re-evaluates the
loss (i.e. computes ``f(x)``) and returns it.

The closure should look somewhat like this::

    def closure():
        optimizer.zero_grad()   # clear the gradient buffers
        loss = f(x)             # re-evaluate the objective
        loss.backward()         # recompute the gradients
        return loss

Note that ``step`` calls the closure with no arguments, so any inputs it needs
(``x`` above) must be captured from the enclosing scope.

Example 2 - training a neural network with a closure::

    net = MNISTNet()
    criterion = nn.NLLLoss()
    optimizer = optim.SGD(net.parameters(), lr=0.001)

    for data in data_batches:
        input, target = data

        def closure():
            optimizer.zero_grad()
            output = net(input)
            loss = criterion(output, target)
            loss.backward()
            return loss

        optimizer.step(closure)

Note:
    **Why is this supported?**
    Some optimization algorithms, such as Conjugate Gradient and L-BFGS, need
    to re-evaluate the function multiple times per step. For such optimization
    methods, you have to pass in a closure that lets the optimizer recompute
    the loss.
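
For illustration only, here is a hedged sketch of a crude backtracking line
search that has to re-evaluate the loss several times per step, which is
exactly the situation the closure form of ``step`` is designed for. The helper
and its arguments are hypothetical (it is not an optimizer shipped in this
package), and it assumes scalar losses convertible with ``float()``::

    def naive_line_search_step(params, closure, lr=1.0, shrink=0.5, max_tries=5):
        loss = closure()                      # evaluate the loss, populate .grad
        grads = [p.grad.data.clone() for p in params]
        for _ in range(max_tries):
            for p, g in zip(params, grads):
                p.data -= lr * g              # tentative step along -grad
            new_loss = closure()              # re-evaluate at the new point
            if float(new_loss) < float(loss):
                return new_loss               # improvement: accept the step
            for p, g in zip(params, grads):
                p.data += lr * g              # no improvement: undo the step
            lr *= shrink                      # and retry with a smaller step
        return loss
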
| """ |
| |
from .adadelta import Adadelta
from .adagrad import Adagrad
from .adam import Adam
from .adamax import Adamax
from .asgd import ASGD
from .sgd import SGD
from .rprop import Rprop
from .rmsprop import RMSprop
from .optimizer import Optimizer

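# Delete the submodule names (bound as a side effect of the relative imports
# above) so that the package namespace only exposes the optimizer classes.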
del adadelta
del adagrad
del adam
del adamax
del asgd
del sgd
del rprop
del rmsprop
del optimizer