torch.nn

AvgPool2d

Applies a 2D average pooling over an input signal composed of several input planes.

The output value of the layer with input (b x C x H x W) and output (b x C x oH x oW)
can be precisely described as:
output[b_i][c_i][h_i][w_i] = (1 / K) * sum_{kh = 0 to kH-1} sum_{kw = 0 to kW-1} input[b_i][c_i][stride_h * h_i + kh][stride_w * w_i + kw]
where K = kH * kW is the number of elements in the pooling window.

# pool of square window of size=3, stride=2
m = nn.AvgPool2d(3, stride=2)
# pool of non-square window
m = nn.AvgPool2d((3, 2), stride=(2, 1))
input = autograd.Variable(torch.randn(20, 16, 50, 32))
output = m(input)

Constructor Arguments

Parameter | Default | Description
--------- | ------- | -----------
kernel_size | | the size of the window. Can be a single number k (for a square kernel of k x k) or a tuple (kh x kw)
stride | kernel_size | the stride of the window. Can be a single number s or a tuple (sh x sw).
padding | 0 | implicit padding to be added. Can be a single number or a tuple.
ceil_mode | | when True, will use “ceil” instead of “floor” to compute the output shape

Expected Shape

       | Shape | Description
------ | ----- | ------------
input | [ * , * , * , * ] | Input is minibatch x channels x iH x iW
output | [ * , * , * , * ] | Output shape = minibatch x channels x floor((iH + 2*padH - kH) / sH + 1) x floor((iW + 2*padW - kW) / sW + 1)
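
The output shape formula can be checked by hand with a few lines of plain Python. This is a minimal sketch, not the library's implementation: `pool_output_size` is a hypothetical helper, and its `ceil_mode` branch only swaps floor for ceil, which is an assumption about how the option behaves.

```python
import math

def pool_output_size(in_size, kernel, stride=None, padding=0, ceil_mode=False):
    """Output length along one spatial dimension, following the shape formula."""
    if stride is None:  # the stride defaults to the kernel size
        stride = kernel
    rounding = math.ceil if ceil_mode else math.floor
    return int(rounding((in_size + 2 * padding - kernel) / stride + 1))

# Input planes of 50 x 32 with window (3, 2) and stride (2, 1),
# matching the non-square example for this layer.
out_h = pool_output_size(50, 3, stride=2)
out_w = pool_output_size(32, 2, stride=1)
print(out_h, out_w)
```

With these numbers the height is floor(24.5) and the width is floor(31), so the last input row does not contribute to any window unless the “ceil” rounding is used.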

AvgPool3d

Applies a 3D average pooling over an input signal composed of several input planes.

# pool of cubic window of size=3, stride=2
m = nn.AvgPool3d(3, stride=2)
# pool of non-cubic window
m = nn.AvgPool3d((3, 2, 2), stride=(2, 1, 2))
input = autograd.Variable(torch.randn(20, 16, 50, 44, 31))
output = m(input)

Constructor Arguments

Parameter | Default | Description
--------- | ------- | -----------
kernel_size | | the size of the window to take an average over. Can be a single number k (for a cubic kernel of k x k x k) or a tuple (kt x kh x kw)
stride | kernel_size | the stride of the window. Can be a single number s or a tuple (st x sh x sw).

Expected Shape

       | Shape | Description
------ | ----- | ------------
input | [ * , * , * , * , * ] | Input is minibatch x channels x iT x iH x iW
output | [ * , * , * , * , * ] | Output shape = minibatch x channels x floor((iT + 2*padT - kT) / sT + 1) x floor((iH + 2*padH - kH) / sH + 1) x floor((iW + 2*padW - kW) / sW + 1)

BatchNorm1d

Applies Batch Normalization over a 2d input that is seen as a mini-batch of 1d inputs

              x - mean(x)
y =  ----------------------------- * gamma + beta
      standard_deviation(x) + eps
# With Learnable Parameters
m = nn.BatchNorm1d(100)
# Without Learnable Parameters
m = nn.BatchNorm1d(100, affine=False)
input = autograd.Variable(torch.randn(20, 100))
output = m(input)

The mean and standard deviation are calculated per dimension over the mini-batches, and gamma and beta are learnable parameter vectors of size N (where N is the input size).

During training, this layer keeps a running estimate of its computed mean and variance. The running estimate is kept with a default momentum of 0.1. During evaluation, this running mean/variance is used for normalization.
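
To make the normalization concrete, here is a minimal plain-Python sketch for a single feature column. It follows the formula as written above (dividing by the standard deviation plus eps). Two parts are assumptions rather than something this page states: the variance is the biased (divide by n) estimate, and the running statistic is updated as an exponential moving average weighted by the momentum. The helper names are hypothetical.

```python
import math

def batch_norm_column(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across the mini-batch:
    y = (x - mean) / (std + eps) * gamma + beta."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # biased variance (assumption)
    std = math.sqrt(var)
    normalized = [(v - mean) / (std + eps) * gamma + beta for v in values]
    return normalized, mean, var

def update_running(running, batch_stat, momentum=0.1):
    """Assumed convention: exponential moving average weighted by momentum."""
    return (1 - momentum) * running + momentum * batch_stat

column = [1.0, 2.0, 3.0, 4.0]          # one feature over a mini-batch of 4
normalized, mean, var = batch_norm_column(column)
running_mean = update_running(0.0, mean)
print(normalized, running_mean)
```

At evaluation time the same formula would be applied with the running mean and variance in place of the per-batch statistics.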

Constructor Arguments

Parameter | Default | Description
--------- | ------- | -----------
num_features | | the size of each 1D input in the mini-batch
eps | 1e-5 | a value added to the denominator for numerical stability.
momentum | 0.1 | the value used for the running_mean and running_var computation.
affine | | a boolean value that, when set to true, gives the layer learnable affine parameters.

Expected Shape

       | Shape | Description
------ | ----- | ------------
input | [ * , num_features ] | 2D Tensor of nBatches x num_features
output | Same | Output has the same shape as input

Returns

a normalized tensor in the batch dimension

BatchNorm2d

Applies Batch Normalization over a 4d input that is seen as a mini-batch of 3d inputs

              x - mean(x)
y =  ----------------------------- * gamma + beta
      standard_deviation(x) + eps
# With Learnable Parameters
m = nn.BatchNorm2d(100)
# Without Learnable Parameters
m = nn.BatchNorm2d(100, affine=False)
input = autograd.Variable(torch.randn(20, 100, 35, 45))
output = m(input)

The mean and standard deviation are calculated per dimension over the mini-batches, and gamma and beta are learnable parameter vectors of size N (where N is the input size).

During training, this layer keeps a running estimate of its computed mean and variance. The running estimate is kept with a default momentum of 0.1. During evaluation, this running mean/variance is used for normalization.

Constructor Arguments

Parameter | Default | Description
--------- | ------- | -----------
num_features | | num_features from an expected input of size batch_size x num_features x height x width
eps | 1e-5 | a value added to the denominator for numerical stability.
momentum | 0.1 | the value used for the running_mean and running_var computation.
affine | | a boolean value that, when set to true, gives the layer learnable affine parameters.

Expected Shape

       | Shape | Description
------ | ----- | ------------
input | [ * , num_features , * , * ] | 4D Tensor of batch_size x num_features x height x width
output | Same | Output has the same shape as input

Returns

a normalized tensor in the batch dimension

BatchNorm3d

Applies Batch Normalization over a 5d input that is seen as a mini-batch of 4d inputs

              x - mean(x)
y =  ----------------------------- * gamma + beta
      standard_deviation(x) + eps
# With Learnable Parameters
m = nn.BatchNorm3d(100)
# Without Learnable Parameters
m = nn.BatchNorm3d(100, affine=False)
input = autograd.Variable(torch.randn(20, 100, 35, 45, 10))
output = m(input)

The mean and standard deviation are calculated per dimension over the mini-batches, and gamma and beta are learnable parameter vectors of size N (where N is the input size).

During training, this layer keeps a running estimate of its computed mean and variance. The running estimate is kept with a default momentum of 0.1. During evaluation, this running mean/variance is used for normalization.

Constructor Arguments

Parameter | Default | Description
--------- | ------- | -----------
num_features | | num_features from an expected input of size batch_size x num_features x depth x height x width
eps | 1e-5 | a value added to the denominator for numerical stability.
momentum | 0.1 | the value used for the running_mean and running_var computation.
affine | | a boolean value that, when set to true, gives the layer learnable affine parameters.

Expected Shape

       | Shape | Description
------ | ----- | ------------
input | [ * , num_features , * , * , * ] | 5D Tensor of batch_size x num_features x depth x height x width
output | Same | Output has the same shape as input

Returns

a normalized tensor in the batch dimension

Container

This is the base container class for all neural networks you would define.

You will subclass your container from this class. In the constructor you define the modules that you want to use, and in the “forward” function you use the constructed modules in your operations.

To make this easier to understand, here is a small example.

# Example of using Container
class Net(nn.Container):
    def __init__(self):
        super(Net, self).__init__(
            conv1 = nn.Conv2d(1, 20, 5),
            relu  = nn.ReLU()
        )
    def forward(self, input):
        # apply the convolution, then the non-linearity
        output = self.relu(self.conv1(input))
        return output
model = Net()
# one can add modules to the container after construction
model.add_module('pool1', nn.MaxPool2d(2, 2))

As the last line of the example shows, new modules can also be added to a container after construction with the add_module function.

The container has one additional method, parameters(), which returns the list of learnable parameters in the container instance.

Conv1d

Applies a 1D convolution over an input signal composed of several input planes.

The output value of the layer with input (b x iC x W) and output (b x oC x oW)
can be precisely described as:
output[b_i][oc_i][w_i] = bias[oc_i]
            + sum_{ic_i = 0 to iC-1} sum_{kw = 0 to kW-1}
                weight[oc_i][ic_i][kw] * input[b_i][ic_i][stride_w * w_i + kw]

m = nn.Conv1d(16, 33, 3, stride=2)
input = autograd.Variable(torch.randn(20, 16, 50))
output = m(input)

Note that depending on the size of your kernel, several (of the last) columns of the input might be lost. It is up to the user to add proper padding.
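
The formula can be spelled out as nested loops over nested lists. This is a minimal, unoptimized sketch for a single sample without padding; `conv1d` here is a hypothetical helper written for illustration, not the module's implementation.

```python
def conv1d(signal, weight, bias, stride=1):
    """signal: [iC][iW], weight: [oC][iC][kW], bias: [oC]; returns [oC][oW]."""
    in_channels, in_width = len(signal), len(signal[0])
    kernel_width = len(weight[0][0])
    out_width = (in_width - kernel_width) // stride + 1
    output = []
    for oc, kernels in enumerate(weight):
        row = []
        for w in range(out_width):
            total = bias[oc]
            # sum over input planes and kernel positions
            for ic in range(in_channels):
                for kw in range(kernel_width):
                    total += kernels[ic][kw] * signal[ic][stride * w + kw]
            row.append(total)
        output.append(row)
    return output

signal = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]]   # one input plane of width 6
weight = [[[1.0, 0.0, -1.0]]]               # one output plane, kernel width 3
result = conv1d(signal, weight, bias=[0.5], stride=2)
print(result)
```

With a kernel of width 3 and a stride of 2, only two windows fit in a width of 6, so the last input column is never read. That is the loss of trailing columns mentioned in the note.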

Constructor Arguments

Parameter | Default | Description
--------- | ------- | -----------
in_channels | | The number of expected input channels in the image given as input
out_channels | | The number of output channels the convolution layer will produce
kernel_size | | the size of the convolving kernel.
stride | | the stride of the convolving kernel.

Expected Shape

       | Shape | Description
------ | ----- | ------------
input | [ * , in_channels , * ] | Input is minibatch x in_channels x iW
output | [ * , out_channels , * ] | Output shape is precisely minibatch x out_channels x floor((iW + 2*padW - kW) / sW + 1), where sW is the stride

Members

Parameter | Description
--------- | -----------
weight | the learnable weights of the module of shape (out_channels x in_channels x kW)
bias | the learnable bias of the module of shape (out_channels)

Conv2d

Applies a 2D convolution over an input image composed of several input planes.

The output value of the layer with input (b x iC x H x W) and output (b x oC x oH x oW)
can be precisely described as:
output[b_i][oc_i][h_i][w_i] = bias[oc_i]
            + sum_{ic_i = 0 to iC-1} sum_{kh = 0 to kH-1} sum_{kw = 0 to kW-1}
                weight[oc_i][ic_i][kh][kw] * input[b_i][ic_i][stride_h * h_i + kh][stride_w * w_i + kw]

# With square kernels and equal stride
m = nn.Conv2d(16, 33, 3, stride=2)
# non-square kernels and unequal stride and with padding
m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2))
# non-square kernels and unequal stride and with padding and dilation
m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1))
input = autograd.Variable(torch.randn(20, 16, 50, 100))
output = m(input)

Note that depending on the size of your kernel, several (of the last) columns or rows of the input image might be lost. It is up to the user to add proper padding to images.

Constructor Arguments

Parameter | Default | Description
--------- | ------- | -----------
in_channels | | The number of expected input channels in the image given as input
out_channels | | The number of output channels the convolution layer will produce
kernel_size | | the size of the convolving kernel. Can be a single number k (for a square kernel of k x k) or a tuple (kh x kw)
stride | 1 | the stride of the convolving kernel. Can be a single number s or a tuple (sh x sw).
padding | 0 | implicit zero padding on the input. Can be a single number s or a tuple.
dilation | None | If given, will do dilated (or atrous) convolutions. Can be a single number s or a tuple.
bias | True | If set to False, the layer will not learn an additive bias.

Expected Shape

       | Shape | Description
------ | ----- | ------------
input | [ * , in_channels , * , * ] | Input is minibatch x in_channels x iH x iW
output | [ * , out_channels , * , * ] | Output shape is precisely minibatch x out_channels x floor((iH + 2*padH - kH) / sH + 1) x floor((iW + 2*padW - kW) / sW + 1), where sH and sW are the strides

Members

Parameter | Description
--------- | -----------
weight | the learnable weights of the module of shape (out_channels x in_channels x kH x kW)
bias | the learnable bias of the module of shape (out_channels)

Conv3d

Applies a 3D convolution over an input image composed of several input planes.

# With cubic kernels and equal stride
m = nn.Conv3d(16, 33, 3, stride=2)
# non-cubic kernels and unequal stride and with padding
m = nn.Conv3d(16, 33, (3, 5, 2), stride=(2, 1, 1), padding=(4, 2, 0))
input = autograd.Variable(torch.randn(20, 16, 10, 50, 100))
output = m(input)

Note that depending on the size of your kernel, several (of the last) columns or rows of the input image might be lost. It is up to the user to add proper padding to images.

Constructor Arguments

Parameter | Default | Description
--------- | ------- | -----------
in_channels | | The number of expected input channels in the image given as input
out_channels | | The number of output channels the convolution layer will produce
kernel_size | | the size of the convolving kernel. Can be a single number k (for a cubic kernel of k x k x k) or a tuple (kt x kh x kw)
stride | 1 | the stride of the convolving kernel. Can be a single number s or a tuple (st x sh x sw).
padding | 0 | implicit zero padding on the input. Can be a single number s or a tuple.

Expected Shape

       | Shape | Description
------ | ----- | ------------
input | [ * , in_channels , * , * , * ] | Input is minibatch x in_channels x iT x iH x iW
output | [ * , out_channels , * , * , * ] | Output shape is precisely minibatch x out_channels x floor((iT + 2*padT - kT) / sT + 1) x floor((iH + 2*padH - kH) / sH + 1) x floor((iW + 2*padW - kW) / sW + 1), where sT, sH and sW are the strides

Members

Parameter | Description
--------- | -----------
weight | the learnable weights of the module of shape (out_channels x in_channels x kT x kH x kW)
bias | the learnable bias of the module of shape (out_channels)

ConvTranspose2d

Applies a 2D deconvolution operator over an input image composed of several input planes. The deconvolution operator multiplies each input value element-wise by a learnable kernel, and sums over the outputs from all input feature planes. This module can be seen as the exact reverse of the Conv2d module.

# With square kernels and equal stride
m = nn.ConvTranspose2d(16, 33, 3, stride=2)
# non-square kernels and unequal stride and with padding
m = nn.ConvTranspose2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2))
input = autograd.Variable(torch.randn(20, 16, 50, 100))
output = m(input)
# exact output size can be also specified as an argument
input = autograd.Variable(torch.randn(1, 16, 12, 12))
downsample = nn.Conv2d(16, 16, 3, stride=2, padding=1)
upsample = nn.ConvTranspose2d(16, 16, 3, stride=2, padding=1)
h = downsample(input)
output = upsample(h, output_size=input.size())

Constructor Arguments

Parameter | Default | Description
--------- | ------- | -----------
in_channels | | The number of expected input channels in the image given as input
out_channels | | The number of output channels the convolution layer will produce
kernel_size | | the size of the convolving kernel. Can be a single number k (for a square kernel of k x k) or a tuple (kh x kw)
stride | 1 | the stride of the convolving kernel. Can be a single number or a tuple (sh x sw).
padding | 0 | implicit zero padding on the input. Can be a single number or a tuple.
output_padding | 0 | A padding of 0 or 1 pixels that should be added to the output. Can be a single number or a tuple.
bias | True | If set to False, the layer will not learn an additive bias.

Expected Shape

       | Shape | Description
------ | ----- | ------------
input | [ * , in_channels , * , * ] | Input is minibatch x in_channels x iH x iW
output | [ * , out_channels , * , * ] | Output shape is minibatch x out_channels x ((iH - 1) * sH - 2*padH + kH + output_paddingH) x ((iW - 1) * sW - 2*padW + kW + output_paddingW), or as specified in a second argument to the call.

Members

Parameter | Description
--------- | -----------
weight | the learnable weights of the module of shape (in_channels x out_channels x kH x kW)
bias | the learnable bias of the module of shape (out_channels)
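
The shape formulas explain why an explicit output size can be useful. A strided convolution maps several input sizes to the same output size, so the transposed convolution cannot know which one to restore. The sketch below works this through in plain Python for one spatial dimension; both helper functions are hypothetical and only restate the shape formulas from this page.

```python
def conv_output_size(in_size, kernel, stride=1, padding=0):
    """Conv2d shape formula for one dimension."""
    return (in_size + 2 * padding - kernel) // stride + 1

def conv_transpose_output_size(in_size, kernel, stride=1, padding=0, output_padding=0):
    """ConvTranspose2d shape formula for one dimension."""
    return (in_size - 1) * stride - 2 * padding + kernel + output_padding

# Kernel 3, stride 2, padding 1, as in the downsample/upsample example
down_12 = conv_output_size(12, 3, stride=2, padding=1)
down_11 = conv_output_size(11, 3, stride=2, padding=1)
up_default = conv_transpose_output_size(down_12, 3, stride=2, padding=1)
up_padded = conv_transpose_output_size(down_12, 3, stride=2, padding=1, output_padding=1)
print(down_12, down_11, up_default, up_padded)
```

Inputs of size 12 and 11 both shrink to the same size, and the default transposed convolution restores only the smaller of the two. An output padding of 1 recovers the size of 12.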

ConvTranspose3d

Applies a 3D deconvolution operator over an input image composed of several input planes. The deconvolution operator multiplies each input value element-wise by a learnable kernel, and sums over the outputs from all input feature planes. This module can be seen as the exact reverse of the Conv3d module.

# With cubic kernels and equal stride
m = nn.ConvTranspose3d(16, 33, 3, stride=2)
# non-cubic kernels and unequal stride and with padding
m = nn.ConvTranspose3d(16, 33, (3, 5, 2), stride=(2, 1, 1), padding=(0, 4, 2))
input = autograd.Variable(torch.randn(20, 16, 10, 50, 100))
output = m(input)

Constructor Arguments

Parameter | Default | Description
--------- | ------- | -----------
in_channels | | The number of expected input channels in the image given as input
out_channels | | The number of output channels the convolution layer will produce
kernel_size | | the size of the convolving kernel. Can be a single number k (for a cubic kernel of k x k x k) or a tuple (kt x kh x kw)
stride | 1 | the stride of the convolving kernel. Can be a single number or a tuple (st x sh x sw).
padding | 0 | implicit zero padding on the input. Can be a single number or a tuple.
output_padding | 0 | A padding of 0 or 1 pixels that should be added to the output. Can be a single number or a tuple.

Expected Shape

       | Shape | Description
------ | ----- | ------------
input | [ * , in_channels , * , * , * ] | Input is minibatch x in_channels x iT x iH x iW
output | [ * , out_channels , * , * , * ] | Output shape is precisely minibatch x out_channels x ((iT - 1) * sT - 2*padT + kT + output_paddingT) x ((iH - 1) * sH - 2*padH + kH + output_paddingH) x ((iW - 1) * sW - 2*padW + kW + output_paddingW)

Members

Parameter | Description
--------- | -----------
weight | the learnable weights of the module of shape (in_channels x out_channels x kT x kH x kW)
bias | the learnable bias of the module of shape (out_channels)

Dropout

Randomly zeroes some of the elements of the input tensor.

m = nn.Dropout(p=0.2)
input = autograd.Variable(torch.randn(20, 16))
output = m(input)

The elements to zero are randomized on every forward call.
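
As an illustration of the per-call randomness, here is a minimal plain-Python sketch of element-wise dropout. The `dropout` helper is hypothetical. It also rescales the surviving elements by 1 / (1 - p), which is the common "inverted dropout" convention; the text above does not state this, so treat the rescaling as an assumption.

```python
import random

def dropout(values, p=0.5, rng=random):
    """Zero each element with probability p; survivors are scaled by 1 / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("this sketch expects 0 <= p < 1")
    scale = 1.0 / (1.0 - p)  # rescaling is an assumption, see the note above
    return [0.0 if rng.random() < p else v * scale for v in values]

rng = random.Random(0)           # seeded generator, for repeatable runs
values = [1.0] * 10
first = dropout(values, p=0.2, rng=rng)
second = dropout(values, p=0.2, rng=rng)   # a fresh mask is drawn on every call
print(first)
print(second)
```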

Constructor Arguments

Parameter | Default | Description
--------- | ------- | -----------
p | 0.5 | probability of an element to be zeroed.
inplace | False | If set to True, will do this operation in-place.

Expected Shape

       | Shape | Description
------ | ----- | ------------
input | Any | Input can be of any shape
output | Same | Output is of the same shape as input

Dropout2d

Randomly zeroes whole channels of the input tensor.

m = nn.Dropout2d(p=0.2)
input = autograd.Variable(torch.randn(20, 16, 32, 32))
output = m(input)

The input is 4D (batch x channels x height x width) and each channel is of size (1 x height x width). The channels to zero are randomized on every forward call. Usually the input comes from Conv2d modules.

As described in the paper "Efficient Object Localization Using Convolutional Networks" (http://arxiv.org/abs/1411.4280), if adjacent pixels within feature maps are strongly correlated (as is normally the case in early convolution layers), then iid dropout will not regularize the activations and will otherwise just result in an effective learning rate decrease. In this case, nn.Dropout2d will help promote independence between feature maps and should be used instead.

Constructor Arguments

Parameter | Default | Description
--------- | ------- | -----------
p | 0.5 | probability of a channel to be zeroed.
inplace | False | If set to True, will do this operation in-place.

Expected Shape

       | Shape | Description
------ | ----- | ------------
input | [ * , * , * , * ] | Input can be of any size, as long as it is 4D
output | Same | Output is of the same shape as input

Dropout3d

Randomly zeroes whole channels of the input tensor.

m = nn.Dropout3d(p=0.2)
input = autograd.Variable(torch.randn(20, 16, 4, 32, 32))
output = m(input)

The input is 5D (batch x channels x depth x height x width) and each channel is of size (1 x depth x height x width). The channels to zero are randomized on every forward call. Usually the input comes from Conv3d modules.

Constructor Arguments

Parameter | Default | Description
--------- | ------- | -----------
p | 0.5 | probability of a channel to be zeroed.
inplace | False | If set to True, will do this operation in-place.

Expected Shape

       | Shape | Description
------ | ----- | ------------
input | [ * , * , * , * , * ] | Input can be of any size, as long as it is 5D
output | Same | Output is of the same shape as input

ELU

Applies element-wise, ELU(x) = max(0,x) + min(0, alpha * (exp(x) - 1))

m = nn.ELU()
input = autograd.Variable(torch.randn(2))
print(input)
print(m(input))

Constructor Arguments

Parameter | Default | Description
--------- | ------- | -----------
alpha | 1.0 | the alpha value for the ELU formulation.
inplace | | can optionally do the operation in-place

Expected Shape

       | Shape | Description
------ | ----- | ------------
input | Any | Tensor of any size and dimension
output | Same | Output has the same shape as input

Returns

a Tensor of the same dimension and shape as the input

Embedding

A simple lookup table that stores embeddings of a fixed dictionary and size.

# an Embedding module containing 10 tensors of size 3
embedding = nn.Embedding(10, 3)
# a batch of 2 samples of 4 indices each
input = torch.LongTensor([[1,2,4,5],[4,3,2,9]])
print(embedding(input))
# example with padding_idx
embedding = nn.Embedding(10, 3, padding_idx=0)
input = torch.LongTensor([[0,2,0,5]])
print(embedding(input))

This module is often used to store word embeddings and retrieve them using indices. The input to the module is a list of indices, and the output is the corresponding word embeddings.
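
The lookup itself is just indexing into a table of vectors. The sketch below shows the idea in plain Python with a hypothetical `EmbeddingTable` class. The random initialization and the way the padding row is held at zero are assumptions made for illustration, not a description of the module's internals.

```python
import random

class EmbeddingTable:
    """Hypothetical stand-in for a lookup table of num_embeddings vectors."""

    def __init__(self, num_embeddings, embedding_dim, padding_idx=None, seed=0):
        rng = random.Random(seed)
        self.weight = [[rng.gauss(0.0, 1.0) for _ in range(embedding_dim)]
                       for _ in range(num_embeddings)]
        if padding_idx is not None:
            # the padding index always maps to an all-zero vector
            self.weight[padding_idx] = [0.0] * embedding_dim

    def __call__(self, indices):
        # indices: a mini-batch (list of lists) of integer indices, shape m x n
        return [[list(self.weight[i]) for i in row] for row in indices]

table = EmbeddingTable(10, 3, padding_idx=0)
batch = table([[0, 2, 0, 5]])     # output shape: 1 x 4 x 3
print(batch)
```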

Constructor Arguments

Parameter | Default | Description
--------- | ------- | -----------
num_embeddings | | size of the dictionary of embeddings
embedding_dim | | the size of each embedding vector
padding_idx | None | If given, pads the output with zeros whenever it encounters the index.
max_norm | None | If given, will renormalize the embeddings to always have a norm less than this value
norm_type | | The p of the p-norm to compute for the max_norm option
scale_grad_by_freq | | if given, this will scale gradients by the frequency of the words in the dictionary.

Expected Shape

       | Shape | Description
------ | ----- | ------------
input | [ * , * ] | Input is a 2D mini_batch LongTensor of m x n indices to extract from the Embedding dictionary
output | [ * , * , * ] | Output shape = m x n x embedding_dim

FractionalMaxPool2d

Applies a 2D fractional max pooling over an input signal composed of several input planes.

# pool of square window of size=3, and target output size 13x12
m = nn.FractionalMaxPool2d(3, output_size=(13, 12))
# pool of square window and target output size being half of input image size
m = nn.FractionalMaxPool2d(3, output_ratio=(0.5, 0.5))
input = autograd.Variable(torch.randn(20, 16, 50, 32))
output = m(input)

Fractional MaxPooling is described in detail in the paper “Fractional Max-Pooling” by Ben Graham. The max-pooling operation is applied in kH x kW regions by a stochastic step size determined by the target output size. The number of output features is equal to the number of input planes.

Constructor Arguments

Parameter | Default | Description
--------- | ------- | -----------
kernel_size | | the size of the window to take a max over. Can be a single number k (for a square kernel of k x k) or a tuple (kh x kw)
output_size | | the target output size of the image of the form oH x oW. Can be a tuple (oH, oW) or a single number oH for a square image oH x oH
output_ratio | | If one wants to have an output size as a ratio of the input size, this option can be given. This has to be a number or tuple in the range (0, 1)
return_indices | False | if True, will return the indices along with the outputs. Useful to pass to nn.MaxUnpool2d.

Expected Shape

       | Shape | Description
------ | ----- | ------------
input | [ * , * , * , * ] | Input is minibatch x channels x iH x iW
output | [ * , * , * , * ] | Output shape = minibatch x channels x oH x oW, where oH x oW is given by output_size or derived from output_ratio

GRU

Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence.

For each element in the input sequence, each layer computes the following function:

r_t = sigmoid(W_ir x_t + b_ir + W_hr h_(t-1) + b_hr)
i_t = sigmoid(W_ii x_t + b_ii + W_hi h_(t-1) + b_hi)
n_t = tanh(W_in x_t + r_t * W_hn h_(t-1))
h_t = (1 - i_t) * n_t + i_t * h_(t-1)

where h_t is the hidden state at time t, x_t is the hidden state of the previous layer at time t or input_t for the first layer, and r_t, i_t, n_t are the reset, input, and new gates, respectively.

rnn = nn.GRU(10, 20, 2)
input = Variable(torch.randn(5, 3, 10))
h0 = Variable(torch.randn(2, 3, 20))
output, hn = rnn(input, h0)
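
The gate equations for one time step can be traced in plain Python. To keep the sketch short it uses a scalar input and a scalar hidden state, so every weight is a single number. The weight values and the dictionary layout are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gru_step(x, h_prev, w, b):
    """One GRU time step for a scalar input and a scalar hidden state."""
    r = sigmoid(w["ir"] * x + b["ir"] + w["hr"] * h_prev + b["hr"])  # reset gate
    i = sigmoid(w["ii"] * x + b["ii"] + w["hi"] * h_prev + b["hi"])  # input gate
    n = math.tanh(w["in"] * x + r * (w["hn"] * h_prev))              # new gate
    return (1.0 - i) * n + i * h_prev

# Hypothetical weights and biases
w = {"ir": 0.5, "hr": -0.3, "ii": 0.8, "hi": 0.1, "in": 1.0, "hn": 0.7}
b = {"ir": 0.0, "hr": 0.0, "ii": 0.0, "hi": 0.0}

h = 0.0                          # initial hidden state
for x in [1.0, -0.5, 0.25]:      # a sequence of three scalar inputs
    h = gru_step(x, h, w, b)
print(h)
```

Because the new hidden state is a blend of the tanh-bounded new gate and the previous state, it stays inside (-1, 1) when it starts there.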

Constructor Arguments

Parameter | Default | Description
--------- | ------- | -----------
input_size | | The number of expected features in the input x
hidden_size | | The number of features in the hidden state h
num_layers | | The number of recurrent layers.
bias | True | If False, then the layer does not use bias weights b_ih and b_hh.
batch_first | | If True, then the input tensor is provided as (batch, seq, feature)
dropout | | If non-zero, introduces a dropout layer on the outputs of each RNN layer

Inputs

Parameter | Default | Description
--------- | ------- | -----------
input | | A (seq_len x batch x input_size) tensor containing the features of the input sequence.
h_0 | | A (num_layers x batch x hidden_size) tensor containing the initial hidden state for each element in the batch.

Outputs

Parameter | Description
--------- | -----------
output | A (seq_len x batch x hidden_size) tensor containing the output features (h_t) from the last layer of the RNN, for each t
h_n | A (num_layers x batch x hidden_size) tensor containing the hidden state for t=seq_len

Members

Parameter | Description
--------- | -----------
weight_ih_l[k] | the learnable input-hidden weights of the k-th layer (W_ir
weight_hh_l[k] | the learnable hidden-hidden weights of the k-th layer (W_hr
bias_ih_l[k] | the learnable input-hidden bias of the k-th layer (b_ir
bias_hh_l[k] | the learnable hidden-hidden bias of the k-th layer (b_hr

GRUCell

A gated recurrent unit (GRU) cell

r = sigmoid(W_ir x + b_ir + W_hr h + b_hr)
i = sigmoid(W_ii x + b_ii + W_hi h + b_hi)
n = tanh(W_in x + r * W_hn h)
h' = (1 - i) * n + i * h

rnn = nn.GRUCell(10, 20)
input = Variable(torch.randn(6, 3, 10))
hx = Variable(torch.randn(3, 20))
output = []
for i in range(6):
    # feed one time step at a time and collect the hidden states
    hx = rnn(input[i], hx)
    output.append(hx)

Constructor Arguments

Parameter | Default | Description
--------- | ------- | -----------
input_size | | The number of expected features in the input x
hidden_size | | The number of features in the hidden state h
bias | True | If False, then the layer does not use bias weights b_ih and b_hh.

Inputs

Parameter | Default | Description
--------- | ------- | -----------
input | | A (batch x input_size) tensor containing input features
hidden | | A (batch x hidden_size) tensor containing the initial hidden state for each element in the batch.

Outputs

Parameter | Description
--------- | -----------
h' | A (batch x hidden_size) tensor containing the next hidden state for each element in the batch

Members

Parameter | Description
--------- | -----------
weight_ih | the learnable input-hidden weights, of shape (input_size x hidden_size)
weight_hh | the learnable hidden-hidden weights, of shape (hidden_size x hidden_size)
bias_ih | the learnable input-hidden bias, of shape (hidden_size)
bias_hh | the learnable hidden-hidden bias, of shape (hidden_size)

Hardshrink

Applies the hard shrinkage function element-wise

m = nn.Hardshrink()
input = autograd.Variable(torch.randn(2))
print(input)
print(m(input))

Hardshrink is defined as:
f(x) = x, if x > lambda
f(x) = x, if x < -lambda
f(x) = 0, otherwise
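
The piecewise definition translates directly into a one-line function. This is a plain-Python sketch for a single value, with `hardshrink` as a hypothetical helper name and the default of 0.5 taken from the lambd argument of the module.

```python
def hardshrink(x, lambd=0.5):
    """Keep x when it lies outside [-lambd, lambd], otherwise return 0."""
    return x if abs(x) > lambd else 0.0

samples = [-1.0, -0.5, 0.2, 0.5, 0.7]
shrunk = [hardshrink(v) for v in samples]
print(shrunk)
```

Values exactly at plus or minus lambda fall into the "otherwise" branch and are mapped to zero.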

Constructor Arguments

Parameter | Default | Description
--------- | ------- | -----------
lambd | 0.5 | the lambda value for the Hardshrink formulation.

Expected Shape

       | Shape | Description
------ | ----- | ------------
input | Any | Tensor of any size and dimension
output | Same | Output has the same shape as input

Returns

a Tensor of the same dimension and shape as the input

HardTanh

Applies the HardTanh function element-wise

m = nn.HardTanh(-2, 2)
input = autograd.Variable(torch.randn(2))
print(input)
print(m(input))

HardTanh is defined as:
f(x) = +1, if x > 1
f(x) = -1, if x < -1
f(x) = x, otherwise

The range of the linear region [-1, 1] can be adjusted.

Constructor Arguments

Parameter | Default | Description
--------- | ------- | -----------
min_value | | minimum value of the linear region range
max_value | | maximum value of the linear region range
inplace | | can optionally do the operation in-place

Expected Shape

       | Shape | Description
------ | ----- | ------------
input | Any | Tensor of any size and dimension
output | Same | Output has the same shape as input

Returns

a Tensor of the same dimension and shape as the input

LPPool2d

Applies a 2D power-average pooling over an input signal composed of several input planes.

On each window, the function computed is: f(X) = pow(sum(pow(X, p)), 1/p)
At p = infinity, one gets Max Pooling.
At p = 1, one gets Average Pooling.

# power-2 pool of square window of size=3, stride=2
m = nn.LPPool2d(2, 3, stride=2)
# pool of non-square window of power 1.2
m = nn.LPPool2d(1.2, (3, 2), stride=(2, 1))
input = autograd.Variable(torch.randn(20, 16, 50, 32))
output = m(input)
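
The behaviour at the two extremes of p can be seen on a single window with a few lines of plain Python. This sketch assumes non-negative inputs, and `lp_pool` is a hypothetical helper that evaluates the formula exactly as written.

```python
def lp_pool(window, p):
    """f(X) = pow(sum(pow(X, p)), 1/p) over one pooling window."""
    return sum(x ** p for x in window) ** (1.0 / p)

window = [1.0, 2.0, 3.0, 4.0]
sum_like = lp_pool(window, 1)     # p = 1
max_like = lp_pool(window, 64)    # large p, standing in for p = infinity
print(sum_like, max_like)
```

With p = 1 the formula as written returns the window sum, which is the average multiplied by the number of elements in the window. With a large p the result approaches the largest element.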

Constructor Arguments

Parameter | Default | Description
--------- | ------- | -----------
kernel_size | | the size of the window. Can be a single number k (for a square kernel of k x k) or a tuple (kh x kw)
stride | kernel_size | the stride of the window. Can be a single number s or a tuple (sh x sw).
ceil_mode | | when True, will use “ceil” instead of “floor” to compute the output shape

Expected Shape

       | Shape | Description
------ | ----- | ------------
input | [ * , * , * , * ] | Input is minibatch x channels x iH x iW
output | [ * , * , * , * ] | Output shape = minibatch x channels x floor((iH + 2*padH - kH) / sH + 1) x floor((iW + 2*padW - kW) / sW + 1)

LSTM

Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence.

For each element in the input sequence, each layer computes the following function:

i_t = sigmoid(W_ii x_t + b_ii + W_hi h_(t-1) + b_hi)
f_t = sigmoid(W_if x_t + b_if + W_hf h_(t-1) + b_hf)
g_t = tanh(W_ig x_t + b_ig + W_hg h_(t-1) + b_hg)
o_t = sigmoid(W_io x_t + b_io + W_ho h_(t-1) + b_ho)
c_t = f_t * c_(t-1) + i_t * g_t
h_t = o_t * tanh(c_t)

where h_t is the hidden state at time t, c_t is the cell state at time t, x_t is the hidden state of the previous layer at time t or input_t for the first layer, and i_t, f_t, g_t, o_t are the input, forget, cell, and out gates, respectively.

rnn = nn.LSTM(10, 20, 2)
input = Variable(torch.randn(5, 3, 10))
h0 = Variable(torch.randn(2, 3, 20))
c0 = Variable(torch.randn(2, 3, 20))
output, hn = rnn(input, (h0, c0))
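
One time step of these equations can be traced in plain Python. The sketch uses a scalar input and scalar states so that each weight is a single number, and it uses the standard cell update in which the input gate scales the cell gate g_t. The weight values and the dictionary layout are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, w, b):
    """One LSTM time step for a scalar input, hidden state and cell state."""
    i = sigmoid(w["ii"] * x + b["ii"] + w["hi"] * h_prev + b["hi"])    # input gate
    f = sigmoid(w["if"] * x + b["if"] + w["hf"] * h_prev + b["hf"])    # forget gate
    g = math.tanh(w["ig"] * x + b["ig"] + w["hg"] * h_prev + b["hg"])  # cell gate
    o = sigmoid(w["io"] * x + b["io"] + w["ho"] * h_prev + b["ho"])    # out gate
    c = f * c_prev + i * g        # new cell state
    h = o * math.tanh(c)          # new hidden state
    return h, c

names = ["ii", "hi", "if", "hf", "ig", "hg", "io", "ho"]
w = {name: 0.5 for name in names}   # hypothetical weights
b = {name: 0.0 for name in names}   # hypothetical biases

h, c = 0.0, 0.0
for x in [1.0, -0.5, 0.25]:
    h, c = lstm_step(x, h, c, w, b)
print(h, c)
```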

Constructor Arguments

Parameter | Default | Description
--------- | ------- | -----------
input_size | | The number of expected features in the input x
hidden_size | | The number of features in the hidden state h
num_layers | | The number of recurrent layers.
bias | True | If False, then the layer does not use bias weights b_ih and b_hh.
batch_first | | If True, then the input tensor is provided as (batch, seq, feature)
dropout | | If non-zero, introduces a dropout layer on the outputs of each RNN layer

Inputs

Parameter | Default | Description
--------- | ------- | -----------
input | | A (seq_len x batch x input_size) tensor containing the features of the input sequence.
h_0 | | A (num_layers x batch x hidden_size) tensor containing the initial hidden state for each element in the batch.
c_0 | | A (num_layers x batch x hidden_size) tensor containing the initial cell state for each element in the batch.

Outputs

Parameter | Description
--------- | -----------
output | A (seq_len x batch x hidden_size) tensor containing the output features (h_t) from the last layer of the RNN, for each t
h_n | A (num_layers x batch x hidden_size) tensor containing the hidden state for t=seq_len
c_n | A (num_layers x batch x hidden_size) tensor containing the cell state for t=seq_len

Members

Parameter | Description
--------- | -----------
weight_ih_l[k] | the learnable input-hidden weights of the k-th layer (W_ir
weight_hh_l[k] | the learnable hidden-hidden weights of the k-th layer (W_hr
bias_ih_l[k] | the learnable input-hidden bias of the k-th layer (b_ir
bias_hh_l[k] | the learnable hidden-hidden bias of the k-th layer (W_hr

LSTMCell

A long short-term memory (LSTM) cell.

i = sigmoid(W_ii x + b_ii + W_hi h + b_hi)
f = sigmoid(W_if x + b_if + W_hf h + b_hf)
g = tanh(W_ig x + b_ig + W_hg h + b_hg)
o = sigmoid(W_io x + b_io + W_ho h + b_ho)
c' = f * c + i * g
h' = o * tanh(c')

rnn = nn.LSTMCell(10, 20)
input = Variable(torch.randn(6, 3, 10))
hx = Variable(torch.randn(3, 20))
cx = Variable(torch.randn(3, 20))
output = []
for i in range(6):
    # feed one time step at a time and collect the hidden states
    hx, cx = rnn(input[i], (hx, cx))
    output.append(hx)

Constructor Arguments

Parameter | Default | Description
--------- | ------- | -----------
input_size | | The number of expected features in the input x
hidden_size | | The number of features in the hidden state h
bias | True | If False, then the layer does not use bias weights b_ih and b_hh.

Inputs

Parameter | Default | Description
--------- | ------- | -----------
input | | A (batch x input_size) tensor containing input features
hidden | | A (batch x hidden_size) tensor containing the initial hidden state for each element in the batch.

Outputs

Parameter | Description
--------- | -----------
h' | A (batch x hidden_size) tensor containing the next hidden state for each element in the batch
c' | A (batch x hidden_size) tensor containing the next cell state for each element in the batch

Members

Parameter | Description
--------- | -----------
weight_ih | the learnable input-hidden weights, of shape (input_size x hidden_size)
weight_hh | the learnable hidden-hidden weights, of shape (hidden_size x hidden_size)
bias_ih | the learnable input-hidden bias, of shape (hidden_size)
bias_hh | the learnable hidden-hidden bias, of shape (hidden_size)

LeakyReLU

Applies element-wise, f(x) = max(0, x) + negative_slope * min(0, x)

m = nn.LeakyReLU(0.1)
input = autograd.Variable(torch.randn(2))
print(input)
print(m(input))
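
The formula can be checked on plain numbers without any tensors. The sketch below is a scalar pure-Python version; the helper name `leaky_relu` is an assumption for illustration and not part of the module API.

```python
def leaky_relu(x, negative_slope=1e-2):
    # f(x) = max(0, x) + negative_slope * min(0, x)
    return max(0.0, x) + negative_slope * min(0.0, x)


values = [-2.0, -0.5, 0.0, 3.0]
result = [leaky_relu(v, negative_slope=0.1) for v in values]
print(result)
```

Negative inputs are scaled by negative_slope, while non-negative inputs pass through unchanged.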

Constructor Arguments

Parameter | Default | Description
--------- | ------- | -----------
negative_slope | 1e-2 | Controls the angle of the negative slope.
inplace |  | can optionally do the operation in-place

Expected Shape

       | Shape | Description
------ | ----- | ------------
input  | Any   | Tensor of any size and dimension
output | Same  | Output has the same shape as input

Returns

a Tensor of the same dimension and shape as the input

Linear

Applies a linear transformation to the incoming data, y = Ax + b

m = nn.Linear(20, 30)
input = autograd.Variable(torch.randn(128, 20))
output = m(input)
print(output.size())

The input is a 2D mini-batch of samples, each of size in_features. The output will be a 2D Tensor of size mini-batch x out_features.

Constructor Arguments

Parameter | Default | Description
--------- | ------- | -----------
in_features |  | size of each input sample
out_features |  | size of each output sample
bias | True | If set to False, the layer will not learn an additive bias.

Expected Shape

       | Shape | Description
------ | ----- | ------------
input  | [ * , in_features] | Input can be of shape minibatch x in_features
output | [ * , out_features] | Output is of shape minibatch x out_features

Members

Parameter | Description
--------- | -----------
weight | the learnable weights of the module of shape (out_features x in_features)
bias | the learnable bias of the module of shape (out_features)
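
To make the shapes concrete, the following pure-Python sketch applies y = Ax + b to each sample of a small mini-batch, with the weight stored as out_features rows of in_features values as in the Members table. The helper name `linear` and the example numbers are assumptions for illustration.

```python
def linear(x, weight, bias=None):
    """Compute y = A x + b for one sample x (a list of in_features floats)."""
    y = []
    for row_index, row in enumerate(weight):
        total = sum(w * xi for w, xi in zip(row, x))
        if bias is not None:
            total += bias[row_index]
        y.append(total)
    return y


# out_features = 2, in_features = 3
weight = [[1.0, 0.0, 2.0],
          [0.0, 1.0, -1.0]]
bias = [0.5, -0.5]

# a mini-batch of two samples, each of size in_features
batch = [[1.0, 2.0, 3.0],
         [0.0, 0.0, 0.0]]
outputs = [linear(sample, weight, bias) for sample in batch]
print(outputs)
```

Each output row has out_features entries, so a mini-batch x in_features input yields a mini-batch x out_features output.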

LogSigmoid

Applies element-wise LogSigmoid(x_i) = log(1 / (1 + exp(-x_i)))

m = nn.LogSigmoid()
input = autograd.Variable(torch.randn(2))
print(input)
print(m(input))

Expected Shape

       | Shape | Description
------ | ----- | ------------
input  | Any   | Tensor of any size and dimension
output | Same  | Output has the same shape as input

Returns

a Tensor of the same dimension and shape as the input

LogSoftmax

Applies the Log(Softmax(x)) function to an n-dimensional input Tensor.

m = nn.LogSoftmax()
input = autograd.Variable(torch.randn(2, 3))
print(input)
print(m(input))

The LogSoftmax formulation can be simplified as f_i(x) = log((1 / a) * exp(x_i)) = x_i - log(a), where a = sum_j exp(x_j).
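
The simplified form f_i(x) = x_i - log(a) can be evaluated directly. The pure-Python sketch below does so for one row of values; subtracting the row maximum before exponentiating is a common stabilisation trick assumed here, not something this section states about the library implementation, and the helper name `log_softmax` is illustrative.

```python
import math


def log_softmax(xs):
    # log(a) = shift + log(sum_j exp(x_j - shift)), with shift = max(xs)
    shift = max(xs)
    log_a = shift + math.log(sum(math.exp(x - shift) for x in xs))
    return [x - log_a for x in xs]


row = [1.0, 2.0, 3.0]
log_probs = log_softmax(row)
print(log_probs)
# exponentiating recovers probabilities that sum to 1
print(sum(math.exp(lp) for lp in log_probs))
```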

Expected Shape

       | Shape | Description
------ | ----- | ------------
input  | [ * , * ] | 2D Tensor of any size
output | Same  | Output has the same shape as input

Returns

a Tensor of the same dimension and shape as the input, with values in the range [-inf, 0)

MaxPool1d

Applies a 1D max pooling over an input signal composed of several input planes.

The output value of the layer with input (b x C x W) and output (b x C x oW)
can be precisely described as:
output[b_i][c_i][w_i] = max_{k=1, K} input[b_i][c_i][stride_w * w_i + k]
# pool of size=3, stride=2
m = nn.MaxPool1d(3, stride=2)
input = autograd.Variable(torch.randn(20, 16, 50))
output = m(input)

Constructor Arguments

Parameter | Default | Description
--------- | ------- | -----------
kernel_size |  | the size of the window to take a max over
stride | kernel_size | the stride of the window
padding | 0 | implicit padding to be added.
dilation | 1 | a parameter that controls the stride of elements in the window.
return_indices | False | if True, will return the indices along with the outputs. Useful when Unpooling later.
ceil_mode |  | when True, will use “ceil” instead of “floor” to compute the output shape

Expected Shape

       | Shape | Description
------ | ----- | ------------
input  | [ * , * , * ] | Input is minibatch x channels x iW
output | [ * , * , * ] | Output shape = minibatch x channels x floor((iW + 2*padW - kernel_size) / stride + 1)
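
The output-shape formula and the windowed max can be checked on a plain list. The sketch below is pure Python and makes several simplifying assumptions: it handles a single plane, uses zero-based window offsets, ignores dilation, and applies no extra boundary rule in ceil mode. The helper names `pool_output_size` and `max_pool1d` are illustrative, not library functions.

```python
import math


def pool_output_size(in_size, kernel_size, stride, padding=0, ceil_mode=False):
    # floor((iW + 2*padW - kernel_size) / stride + 1), or ceil in ceil mode
    rounding = math.ceil if ceil_mode else math.floor
    return int(rounding((in_size + 2 * padding - kernel_size) / stride + 1))


def max_pool1d(signal, kernel_size, stride):
    """Max over each window of one unpadded plane."""
    out_size = pool_output_size(len(signal), kernel_size, stride)
    return [max(signal[stride * w: stride * w + kernel_size])
            for w in range(out_size)]


# width 50, kernel 3, stride 2, as in the example above
print(pool_output_size(50, 3, 2))
print(pool_output_size(50, 3, 2, ceil_mode=True))
print(max_pool1d([1, 5, 2, 0, 3, 4, 9], kernel_size=3, stride=2))
```

For a width of 50 with kernel_size=3 and stride=2 the formula gives 24 output positions in floor mode and 25 in ceil mode.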

MaxPool2d

Applies a 2D max pooling over an input signal composed of several input planes.

The output value of the layer with input (b x C x H x W) and output (b x C x oH x oW)
can be precisely described as:
output[b_i][c_i][h_i][w_i] = max_{kh=1, KH} max_{kw=1, KW} input[b_i][c_i][stride_h * h_i + kh][stride_w * w_i + kw]
# pool of square window of size=3, stride=2
m = nn.MaxPool2d(3, stride=2)
# pool of non-square window
m = nn.MaxPool2d((3, 2), stride=(2, 1))
input = autograd.Variable(torch.randn(20, 16, 50, 32))
output = m(input)

Constructor Arguments

Parameter | Default | Description
--------- | ------- | -----------
kernel_size |  | the size of the window to take a max over. Can be a single number k (for a square kernel of k x k) or a tuple (kh x kw)
stride | kernel_size | the stride of the window. Can be a single number s or a tuple (sh x sw).
padding | 0 | implicit padding to be added. Can be a single number or a tuple.
dilation | 1 | a parameter that controls the stride of elements in the window. Can be a single number or a tuple.
return_indices | False | if True, will return the indices along with the outputs. Useful to pass to nn.MaxUnpool2d.
ceil_mode |  | when True, will use “ceil” instead of “floor” to compute the output shape

Expected Shape

       | Shape | Description
------ | ----- | ------------
input  | [ * , * , * , * ] | Input is minibatch x channels x iH x iW
output | [ * , * , * , * ] | Output shape = minibatch x channels x floor((iH + 2*padH - kH) / sH + 1) x floor((iW + 2*padW - kW) / sW + 1)

MaxPool3d

Applies a 3D max pooling over an input signal composed of several input planes.

# pool of square window of size=3, stride=2
m = nn.MaxPool3d(3, stride=2)
# pool of non-square window
m = nn.MaxPool3d((3, 2, 2), stride=(2, 1, 2))
input = autograd.Variable(torch.randn(20, 16, 50, 44, 31))
output = m(input)

Constructor Arguments

Parameter | Default | Description
--------- | ------- | -----------
kernel_size |  | the size of the window to take a max over. Can be a single number k (for a cubic kernel of k x k x k) or a tuple (kt x kh x kw)
stride | kernel_size | the stride of the window. Can be a single number s or a tuple (st x sh x sw).
padding | 0 | implicit padding to be added. Can be a single number or a tuple.
dilation | 1 | a parameter that controls the stride of elements in the window. Can be a single number or a tuple.
return_indices | False | if True, will return the indices along with the outputs. Useful to pass to nn.MaxUnpool3d.
ceil_mode |  | when True, will use “ceil” instead of “floor” to compute the output shape

Expected Shape

       | Shape | Description
------ | ----- | ------------
input  | [ * , * , * , * , * ] | Input is minibatch x channels x iT x iH x iW
output | [ * , * , * , * , * ] | Output shape = minibatch x channels x floor((iT + 2*padT - kT) / sT + 1) x floor((iH + 2*padH - kH) / sH + 1) x floor((iW + 2*padW - kW) / sW + 1)

MaxUnpool2d

Computes the inverse operation of MaxPool2d

# pool of square window of size=2, stride=2
m = nn.MaxPool2d(2, stride=2, return_indices=True)
mu = nn.MaxUnpool2d(2, stride=2)
input = autograd.Variable(torch.randn(20, 16, 50, 32))
output, indices = m(input)
unpooled_output = mu(output, indices)
# exact output size can be also specified as an argument
input = autograd.Variable(torch.randn(1, 16, 11, 11))
downsample = nn.MaxPool2d(3, 3, return_indices=True)
upsample = nn.MaxUnpool2d(3, 3)
h, indices = downsample(input)
output = upsample(h, indices, output_size=input.size())

MaxPool2d is not fully invertible, since the non-maximal values are lost. MaxUnpool2d takes as input the output of MaxPool2d together with the indices of the maximal values, and computes a partial inverse in which all non-maximal values are set to zero.

Constructor Arguments

Parameter | Default | Description
--------- | ------- | -----------
kernel_size |  | the size of the max window. Can be a single number k (for a square kernel of k x k) or a tuple (kh x kw)
stride | kernel_size | the stride of the window. Can be a single number s or a tuple (sh x sw).
padding | 0 | implicit padding that was added to the input. Can be a single number or a tuple.

Expected Shape

       | Shape | Description
------ | ----- | ------------
input  | [ * , * , * , * ] | Input is minibatch x channels x iH x iW
output | [ * , * , * , * ] | Output shape is minibatch x channels x ((iH - 1) * sH - 2*padH + kH) x ((iW - 1) * sW - 2*padW + kW), or as specified by output_size in the call.
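
The interplay between pooling with return_indices and unpooling is easiest to see in one dimension. The sketch below is a pure-Python 1D analogue on a single plane without padding; the helper names `max_pool1d_with_indices` and `max_unpool1d` are assumptions for illustration and do not mirror the library internals.

```python
def max_pool1d_with_indices(signal, kernel_size, stride):
    """Return the window maxima and the positions they came from."""
    values, indices = [], []
    for start in range(0, len(signal) - kernel_size + 1, stride):
        window = signal[start:start + kernel_size]
        offset = window.index(max(window))
        values.append(window[offset])
        indices.append(start + offset)
    return values, indices


def max_unpool1d(values, indices, output_size):
    """Put each maximum back at its recorded position, zeros elsewhere."""
    out = [0] * output_size
    for value, index in zip(values, indices):
        out[index] = value
    return out


signal = [1, 5, 2, 0, 3, 4]
pooled, idx = max_pool1d_with_indices(signal, kernel_size=2, stride=2)
restored = max_unpool1d(pooled, idx, output_size=len(signal))
print(pooled, idx, restored)
```

Only the maxima survive the round trip; every other position comes back as zero, which is why the operation is a partial inverse.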

MaxUnpool3d

Computes the inverse operation of MaxPool3d

# pool of square window of size=3, stride=2
m = nn.MaxPool3d(3, stride=2, return_indices=True)
mu = nn.MaxUnpool3d(3, stride=2)
input = autograd.Variable(torch.randn(20, 16, 50, 32, 15))
output, indices = m(input)
unpooled_output = mu(output, indices)

MaxPool3d is not fully invertible, since the non-maximal values are lost. MaxUnpool3d takes as input the output of MaxPool3d together with the indices of the maximal values, and computes a partial inverse in which all non-maximal values are set to zero.

Constructor Arguments

Parameter | Default | Description
--------- | ------- | -----------
kernel_size |  | the size of the max window. Can be a single number k (for a cubic kernel of k x k x k) or a tuple (kt x kh x kw)
stride | kernel_size | the stride of the window. Can be a single number s or a tuple (st x sh x sw).
padding | 0 | implicit padding that was added to the input. Can be a single number or a tuple.

Expected Shape

       | Shape | Description
------ | ----- | ------------
input  | [ * , * , * , * , * ] | Input is minibatch x channels x iT x iH x iW
output | [ * , * , * , * , * ] | Output shape = minibatch x channels x ((iT - 1) * sT - 2*padT + kT) x ((iH - 1) * sH - 2*padH + kH) x ((iW - 1) * sW - 2*padW + kW)

PReLU

Applies element-wise the function PReLU(x) = max(0,x) + a * min(0,x)

m = nn.PReLU()
input = autograd.Variable(torch.randn(2))
print(input)
print(m(input))

Here “a” is a learnable parameter. When called without arguments, nn.PReLU() uses a single parameter “a” across all input channels. If called with nn.PReLU(nChannels), a separate “a” is used for each input channel. Note that weight decay should not be used when learning “a” for good performance.

Constructor Arguments

Parameter | Default | Description
--------- | ------- | -----------
num_parameters | 1 | number of “a” to learn.
init | 0.25 | the initial value of “a”.

Expected Shape

       | Shape | Description
------ | ----- | ------------
input  | Any   | Tensor of any size and dimension
output | Same  | Output has the same shape as input

Returns

a Tensor of the same dimension and shape as the input

RNN

Applies a multi-layer Elman RNN with tanh or ReLU non-linearity to an input sequence.

rnn = nn.RNN(10, 20, 2)
input = Variable(torch.randn(5, 3, 10))
h0 = Variable(torch.randn(2, 3, 20))
output, hn = rnn(input, h0)

For each element in the input sequence, each layer computes the following function:

h_t = tanh(w_ih * x_t + b_ih  +  w_hh * h_(t-1) + b_hh)

where h_t is the hidden state at time t, and x_t is the hidden state of the previous layer at time t or input_t for the first layer. If nonlinearity=‘relu’, then ReLU is used instead of tanh.
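
The recurrence h_t = tanh(w_ih * x_t + b_ih + w_hh * h_(t-1) + b_hh) can be unrolled by hand for a single layer with scalar weights. The pure-Python sketch below does this; the function name `elman_rnn` and the chosen weight values are assumptions for illustration only.

```python
import math


def elman_rnn(xs, h0, w_ih, w_hh, b_ih=0.0, b_hh=0.0, nonlinearity="tanh"):
    """Unroll one scalar Elman layer over the sequence xs."""
    if nonlinearity == "tanh":
        act = math.tanh
    else:
        act = lambda z: max(0.0, z)  # ReLU
    h = h0
    outputs = []
    for x_t in xs:
        h = act(w_ih * x_t + b_ih + w_hh * h + b_hh)
        outputs.append(h)
    # outputs holds h_t for every step; h is the final hidden state
    return outputs, h


outputs, h_n = elman_rnn([1.0, 0.0, -1.0], h0=0.0, w_ih=0.5, w_hh=0.5)
print(outputs, h_n)
```

The list of per-step hidden states plays the role of output, and the last entry plays the role of h_n for this single layer.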

Constructor Arguments

Parameter | Default | Description
--------- | ------- | -----------
input_size |  | The number of expected features in the input x
hidden_size |  | The number of features in the hidden state h
num_layers |  | Number of recurrent layers.
nonlinearity | ‘tanh’ | The non-linearity to use: ‘tanh’ or ‘relu’.
bias | True | If False, then the layer does not use bias weights b_ih and b_hh.
batch_first |  | If True, then the input tensor is provided as (batch, seq, feature)
dropout |  | If non-zero, introduces a dropout layer on the outputs of each RNN layer

Inputs

Parameter | Default | Description
--------- | ------- | -----------
input |  | A (seq_len x batch x input_size) tensor containing the features of the input sequence.
h_0 |  | A (num_layers x batch x hidden_size) tensor containing the initial hidden state for each element in the batch.

Outputs

Parameter | Description
--------- | -----------
output | A (seq_len x batch x hidden_size) tensor containing the output features (h_k) from the last layer of the RNN, for each k
h_n | A (num_layers x batch x hidden_size) tensor containing the hidden state for k=seq_len

Members

Parameter | Description
--------- | -----------
weight_ih_l[k] | the learnable input-hidden weights of the k-th layer, of shape (hidden_size x input_size) for k = 0 and (hidden_size x hidden_size) otherwise
weight_hh_l[k] | the learnable hidden-hidden weights of the k-th layer, of shape (hidden_size x hidden_size)
bias_ih_l[k] | the learnable input-hidden bias of the k-th layer, of shape (hidden_size)
bias_hh_l[k] | the learnable hidden-hidden bias of the k-th layer, of shape (hidden_size)

RNNCell

An Elman RNN cell with tanh or ReLU non-linearity.

h' = tanh(w_ih * x + b_ih  +  w_hh * h + b_hh)
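import torch
import torch.nn as nn
from torch.autograd import Variable

rnn = nn.RNNCell(10, 20)
# 6 time steps, batch of 3, 10 features per step
input = Variable(torch.randn(6, 3, 10))
hx = Variable(torch.randn(3, 20))
output = []
for i in range(6):
    # the cell consumes one (batch x input_size) time step per call
    hx = rnn(input[i], hx)
    output.append(hx)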

If nonlinearity=‘relu’, then ReLU is used in place of tanh.

Constructor Arguments

Parameter | Default | Description
--------- | ------- | -----------
input_size |  | The number of expected features in the input x
hidden_size |  | The number of features in the hidden state h
bias | True | If False, then the layer does not use bias weights b_ih and b_hh.
nonlinearity | ‘tanh’ | The non-linearity to use: ‘tanh’ or ‘relu’.

Inputs

Parameter | Default | Description
--------- | ------- | -----------
input |  | A (batch x input_size) tensor containing input features
hidden |  | A (batch x hidden_size) tensor containing the initial hidden state for each element in the batch.

Outputs

Parameter | Description
--------- | -----------
h' | A (batch x hidden_size) tensor containing the next hidden state for each element in the batch

Members

Parameter | Description
--------- | -----------
weight_ih | the learnable input-hidden weights, of shape (hidden_size x input_size)
weight_hh | the learnable hidden-hidden weights, of shape (hidden_size x hidden_size)
bias_ih | the learnable input-hidden bias, of shape (hidden_size)
bias_hh | the learnable hidden-hidden bias, of shape (hidden_size)

ReLU

Applies the rectified linear unit function element-wise: ReLU(x) = max(0, x)

m = nn.ReLU()
input = autograd.Variable(torch.randn(2))
print(input)
print(m(input))

Constructor Arguments

Parameter | Default | Description
--------- | ------- | -----------
inplace |  | can optionally do the operation in-place

Expected Shape

       | Shape | Description
------ | ----- | ------------
input  | Any   | Tensor of any size and dimension
output | Same  | Output has the same shape as input

Returns

a Tensor of the same dimension and shape as the input

ReLU6

Applies the element-wise function ReLU6(x) = min(max(0, x), 6)

m = nn.ReLU6()
input = autograd.Variable(torch.randn(2))
print(input)
print(m(input))

Constructor Arguments

Parameter | Default | Description
--------- | ------- | -----------
inplace |  | can optionally do the operation in-place

Expected Shape

       | Shape | Description
------ | ----- | ------------
input  | Any   | Tensor of any size and dimension
output | Same  | Output has the same shape as input

Returns

a Tensor of the same dimension and shape as the input

Sequential

A sequential Container. It is derived from the base nn.Container class.

# Example of using Sequential
model = nn.Sequential(
          nn.Conv2d(1,20,5),
          nn.ReLU(),
          nn.Conv2d(20,64,5),
          nn.ReLU()
        )

Modules will be added to it in the order they are passed in the constructor. Alternatively, an ordered dict of modules can be passed in, which also gives each module a name, as in the following small example.

# Example of using Sequential with OrderedDict
model = nn.Sequential(OrderedDict([
          ('conv1', nn.Conv2d(1,20,5)),
          ('relu1', nn.ReLU()),
          ('conv2', nn.Conv2d(20,64,5)),
          ('relu2', nn.ReLU())
        ]))
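
The ordering behaviour can be illustrated without any neural network layers. The sketch below is a hypothetical stand-in, `TinySequential`, that chains plain Python callables in the order given, or in the order of an OrderedDict. It is an assumption-laden illustration of the idea, not the nn.Sequential implementation, and naming unnamed stages by their position is a choice made for this sketch.

```python
from collections import OrderedDict


class TinySequential:
    """Hypothetical stand-in for nn.Sequential that chains plain callables."""

    def __init__(self, *stages):
        if len(stages) == 1 and isinstance(stages[0], OrderedDict):
            # named stages, kept in insertion order
            self.stages = OrderedDict(stages[0])
        else:
            # unnamed stages, keyed by their position
            self.stages = OrderedDict(
                (str(i), stage) for i, stage in enumerate(stages))

    def __call__(self, x):
        # apply every stage in order, feeding each result to the next
        for stage in self.stages.values():
            x = stage(x)
        return x


def double(v):
    return 2 * v


def add_one(v):
    return v + 1


unnamed = TinySequential(double, add_one)
named = TinySequential(OrderedDict([("double", double), ("add_one", add_one)]))
print(unnamed(3), named(3), list(named.stages))
```

Swapping the two stages changes the result, which is the point of the container: the order of construction is the order of execution.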

Sigmoid

Applies the element-wise function sigmoid(x) = 1 / ( 1 + exp(-x))

m = nn.Sigmoid()
input = autograd.Variable(torch.randn(2))
print(input)
print(m(input))

Expected Shape

       | Shape | Description
------ | ----- | ------------
input  | Any   | Tensor of any size and dimension
output | Same  | Output has the same shape as input

Returns

a Tensor of the same dimension and shape as the input

Softmax

Applies the Softmax function to an n-dimensional input Tensor, rescaling its elements so that the elements of the n-dimensional output Tensor lie in the range (0,1) and sum to 1.

m = nn.Softmax()
input = autograd.Variable(torch.randn(2, 3))
print(input)
print(m(input))

Softmax is defined as f_i(x) = exp(x_i - shift) / sum_j exp(x_j - shift), where shift = max_i x_i.
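
The role of the shift term is numerical: subtracting the maximum leaves the result unchanged but keeps exp() from overflowing on large inputs. The pure-Python sketch below applies the definition to a single row; the helper name `softmax` is an assumption for illustration.

```python
import math


def softmax(xs):
    # f_i(x) = exp(x_i - shift) / sum_j exp(x_j - shift), shift = max_i x_i
    shift = max(xs)
    exps = [math.exp(x - shift) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1.0, 2.0, 3.0])
print(probs, sum(probs))

# math.exp(1000.0) alone would overflow; the shifted form does not
print(softmax([1000.0, 1000.0]))
```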

Expected Shape

       | Shape | Description
------ | ----- | ------------
input  | [ * , * ] | 2D Tensor of any size
output | Same  | Output has the same shape as input

Returns

a Tensor of the same dimension and shape as the input, with values in the range [0, 1]

Softmax2d

Applies SoftMax over features to each spatial location.

m = nn.Softmax2d()
# softmax is applied over the 2nd dimension (channels)
input = autograd.Variable(torch.randn(2, 3, 12, 13))
print(input)
print(m(input))

When given an image of Channels x Height x Width, it will apply Softmax to each location [Channels, h_i, w_j]

Expected Shape

       | Shape | Description
------ | ----- | ------------
input  | [ * , * , * , * ] | 4D Tensor of any size
output | Same  | Output has the same shape as input

Returns

a Tensor of the same dimension and shape as the input, with values in the range [0, 1]

Softmin

Applies the Softmin function to an n-dimensional input Tensor, rescaling its elements so that the elements of the n-dimensional output Tensor lie in the range (0,1) and sum to 1.

m = nn.Softmin()
input = autograd.Variable(torch.randn(2, 3))
print(input)
print(m(input))

Softmin is defined as f_i(x) = exp(-x_i - shift) / sum_j exp(-x_j - shift), where shift = max_i (-x_i).

Expected Shape

       | Shape | Description
------ | ----- | ------------
input  | [ * , * ] | 2D Tensor of any size
output | Same  | Output has the same shape as input

Returns

a Tensor of the same dimension and shape as the input, with values in the range [0, 1]

Softplus

Applies element-wise SoftPlus(x) = 1/beta * log(1 + exp(beta * x_i))

m = nn.Softplus()
input = autograd.Variable(torch.randn(2))
print(input)
print(m(input))

SoftPlus is a smooth approximation to the ReLU function and can be used to constrain the output of a machine to always be positive. For numerical stability the implementation reverts to the linear function for inputs above a certain value.
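
The switch to a linear function can be sketched on scalars. The version below is pure Python and assumes the comparison is made on beta * x against the threshold; that detail, like the helper name `softplus`, is an assumption of this sketch rather than something stated above.

```python
import math


def softplus(x, beta=1.0, threshold=20.0):
    # Above the threshold, log(1 + exp(beta * x)) / beta is numerically
    # indistinguishable from x, so return x and avoid overflow in exp().
    if beta * x > threshold:
        return x
    return math.log1p(math.exp(beta * x)) / beta


for value in [-5.0, 0.0, 5.0, 1000.0]:
    print(value, softplus(value))
```

The output is always positive, approaches 0 for very negative inputs, and tracks the input itself for large positive ones.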

Constructor Arguments

Parameter | Default | Description
--------- | ------- | -----------
beta | 1 | the beta value for the Softplus formulation.
threshold | 20 | values above this revert to a linear function.

Expected Shape

       | Shape | Description
------ | ----- | ------------
input  | Any   | Tensor of any size and dimension
output | Same  | Output has the same shape as input

Returns

a Tensor of the same dimension and shape as the input

Softshrink

Applies the soft shrinkage function elementwise

m = nn.Softshrink()
input = autograd.Variable(torch.randn(2))
print(input)
print(m(input))

SoftShrinkage operator is defined as:

f(x) = x - lambda, if x > lambda
f(x) = x + lambda, if x < -lambda
f(x) = 0, otherwise
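
The three cases of the operator translate directly into a scalar function. This is a pure-Python sketch; the helper name `softshrink` is an assumption for illustration, and its `lambd` argument follows the constructor argument name used by the module.

```python
def softshrink(x, lambd=0.5):
    if x > lambd:
        return x - lambd   # shrink positive values towards zero
    if x < -lambd:
        return x + lambd   # shrink negative values towards zero
    return 0.0             # values inside [-lambd, lambd] become zero


values = [-1.0, -0.3, 0.0, 0.3, 1.0]
print([softshrink(v) for v in values])
```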

Constructor Arguments

Parameter | Default | Description
--------- | ------- | -----------
lambd | 0.5 | the lambda value for the Softshrink formulation.

Expected Shape

       | Shape | Description
------ | ----- | ------------
input  | Any   | Tensor of any size and dimension
output | Same  | Output has the same shape as input

Returns

a Tensor of the same dimension and shape as the input

Softsign

Applies element-wise the function Softsign(x) = x / (1 + |x|)

m = nn.Softsign()
input = autograd.Variable(torch.randn(2))
print(input)
print(m(input))

Expected Shape

       | Shape | Description
------ | ----- | ------------
input  | Any   | Tensor of any size and dimension
output | Same  | Output has the same shape as input

Returns

a Tensor of the same dimension and shape as the input

Tanh

Applies element-wise Tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))

m = nn.Tanh()
input = autograd.Variable(torch.randn(2))
print(input)
print(m(input))

Expected Shape

       | Shape | Description
------ | ----- | ------------
input  | Any   | Tensor of any size and dimension
output | Same  | Output has the same shape as input

Returns

a Tensor of the same dimension and shape as the input

Tanhshrink

Applies element-wise Tanhshrink(x) = x - Tanh(x)

m = nn.Tanhshrink()
input = autograd.Variable(torch.randn(2))
print(input)
print(m(input))

Expected Shape

       | Shape | Description
------ | ----- | ------------
input  | Any   | Tensor of any size and dimension
output | Same  | Output has the same shape as input

Returns

a Tensor of the same dimension and shape as the input

Threshold

Thresholds each element of the input Tensor

m = nn.Threshold(0.1, 20)
input = Variable(torch.randn(2))
print(input)
print(m(input))

Threshold is defined as:

y = x, if x >= threshold
y = value, if x < threshold

Constructor Arguments

Parameter | Default | Description
--------- | ------- | -----------
threshold |  | The value to threshold at
value |  | The value to replace with
inplace |  | can optionally do the operation in-place

Expected Shape

       | Shape | Description
------ | ----- | ------------
input  | Any   | Tensor of any size and dimension
output | Same  | Output has the same shape as input

Returns

a Tensor of the same dimension and shape as the input