What is the class definition of nn.Linear in PyTorch?

Question

What is the class definition of nn.Linear in PyTorch?

From the documentation:

CLASS torch.nn.Linear(in_features, out_features, bias=True)

Applies a linear transformation to the incoming data: y = x*W^T + b

Parameters:

in_features – size of each input sample (i.e. size of x)
out_features – size of each output sample (i.e. size of y)
bias – If set to False, the layer will not learn an additive bias. Default: True

Note that the weights W have shape (out_features, in_features) and biases b have shape (out_features). They are initialized randomly and can be changed later (e.g. during the training of a Neural Network they are updated by some optimization algorithm).
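The shapes described above can be checked directly. This is a minimal sketch; the variable names layer and no_bias are only illustrative, and the sizes 784 and 256 are taken from the layer in the question.

```python
import torch
import torch.nn as nn

# A layer mapping 784 input features to 256 output features.
layer = nn.Linear(in_features=784, out_features=256)

print(layer.weight.shape)  # torch.Size([256, 784]), i.e. (out_features, in_features)
print(layer.bias.shape)    # torch.Size([256]), i.e. (out_features,)

# With bias=False the layer has no bias parameter at all.
no_bias = nn.Linear(784, 256, bias=False)
print(no_bias.bias)        # None
```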

In your neural network, self.hidden = nn.Linear(784, 256) defines a hidden, fully connected linear layer ("hidden" meaning that it sits between the input and output layers). It takes an input x of shape (batch_size, 784), where batch_size is the number of inputs (each of size 784) passed to the network at once as a single tensor, and transforms it by the linear equation y = x*W^T + b into a tensor y of shape (batch_size, 256). That result is then passed through the sigmoid function, x = F.sigmoid(self.hidden(x)), which is an additional step and not part of nn.Linear.
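To make the shapes concrete, here is a small sketch of such a hidden layer in isolation. The class name TinyNet and the batch size of 64 are assumptions made for illustration, and the rest of the original network (e.g. its output layer) is left out. torch.sigmoid is used here; it computes the same function as F.sigmoid.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):  # hypothetical name, not from the question
    def __init__(self):
        super().__init__()
        # Fully connected layer: 784 input features -> 256 hidden units.
        self.hidden = nn.Linear(784, 256)

    def forward(self, x):
        # Linear transformation first, then the element-wise sigmoid.
        return torch.sigmoid(self.hidden(x))


batch_size = 64  # assumed value, any positive integer works
x = torch.randn(batch_size, 784)

net = TinyNet()
out = net(x)
print(out.shape)  # torch.Size([64, 256])
```

The sigmoid does not change the shape; it only maps every entry of the (batch_size, 256) tensor into the range between 0 and 1.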

Let’s see a concrete example:

import torch
import torch.nn as nn

x = torch.tensor([[1.0, -1.0],
                  [0.0,  1.0],
                  [0.0,  0.0]])

in_features = x.shape[1]  # = 2
out_features = 2

m = nn.Linear(in_features, out_features)

Here x contains three inputs (i.e. the batch size is 3), x[0], x[1] and x[2], each of size 2, so the output will have shape (batch_size, out_features) = (3, 2).

The values of the parameters (weights and biases) are:

>>> m.weight
tensor([[-0.4500,  0.5856],
        [-0.1807, -0.4963]])

>>> m.bias
tensor([ 0.2223, -0.6114])

(Because they are initialized randomly, you will most likely get different values.)
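If you want the random initialization to be repeatable between runs, one option is to seed PyTorch's random number generator before creating the layer. The sketch below only illustrates that idea; the seed value 0 and the names a and b are arbitrary choices, and it will not reproduce the specific numbers shown above.

```python
import torch
import torch.nn as nn

# Seeding before construction makes the random initialization repeatable.
torch.manual_seed(0)
a = nn.Linear(2, 2)

torch.manual_seed(0)
b = nn.Linear(2, 2)

print(torch.equal(a.weight, b.weight))  # True
print(torch.equal(a.bias, b.bias))      # True
```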

The output is:

>>> y = m(x)
>>> y
tensor([[-0.8133, -0.2959],
        [ 0.8079, -1.1077],
        [ 0.2223, -0.6114]])

and (behind the scenes) it is computed as:

y = x.matmul(m.weight.t()) + m.bias  # y = x*W^T + b

i.e.

y[i,j] == x[i,0] * m.weight[j,0] + x[i,1] * m.weight[j,1] + m.bias[j]

where i ranges over [0, batch_size) and j over [0, out_features).
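The formulas above can be verified numerically. The sketch below copies the weight and bias values printed earlier into a fresh layer (overwriting its random initialization inside torch.no_grad(), which is one common way to set parameters by hand) and compares m(x) with the manual computation. The name manual is illustrative only.

```python
import torch
import torch.nn as nn

x = torch.tensor([[1.0, -1.0],
                  [0.0,  1.0],
                  [0.0,  0.0]])

m = nn.Linear(2, 2)

# Overwrite the random initialization with the values shown in the example.
with torch.no_grad():
    m.weight.copy_(torch.tensor([[-0.4500,  0.5856],
                                 [-0.1807, -0.4963]]))
    m.bias.copy_(torch.tensor([0.2223, -0.6114]))

y = m(x)

# Matrix form: y = x*W^T + b
manual = x.matmul(m.weight.t()) + m.bias
print(torch.allclose(y, manual, atol=1e-6))  # True

# Element-wise form, checked for every (i, j).
for i in range(x.shape[0]):            # i in [0, batch_size)
    for j in range(m.out_features):    # j in [0, out_features)
        expected = (x[i, 0] * m.weight[j, 0]
                    + x[i, 1] * m.weight[j, 1]
                    + m.bias[j])
        assert torch.isclose(y[i, j], expected, atol=1e-6)
```

Note that the last row of x is all zeros, so the last row of y is just the bias.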
