The problem here is unintentional broadcasting in the PyTorch training loop.

For a 2-D input, the result of an `nn.Linear` layer has shape `[B, D]`, where `B` is the batch size and `D` is the output dimension. Therefore, in your `mean_squared_error` function `ypred` has shape `[32, 1]` while `ytrue` has shape `[32]`. Under the broadcasting rules shared by NumPy and PyTorch, `ytrue - ypred` then has shape `[32, 32]`: every target is subtracted from every prediction, and the loss is averaged over all 1024 pairs instead of the 32 you intended. What you almost certainly meant is for `ypred` to have shape `[32]`. This can be accomplished in several ways; probably the most readable is to use `Tensor.flatten`:

```
class TorchLinearModel(nn.Module):
    ...
    def forward(self, x):
        x = self.hidden_layer(x)
        x = self.output_layer(x)
        return x.flatten()
```

which produces the following train/val curves:
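As a side note, you can reproduce the shape blow-up in isolation without any model. The sketch below uses NumPy (whose broadcasting rules match PyTorch's, as noted above); the array contents are placeholder zeros, only the shapes matter:

```
import numpy as np

ytrue = np.zeros(32)       # shape (32,)  - targets, as they come from the loader
ypred = np.zeros((32, 1))  # shape (32, 1) - raw output of a Linear layer with D=1

# Broadcasting aligns trailing dims: (32,) vs (32, 1) -> (32, 32)
diff = ytrue - ypred
print(diff.shape)  # (32, 32)

# Flattening the predictions restores the elementwise difference
fixed = ytrue - ypred.flatten()
print(fixed.shape)  # (32,)
```

Printing the shapes of intermediate tensors like this is usually the fastest way to catch a silent broadcast before it corrupts a loss.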