adam – Tarik Billa

400% higher error with PyTorch compared with identical Keras model (with Adam optimizer)

November 21, 2023 by Tarik

The problem here is unintentional broadcasting in the PyTorch training loop. The result of an nn.Linear operation always has shape [B,D], where B is the batch size and D is the output dimension. Therefore, in your mean_squared_error function, ypred has shape [32,1] and ytrue has shape [32]. By the broadcasting rules used by NumPy and …
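The shape mismatch described above can be reproduced with NumPy alone. The following is a minimal sketch, not the original poster's code: the variable names, the synthetic data, and the noise level are all assumptions chosen for illustration. It shows that subtracting a [32] array from a [32,1] array silently produces a [32,32] array, so the mean is taken over every pairing of prediction and target rather than over the 32 matching pairs.

```python
import numpy as np

# Synthetic data (assumed for illustration): targets and predictions
# that are close to the targets, so the correct MSE should be small.
rng = np.random.default_rng(0)
batch_size = 32

y_true = rng.normal(size=batch_size)                # shape (32,)
y_pred = y_true + 0.1 * rng.normal(size=batch_size)
y_pred = y_pred.reshape(batch_size, 1)              # shape (32, 1), like a linear layer with one output

# Unintended broadcasting: (32, 1) - (32,) expands to (32, 32),
# comparing every prediction with every target.
diff_broadcast = y_pred - y_true
mse_broadcast = np.mean(diff_broadcast ** 2)

# Aligned shapes: drop the trailing dimension so both arrays are (32,).
diff_aligned = y_pred.squeeze(-1) - y_true
mse_aligned = np.mean(diff_aligned ** 2)

print("broadcast diff shape:", diff_broadcast.shape)
print("aligned diff shape:  ", diff_aligned.shape)
print("broadcast MSE:", mse_broadcast)
print("aligned MSE:  ", mse_aligned)
```

One way to avoid the mismatch is to make the two shapes agree before computing the loss, either by squeezing the prediction's last dimension as in the sketch or by reshaping the target to [32,1]. Which of the two fits best depends on the rest of the training loop.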