PyTorch LSTM vs LSTMCell

Question

Yes, you can emulate one with the other; the reason for having them separate is efficiency.
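A minimal sketch of that emulation follows. It assumes PyTorch is installed, a single-layer unidirectional nn.LSTM, and arbitrary toy sizes. It copies the layer's weights into an nn.LSTMCell, runs the cell in a Python loop over time steps, and checks that the results match the fused layer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, seq_len, input_size, hidden_size = 3, 5, 4, 6

lstm = nn.LSTM(input_size, hidden_size, batch_first=True)  # one layer, one direction
cell = nn.LSTMCell(input_size, hidden_size)

# Share parameters: layer 0 of nn.LSTM uses the same gate layout as nn.LSTMCell.
with torch.no_grad():
    cell.weight_ih.copy_(lstm.weight_ih_l0)
    cell.weight_hh.copy_(lstm.weight_hh_l0)
    cell.bias_ih.copy_(lstm.bias_ih_l0)
    cell.bias_hh.copy_(lstm.bias_hh_l0)

x = torch.randn(batch, seq_len, input_size)

# Fused layer: the whole sequence in one call.
out_layer, (h_layer, c_layer) = lstm(x)

# Manual "for loop" over time steps using the cell, starting from zero states.
h = torch.zeros(batch, hidden_size)
c = torch.zeros(batch, hidden_size)
outputs = []
for t in range(seq_len):
    h, c = cell(x[:, t], (h, c))
    outputs.append(h)
out_loop = torch.stack(outputs, dim=1)  # batch x seq_len x hidden_size

print(torch.allclose(out_layer, out_loop, atol=1e-5))
print(torch.allclose(h_layer[0], h, atol=1e-5), torch.allclose(c_layer[0], c, atol=1e-5))
```

Both paths compute the same function; the loop is simply slower because every time step is a separate call rather than one optimized kernel.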

LSTMCell is a cell that takes the following arguments:

Input of shape batch × input dimension;
Optionally, a tuple (h, c) of LSTM hidden and cell states, each of shape batch × hidden dimension (zeros are used if it is omitted).
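A short sketch of a single nn.LSTMCell call with those shapes (the sizes are arbitrary, chosen only for illustration):

```python
import torch
import torch.nn as nn

batch, input_size, hidden_size = 2, 10, 20
cell = nn.LSTMCell(input_size=input_size, hidden_size=hidden_size)

x_t = torch.randn(batch, input_size)   # one time step: batch x input dimension
h0 = torch.zeros(batch, hidden_size)   # hidden state: batch x hidden dimension
c0 = torch.zeros(batch, hidden_size)   # cell state: batch x hidden dimension

h1, c1 = cell(x_t, (h0, c0))           # returns the new (h, c) tuple
print(h1.shape, c1.shape)  # torch.Size([2, 20]) torch.Size([2, 20])
```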

It is a straightforward implementation of the LSTM equations for a single time step.
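To make "a straightforward implementation of the equations" concrete, here is a sketch that writes one LSTM step out by hand and compares it to nn.LSTMCell. It assumes PyTorch's stacked gate order (input i, forget f, cell candidate g, output o) in weight_ih and weight_hh; the helper name lstm_cell_step is hypothetical.

```python
import torch
import torch.nn as nn

def lstm_cell_step(x, h, c, w_ih, w_hh, b_ih, b_hh):
    """Hypothetical helper: one LSTM step written out from the gate equations."""
    gates = x @ w_ih.T + b_ih + h @ w_hh.T + b_hh   # batch x (4 * hidden)
    i, f, g, o = gates.chunk(4, dim=1)              # split in PyTorch's i, f, g, o order
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    g = torch.tanh(g)
    c_next = f * c + i * g            # new cell state
    h_next = o * torch.tanh(c_next)   # new hidden state
    return h_next, c_next

torch.manual_seed(0)
cell = nn.LSTMCell(input_size=4, hidden_size=3)
x = torch.randn(2, 4)
h = torch.randn(2, 3)
c = torch.randn(2, 3)

with torch.no_grad():
    h_ref, c_ref = cell(x, (h, c))
    h_man, c_man = lstm_cell_step(x, h, c, cell.weight_ih, cell.weight_hh,
                                  cell.bias_ih, cell.bias_hh)

print(torch.allclose(h_ref, h_man, atol=1e-5), torch.allclose(c_ref, c_man, atol=1e-5))
```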

LSTM is a layer that applies an LSTM cell (or several stacked LSTM cells) in a “for loop”, but the loop is heavily optimized (on GPU, using cuDNN). Its inputs are:

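A three-dimensional tensor of inputs of shape input length × batch × input dimension by default, or batch × input length × input dimension if the layer is created with batch_first=True;
Optionally, an initial state of the LSTM, i.e., a tuple (h_0, c_0) of hidden and cell states, each of shape (number of layers × number of directions) × batch × hidden dimension, where the number of directions is 2 for a bidirectional LSTM. This state layout does not change with batch_first.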
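A sketch of those shapes for a two-layer bidirectional nn.LSTM with batch_first=True (sizes are arbitrary, chosen only for illustration):

```python
import torch
import torch.nn as nn

batch, seq_len, input_size, hidden_size, num_layers = 3, 7, 10, 20, 2
num_directions = 2  # bidirectional=True
lstm = nn.LSTM(input_size, hidden_size, num_layers=num_layers,
               batch_first=True, bidirectional=True)

x = torch.randn(batch, seq_len, input_size)  # batch_first=True: batch x length x input dim

# Optional initial state: (h_0, c_0), each (num_layers * num_directions) x batch x hidden.
h0 = torch.zeros(num_layers * num_directions, batch, hidden_size)
c0 = torch.zeros(num_layers * num_directions, batch, hidden_size)

output, (h_n, c_n) = lstm(x, (h0, c0))
print(output.shape)  # torch.Size([3, 7, 40]): both directions concatenated per step
print(h_n.shape)     # torch.Size([4, 3, 20])
```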

You may want to use the LSTM cell in a context other than applying it over a sequence, e.g., to build an LSTM that operates over a tree-like structure. Similarly, when you write a decoder in a sequence-to-sequence model, you call the cell in a loop and stop the loop when the end-of-sequence symbol is decoded.
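A minimal sketch of such a decoder loop, assuming greedy decoding of a single sequence. The vocabulary size, the BOS/EOS token ids, MAX_LEN, and the GreedyDecoder class are all hypothetical, and the weights are untrained, so the emitted ids are arbitrary; the point is the control flow: one cell call per step, stopping at EOS or a length cap.

```python
import torch
import torch.nn as nn

# Hypothetical toy setup: sizes and special token ids are illustrative only.
VOCAB_SIZE, EMB_DIM, HIDDEN = 12, 8, 16
BOS_ID, EOS_ID, MAX_LEN = 0, 1, 20

class GreedyDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMB_DIM)
        self.cell = nn.LSTMCell(EMB_DIM, HIDDEN)
        self.proj = nn.Linear(HIDDEN, VOCAB_SIZE)

    @torch.no_grad()
    def decode(self, h, c):
        """Greedy decoding of one sequence; h, c are 1 x HIDDEN (e.g. an encoder's final state)."""
        token = torch.tensor([BOS_ID])
        result = []
        for _ in range(MAX_LEN):                          # cap length in case EOS never appears
            h, c = self.cell(self.embed(token), (h, c))   # one step of the cell
            token = self.proj(h).argmax(dim=-1)           # most likely next token, shape [1]
            if token.item() == EOS_ID:                    # stop as soon as EOS is decoded
                break
            result.append(token.item())
        return result

torch.manual_seed(0)
decoder = GreedyDecoder()
h0 = torch.zeros(1, HIDDEN)
c0 = torch.zeros(1, HIDDEN)
print(decoder.decode(h0, c0))
```

Because the stopping condition depends on the cell's own output at each step, this loop cannot be expressed as a single call to the LSTM layer.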
