What does keras.preprocessing.sequence.pad_sequences do?

pad_sequences ensures that all sequences in a list have the same length. By default it does this by adding 0s to the beginning of each sequence until it is as long as the longest sequence.

For example:

>>> pad_sequences([[1, 2, 3], [3, 4, 5, 6], [7, 8]])
array([[0, 1, 2, 3],
       [3, 4, 5, 6],
       [0, 0, 7, 8]], dtype=int32)

[3, 4, 5, 6] is the longest sequence, so the other sequences are padded with 0 until their length matches it.

If you would rather pad at the end of each sequence, you can set padding='post'.
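To make the difference between the two padding modes concrete, here is a minimal pure-Python sketch of the padding step. The helper pad_to_longest is a hypothetical name used only for illustration; it is not part of Keras and is not how Keras implements it. It mimics the behaviour described above on plain lists and returns lists rather than a NumPy array.

```python
def pad_to_longest(sequences, padding="pre", value=0):
    """Pad every sequence with `value` up to the length of the longest one.

    Hypothetical helper that mirrors the described behaviour of
    pad_sequences when no maxlen is given.
    """
    if padding not in ("pre", "post"):
        raise ValueError("padding must be 'pre' or 'post'")
    longest = max((len(seq) for seq in sequences), default=0)
    padded = []
    for seq in sequences:
        fill = [value] * (longest - len(seq))
        # 'pre' puts the fill in front of the data, 'post' puts it behind.
        if padding == "pre":
            padded.append(fill + list(seq))
        else:
            padded.append(list(seq) + fill)
    return padded


data = [[1, 2, 3], [3, 4, 5, 6], [7, 8]]
pre_padded = pad_to_longest(data)                   # [[0, 1, 2, 3], [3, 4, 5, 6], [0, 0, 7, 8]]
post_padded = pad_to_longest(data, padding="post")  # [[1, 2, 3, 0], [3, 4, 5, 6], [7, 8, 0, 0]]
```

With padding='post', the zeros end up after the original values instead of before them; the sequence that is already the longest is left unchanged in both modes.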

If you want to specify the length of each sequence, you can use the maxlen argument. Sequences longer than maxlen are truncated, and shorter ones are padded up to maxlen.

>>> pad_sequences([[1, 2, 3], [3, 4, 5, 6], [7, 8]], maxlen=3)
array([[1, 2, 3],
       [4, 5, 6],
       [0, 7, 8]], dtype=int32)

Now each sequence has length 3 instead.

According to the documentation, you can control the truncation with the truncating argument of pad_sequences. By default truncating is set to 'pre', which removes values from the beginning of the sequence. If you would rather remove values from the end of the sequence, you can set it to 'post'.
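The two truncation modes can be sketched in plain Python as follows. The helper pad_and_truncate is a hypothetical name used only for illustration, not a Keras function; it assumes a non-negative integer maxlen and works on plain lists, so it returns lists rather than a NumPy array.

```python
def pad_and_truncate(sequences, maxlen, padding="pre", truncating="pre", value=0):
    """Force every sequence to exactly `maxlen` items.

    Hypothetical helper that mirrors the described behaviour of
    pad_sequences when maxlen is given.
    """
    for name, mode in (("padding", padding), ("truncating", truncating)):
        if mode not in ("pre", "post"):
            raise ValueError(f"{name} must be 'pre' or 'post'")
    result = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            # 'pre' drops items from the front, 'post' drops them from the end.
            if truncating == "pre":
                seq = seq[len(seq) - maxlen:]
            else:
                seq = seq[:maxlen]
        fill = [value] * (maxlen - len(seq))
        result.append(fill + seq if padding == "pre" else seq + fill)
    return result


data = [[1, 2, 3], [3, 4, 5, 6], [7, 8]]
cut_front = pad_and_truncate(data, maxlen=3)                    # [[1, 2, 3], [4, 5, 6], [0, 7, 8]]
cut_back = pad_and_truncate(data, maxlen=3, truncating="post")  # [[1, 2, 3], [3, 4, 5], [0, 7, 8]]
```

Only [3, 4, 5, 6] is longer than maxlen, so it is the only row that differs: 'pre' keeps [4, 5, 6] and 'post' keeps [3, 4, 5]. Padding and truncation are independent settings, which is why [7, 8] is still padded at the front in both results.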
