Why does the BERT transformer use the [CLS] token for classification instead of the average over all tokens?
August 2, 2023 by Tarik