How does TensorFlow SparseCategoricalCrossentropy work?

Question

SparseCategoricalCrossentropy and CategoricalCrossentropy both compute categorical cross-entropy. The only difference is in how the targets/labels should be encoded.

When using SparseCategoricalCrossentropy the targets are represented by the index of the category (starting from 0). Your outputs have shape 4×2, which means you have two categories. Therefore, the targets should be a 4 dimensional vector with entries that are either 0 or 1. For example:

scce = tf.keras.losses.SparseCategoricalCrossentropy();
Loss = scce(
  tf.constant([ 0,    0,    0,    1   ], tf.float32),
  tf.constant([[1,2],[3,4],[5,6],[7,8]], tf.float32))

This in contrast to CategoricalCrossentropy where the labels should be one-hot encoded:

cce = tf.keras.losses.CategoricalCrossentropy();
Loss = cce(
  tf.constant([ [1,0]    [1,0],    [1, 0],   [0, 1]   ], tf.float32),
  tf.constant([[1,2],[3,4],[5,6],[7,8]], tf.float32))

SparseCategoricalCrossentropy is more efficient when you have a lot of categories.

Leave a Comment Cancel reply