The “softmax + logits” in the name simply means that the function operates on the unscaled output of earlier layers, and that the scale on which to read those values is linear. In particular, the inputs need not sum to 1 and are *not* probabilities (you might have an input of 5). Internally, it first applies softmax to the unscaled output, and then computes the cross entropy of those values vs. what they “should” be as defined by the labels.

`tf.nn.softmax` produces the result of applying the softmax function to an input tensor. The softmax “squishes” the inputs so that `sum(output) = 1`: it interprets the inputs as unnormalized log-probabilities (logits) and converts them back into probabilities between 0 and 1. The shape of the output of a softmax is the same as the input:

```
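import numpy as np
import tensorflow as tf
s = tf.Session()  # TF 1.x-style session; the original snippet assumed `s` already existed
a = tf.constant(np.array([[.1, .3, .5, .9]]))
print(s.run(tf.nn.softmax(a)))
# [[ 0.16838508 0.205666 0.25120102 0.37474789]]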
```
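To make the mapping concrete, here is a minimal plain-Python sketch of what a softmax computes for a single row of logits. The helper name `softmax` is my own for illustration; it is not TensorFlow's implementation, only the same formula (exponentiate, then normalize).

```python
import math

def softmax(logits):
    """Map a list of unnormalized scores (logits) to probabilities."""
    # Subtracting the max leaves the result unchanged but keeps exp() from overflowing.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([0.1, 0.3, 0.5, 0.9])
print(probs)       # same values as the TensorFlow output above, up to rounding
print(sum(probs))  # 1.0, up to floating-point rounding
```

The output has one entry per input, and every entry lies between 0 and 1.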

See this answer for more about why softmax is used extensively in DNNs.

`tf.nn.softmax_cross_entropy_with_logits` combines the softmax step with the calculation of the cross-entropy loss, but it does it all together in a more mathematically careful way. It’s similar to the result of:

```
sm = tf.nn.softmax(x)
ce = cross_entropy(sm)  # pseudocode: -sum(labels * log(sm)) over the class dimension
```

The cross entropy is a summary metric: it sums across the class elements. The output of `tf.nn.softmax_cross_entropy_with_logits` on a shape `[2,5]` tensor is of shape `[2]`, one loss value per example (the first dimension is treated as the batch).
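As a rough illustration of the "one value per example" behaviour, the sketch below runs the two-step version (softmax, then cross entropy) over a batch of two rows with five classes each. The helpers `softmax` and `cross_entropy` and the sample numbers are made up for the example; this is plain Python, not the TensorFlow op.

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, labels):
    # -sum(label * log(prob)) over the classes of one example;
    # classes with a zero label contribute nothing and are skipped.
    return -sum(y * math.log(p) for p, y in zip(probs, labels) if y > 0)

# A batch of 2 examples with 5 classes each, i.e. shape [2, 5].
logits = [[1.0, 2.0, 3.0, 4.0, 5.0],
          [5.0, 1.0, 0.0, 0.0, 0.0]]
labels = [[0, 0, 0, 0, 1],
          [1, 0, 0, 0, 0]]

# The sum runs over classes only, so the batch dimension survives.
losses = [cross_entropy(softmax(x), y) for x, y in zip(logits, labels)]
print(len(losses))  # 2
```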

If you want to do optimization to minimize the cross entropy **AND** you’re softmaxing after your last layer, you should use `tf.nn.softmax_cross_entropy_with_logits` instead of doing it yourself, because it covers numerically unstable corner cases in the mathematically right way. Otherwise, you’ll end up hacking it by adding little epsilons here and there.
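One such corner case can be reproduced in plain Python. The sketch below compares a naive softmax-then-log computation with a log-sum-exp formulation. Both function names are hypothetical, and the log-sum-exp trick is a standard way to stabilize this calculation; I am not claiming it is exactly what TensorFlow does internally.

```python
import math

def naive_cross_entropy(logits, labels):
    # Softmax first, then log: breaks when a probability underflows to 0.0.
    exps = [math.exp(v) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(y * math.log(p) for p, y in zip(probs, labels) if y > 0)

def stable_cross_entropy(logits, labels):
    # Work with log-probabilities directly: log_softmax(v) = v - logsumexp(logits).
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return -sum(y * (v - log_z) for v, y in zip(logits, labels))

logits = [0.0, -1000.0]  # the true class has a very low score
labels = [0, 1]

try:
    naive = naive_cross_entropy(logits, labels)
except ValueError:
    naive = None  # exp(-1000) underflowed to 0.0, so log(0.0) failed

stable = stable_cross_entropy(logits, labels)
print(naive, stable)
```

The naive version fails because the probability of the true class rounds to zero before the log is taken, which is where the "little epsilons" usually get added; the log-space version never forms that probability.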

**Edited 2016-02-07:**

If you have single-class labels, where an object can only belong to one class, you might now consider using `tf.nn.sparse_softmax_cross_entropy_with_logits` so that you don’t have to convert your labels to a dense one-hot array. This function was added after release 0.6.0.
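The relationship between the two label formats can be sketched in plain Python: with a one-hot label, the cross entropy reduces to the negative log-probability of the single true class, so a class index is enough. The helpers `log_softmax`, `dense_loss` and `sparse_loss` are illustrative names of my own, not TensorFlow functions.

```python
import math

def log_softmax(logits):
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]

def dense_loss(logits, one_hot):
    # Cross entropy against a dense one-hot label vector.
    return -sum(y * lp for lp, y in zip(log_softmax(logits), one_hot))

def sparse_loss(logits, class_index):
    # Same quantity, using only the index of the true class.
    return -log_softmax(logits)[class_index]

logits = [2.0, 1.0, 0.1]
dense = dense_loss(logits, [0, 1, 0])
sparse = sparse_loss(logits, 1)
print(dense, sparse)  # the two values agree
```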