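Not sure if this helps, but the term $\mathbb{1}(j = y_i)$ is really the indicator function (as described here): it equals 1 when class $j$ is the correct label $y_i$ and 0 otherwise. This forms the expression `(j == y[i])` in the code.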
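As a small illustration of how that comparison behaves as an indicator, here is a sketch with made-up labels (the array `y`, `num_classes`, and the index `i` are hypothetical, not taken from the original code). In Python a boolean acts as 1 or 0 in arithmetic, which is why the comparison can be used directly inside the gradient expression.

```python
import numpy as np

y = np.array([2, 0, 1])   # hypothetical labels for three training examples
num_classes = 3
i = 0                     # look at the first example, whose label is 2

# (j == y[i]) is True only for the correct class; int() makes the 1/0 explicit.
indicator = [int(j == y[i]) for j in range(num_classes)]
print(indicator)
```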
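Also, the gradient of the loss with respect to the weights is, by the chain rule:

$$\frac{\partial L_i}{\partial w_j} = \frac{\partial L_i}{\partial f_j}\,\frac{\partial f_j}{\partial w_j}$$

where $f_j = w_j^\top x_i$ is the score for class $j$, so

$$\frac{\partial f_j}{\partial w_j} = x_i$$

which is the origin of the `X[:,i]` in the code.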
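To make the two pieces concrete, here is a minimal sketch of how they might combine in a loop-based gradient. It assumes a softmax (cross-entropy) loss and a linear score function, neither of which is stated explicitly in this answer, and the function name, argument shapes, and averaging over examples are my own choices rather than the original code.

```python
import numpy as np

def softmax_loss_and_grad(W, X, y):
    """Assumed shapes: W is (C, D), X is (D, N) with one example per column, y is (N,)."""
    num_classes, num_train = W.shape[0], X.shape[1]
    loss = 0.0
    dW = np.zeros_like(W)
    for i in range(num_train):
        f = W.dot(X[:, i])                   # scores f_j = w_j . x_i
        f -= np.max(f)                       # shift scores for numerical stability
        p = np.exp(f) / np.sum(np.exp(f))    # softmax probabilities
        loss += -np.log(p[y[i]])
        for j in range(num_classes):
            # dL_i/df_j = p_j - 1(j == y_i), and df_j/dw_j = x_i
            dW[j, :] += (p[j] - (j == y[i])) * X[:, i]
    return loss / num_train, dW / num_train
```

A quick way to check a sketch like this is to compare one entry of `dW` against a centered finite difference of the loss.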