Extremely small or NaN values appear when training a neural network
Do you know about “vanishing” and “exploding” gradients in backpropagation? I’m not too familiar with Haskell, so I can’t easily see exactly what your backprop is doing, but it does look like you are using a logistic curve as your activation function. If you look at the plot of this function you’ll see that the …
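
To make the vanishing-gradient idea concrete, here is a minimal sketch in Python rather than Haskell. It is not the asker's code: the names `sigmoid`, `sigmoid_grad`, and `chained_gradient` are hypothetical, and the weights are left out on purpose so that only the activation's contribution to the gradient is visible. It relies on the standard fact that the derivative of the logistic function is s(x) * (1 - s(x)), which is never larger than 0.25.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function 1 / (1 + e^-x), split by sign so exp never overflows."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_grad(x: float) -> float:
    """Derivative of the logistic function: s(x) * (1 - s(x)), at most 0.25."""
    s = sigmoid(x)
    return s * (1.0 - s)


def chained_gradient(pre_activations):
    """Product of logistic derivatives along one path through the layers.

    Assumption for illustration: weights are ignored (treated as 1), so this
    only shows how much the activation function alone shrinks the gradient.
    """
    g = 1.0
    for x in pre_activations:
        g *= sigmoid_grad(x)
    return g


if __name__ == "__main__":
    # The derivative is largest at 0 and shrinks quickly as |x| grows.
    for x in (0.0, 2.0, 5.0, 10.0):
        print(f"x = {x:5.1f}  sigmoid'(x) = {sigmoid_grad(x):.3e}")

    # Even in the best case (every pre-activation at 0), ten layers
    # scale the gradient by 0.25 ** 10.
    print(f"ten layers, best case: {chained_gradient([0.0] * 10):.3e}")
```

If the pre-activations drift far from zero, each factor is much smaller than 0.25, so the product can underflow toward zero after only a few layers. That is one plausible source of the extremely small values; NaN values usually point at overflow somewhere else in the computation, which this sketch does not try to reproduce.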