How does the back-propagation algorithm deal with non-differentiable activation functions?
To understand how backpropagation is even possible with functions like ReLU, you first need to understand the property of the derivative that makes the backpropagation algorithm work so well: the first-order approximation

f(x) ≈ f(x0) + f'(x0)(x − x0)

If you treat x0 as the actual value of your parameter at the moment, this says that a small change in the parameter changes the output approximately linearly, with the derivative as the slope. ReLU is differentiable everywhere except at x = 0, so this approximation holds almost everywhere; at the single non-differentiable point, implementations simply pick a value to use as the "derivative" (conventionally 0, though any subgradient in [0, 1] works), and since a parameter almost never lands exactly on 0 during training, the choice has no practical effect.
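To make this concrete, here is a minimal NumPy sketch of this idea. The `relu_grad` convention of returning 0 at x == 0 is one common choice (not the only valid one), and the function names are my own, not from any particular framework:

```python
import numpy as np

def relu(x):
    # Elementwise max(0, x)
    return np.maximum(0.0, x)

def relu_grad(x):
    # ReLU is differentiable everywhere except x == 0.
    # At x == 0 we simply pick a value for the "derivative";
    # a common convention (used here) is 0. Any subgradient
    # in [0, 1] would also work, because a parameter landing
    # exactly on 0 is vanishingly rare in practice.
    return (x > 0).astype(float)

# Check the first-order approximation f(x) ≈ f(x0) + f'(x0)(x - x0)
x0 = np.array([1.5, -0.7, 0.3])
dx = 1e-3
approx = relu(x0) + relu_grad(x0) * dx
exact = relu(x0 + dx)
print(np.allclose(approx, exact))  # True
```

Note that away from 0 the approximation is actually exact for ReLU, since it is piecewise linear; the only place it can fail is when a step crosses the kink at 0, and such steps are both rare and small.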