How does the back-propagation algorithm deal with non-differentiable activation functions?

To understand how backpropagation is even possible with functions like ReLU, you need to understand the key property of the derivative that makes the backpropagation algorithm work so well. This property is the first-order approximation: f(x) ≈ f(x0) + f'(x0)(x − x0). If you treat x0 as the actual value of your parameter at the moment … Read more
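A small sketch of the point the excerpt is making: ReLU is non-differentiable only at 0, and in practice frameworks just pick a subgradient there (the helper names below are illustrative, not from the original answer). Away from 0, the first-order approximation is exact because ReLU is piecewise linear.

```python
def relu(x):
    return max(0.0, x)

def relu_grad(x):
    # ReLU is non-differentiable at exactly x = 0; any value in [0, 1]
    # is a valid subgradient there. Frameworks commonly just pick 0.
    return 1.0 if x > 0 else 0.0

# First-order approximation: f(x) ~ f(x0) + f'(x0) * (x - x0)
x0, dx = 2.0, 0.1
approx = relu(x0) + relu_grad(x0) * dx
print(approx, relu(x0 + dx))  # identical: ReLU is linear on x > 0
```

Since the point x = 0 is hit with probability zero in floating-point training, this convention almost never matters in practice.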

What are forward and backward passes in neural networks?

The “forward pass” refers to the process of calculating the values of the output layer from the input data, traversing all neurons from the first to the last layer. A loss function is then computed from the output values. The “backward pass” refers to the process of computing the changes to the weights (the actual learning, in effect), using the gradient descent algorithm (or … Read more
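Both passes can be sketched for a single-neuron network (a minimal illustration, not from the original answer): the forward pass computes the prediction and loss, the backward pass applies the chain rule back to each parameter.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# One-neuron "network": y_hat = sigmoid(w*x + b), loss = (y_hat - y)^2
w, b = 0.5, 0.0
x, y = 1.0, 1.0

# Forward pass: compute activations from input to output, then the loss
z = w * x + b
y_hat = sigmoid(z)
loss = (y_hat - y) ** 2

# Backward pass: chain rule from the loss back to each parameter
dloss_dyhat = 2.0 * (y_hat - y)
dyhat_dz = y_hat * (1.0 - y_hat)   # sigmoid derivative
dloss_dw = dloss_dyhat * dyhat_dz * x
dloss_db = dloss_dyhat * dyhat_dz

# One gradient-descent step
lr = 0.1
w -= lr * dloss_dw
b -= lr * dloss_db
```

Since y_hat is below the target y = 1, the gradient with respect to w is negative and the update increases w, pushing the prediction toward the target.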

What is the difference between back-propagation and feed-forward Neural Network?

A Feed-Forward Neural Network is a type of neural network architecture where the connections are “fed forward”, i.e. do not form cycles (unlike in recurrent nets). The term “feed forward” is also used to describe the way an input travels from the input layer to the hidden layer, and from the hidden layer to the output layer. The … Read more
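The acyclic, layer-to-layer flow described above can be sketched as follows (a toy two-layer network with made-up weights; the `layer` helper is illustrative):

```python
# Minimal feed-forward pass: data flows input -> hidden -> output,
# with no cycles (unlike a recurrent net).
def layer(inputs, weights, biases):
    # One fully connected layer with ReLU activation (a common choice)
    return [max(0.0, sum(w * i for w, i in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

x = [1.0, 2.0]
hidden = layer(x, [[0.1, 0.2], [0.3, -0.4]], [0.0, 0.0])
output = layer(hidden, [[1.0, 1.0]], [0.0])
print(output)
```

Each layer consumes only the previous layer's activations, which is exactly what makes the connection graph a DAG.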

Understanding Neural Network Backpropagation

The tutorial you posted here is actually doing it wrong. I double-checked it against Bishop’s two standard books and two of my working implementations, and I will point out below exactly where. An important thing to keep in mind is that you are always searching for derivatives of the error function with respect to a … Read more
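The standard way to check a backprop derivation like this against a working implementation is a numerical gradient check: compare the analytic derivative with a central finite difference. A minimal sketch (function names are my own):

```python
def numerical_grad(f, w, eps=1e-6):
    # Central finite difference: (f(w+eps) - f(w-eps)) / (2*eps)
    return (f(w + eps) - f(w - eps)) / (2 * eps)

# Example: error E(w) = (w*x - y)^2, so dE/dw = 2*(w*x - y)*x
x, y = 2.0, 1.0
E = lambda w: (w * x - y) ** 2
analytic = lambda w: 2.0 * (w * x - y) * x

w = 0.7
print(numerical_grad(E, w), analytic(w))  # should agree closely
```

If the two numbers disagree by more than a few orders of magnitude above floating-point noise, the derivation (or the code) is wrong.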

What is the difference between SGD and back-propagation?

Backpropagation is an efficient method of computing gradients in directed graphs of computations, such as neural networks. It is not a learning method, but rather a nice computational trick which is often used inside learning methods. It is really just an implementation of the chain rule of derivatives, which gives you the ability to compute … Read more
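The "chain rule over a computation graph" idea can be made concrete with a tiny hand-unrolled example (my own illustration, not from the answer): forward through L = (a·b + c)², then backward, reusing the cached intermediates.

```python
a, b, c = 2.0, 3.0, 1.0

# Forward pass, caching intermediate values of the graph
u = a * b      # u = 6
v = u + c      # v = 7
L = v ** 2     # L = 49

# Backward pass: multiply local derivatives along each path to the leaves
dL_dv = 2 * v        # d(v^2)/dv = 2v
dL_du = dL_dv * 1.0  # d(u+c)/du = 1
dL_da = dL_du * b    # d(a*b)/da = b
dL_db = dL_du * a    # d(a*b)/db = a
dL_dc = dL_dv * 1.0  # d(u+c)/dc = 1
```

The efficiency comes from the caching: each node's gradient is computed once and reused by every parameter upstream of it, which is what SGD (or any other optimizer) then consumes.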

In which cases is the cross-entropy preferred over the mean squared error? [closed]

Cross-entropy is preferred for classification, while mean squared error is one of the best choices for regression. This comes directly from the statement of the problems themselves: in classification you work with a very particular set of possible output values, so MSE is badly defined (as it does not have this kind of knowledge and thus … Read more
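One concrete reason (my own supplementary illustration): with a sigmoid output, the MSE gradient with respect to the logit carries a factor p·(1−p) that vanishes when the model is confidently wrong, while the cross-entropy gradient stays large.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Binary classification, true label y = 1, but the logit z is very
# negative: the model is confidently wrong.
y, z = 1.0, -8.0
p = sigmoid(z)   # ~0.000335

# Gradients of each loss with respect to the logit z:
grad_mse = 2 * (p - y) * p * (1 - p)  # squashed by p*(1-p): tiny
grad_ce = p - y                       # close to -1: strong signal

print(grad_mse, grad_ce)
```

So with cross-entropy a badly wrong prediction still produces a large corrective gradient, whereas MSE plus a sigmoid can learn extremely slowly from exactly the examples it gets most wrong.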

What does the parameter retain_graph mean in the Variable’s backward() method?

@cleros is pretty much on point about the use of retain_graph=True. In essence, it retains any information necessary to compute the gradient of a certain variable, so that we can do a backward pass on it. An illustrative example: suppose that we have the computation graph shown above. The variables d and e are the outputs, and a … Read more
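A minimal runnable sketch of the idea (my own example graph, not the one from the answer's figure): by default PyTorch frees the graph's buffers after `backward()`, so a second backward through the same graph fails unless the first call passes `retain_graph=True`.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
d = x * 3        # intermediate node
e = d ** 2       # e = 9x^2, so de/dx = 18x = 36 at x = 2

# First backward: keep the graph's buffers so we can backprop
# through d afterwards. Without retain_graph=True, the second
# backward below would raise a RuntimeError.
e.backward(retain_graph=True)
print(x.grad)    # tensor(36.)

# Second backward through the same graph; gradients accumulate
# into x.grad (dd/dx = 3, so 36 + 3 = 39).
d.backward()
print(x.grad)    # tensor(39.)
```

Retaining the graph costs memory, so it is normally only used when you genuinely need multiple backward passes (e.g. multiple losses sharing a subgraph).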