What’s Happening in Back-Propagation
Training the Black Box
The previous article was all about forward propagation in neural networks, how it works and why it works. One of the key quantities in forward propagation is the weights. We saw how tuning the weights, combined with the non-linearity introduced in each layer, shapes the resulting output. As discussed, we randomly initialize the weights and biases and let the network learn them over time. Now comes the most important question: how are these weights going to be updated? How are the right weights and biases, the ones that let the network best approximate the original relationship between x and y, going to be learned?
A peek into Gradient Descent
Before we go further, I hope you are familiar with Gradient Descent. If not, let's have a quick peek. Gradient Descent is an iterative algorithm used to minimize a differentiable function; it is guaranteed to reach the global minimum only when the function is convex, i.e., has no local minima other than the global one.
Let us consider an example function as shown in fig 1.1. This is nothing but y = (x + 2)². We know, just by looking at this function, that the value of y will be minimum when x = -2. But, is there any…
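To make this concrete, here is a minimal sketch of gradient descent applied to this same function, y = (x + 2)². The starting point, learning rate, and number of steps below are illustrative choices, not values from the article; the point is only to show how repeatedly stepping against the gradient drives x toward -2.

```python
# Minimal gradient descent on f(x) = (x + 2)^2.
# The derivative is f'(x) = 2 * (x + 2), so each update moves x
# a small step in the direction that decreases f.

def f(x):
    return (x + 2) ** 2

def grad_f(x):
    return 2 * (x + 2)

x = 5.0               # arbitrary starting point (illustrative)
learning_rate = 0.1   # illustrative step size
for step in range(50):
    x -= learning_rate * grad_f(x)

print(x)     # approaches -2.0
print(f(x))  # approaches 0.0
```

The same idea carries over to a neural network: instead of a single scalar x, we nudge every weight and bias against the gradient of the loss.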