We update the parameters of the model multiple times with our parameter update equation (1) until we find the optimal parameter values. In gradient descent, to perform a single parameter update, we iterate through all the data points in our training set. Updating the parameters only after iterating through every data point makes gradient descent very slow and increases the training time, especially when we have a large dataset.
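To make this concrete, here is a minimal sketch of a single batch gradient descent update, assuming a simple linear regression loss and the update rule theta = theta - lr * gradient from equation (1); the function name and the dataset are hypothetical, but the key point holds: one update requires a full pass over every data point.

```python
import numpy as np

def batch_gradient_descent_step(theta, X, y, lr=0.01):
    """One parameter update of batch gradient descent (linear regression).

    Every call scans the ENTIRE training set (X, y) just to move theta once.
    """
    n = len(y)                                # number of data points
    predictions = X @ theta                   # forward pass over all n points
    gradient = X.T @ (predictions - y) / n    # gradient averaged over all n points
    return theta - lr * gradient              # update rule from equation (1)


# Hypothetical training set with 1 million data points:
X = np.random.randn(1_000_000, 10)
y = np.random.randn(1_000_000)
theta = np.zeros(10)

# A single update already touches all 1 million points;
# running many updates multiplies that cost.
theta = batch_gradient_descent_step(theta, X, y)
```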
Let's say we have a training set with 1 million data points. We know that we update the parameters of the model multiple times to find the optimal parameter values, yet even to perform a single parameter update, we have to go through all 1 million data points...