Understanding Adam Optimization
Before we look at Adam optimization, let's try to first understand the concept of gradient descent.
Gradient descent is an iterative optimization algorithm for finding the minimum of a function. An analogy could be as follows: suppose we are stuck somewhere in the middle of a mountain and want to reach the ground as fast as possible. As a first step, we observe the slope of the mountain in all directions around us and decide to walk in the direction with the steepest downward slope.
We re-evaluate our choice of direction after every step we take. The size of each step also depends on the steepness of the downward slope: if the slope is very steep, we take bigger steps, since that helps us reach the ground faster. This way, after some number of steps, we reach the ground safely. Similarly, in machine learning, we want to minimize some error/cost function by updating the weights of the algorithm. To find the minimum of the cost function...
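To make the analogy concrete, here is a minimal sketch of gradient descent in Python. The function name, the toy cost function, and the learning rate are illustrative choices, not part of any particular library: the point is simply that each update steps against the gradient, and a steeper slope produces a bigger step.

```python
import numpy as np

def gradient_descent(grad_fn, theta0, learning_rate=0.1, num_steps=100):
    """Minimize a cost function by repeatedly stepping against its gradient.

    grad_fn: returns the gradient of the cost function at theta.
    theta0: initial parameter values.
    """
    theta = np.asarray(theta0, dtype=float)
    for _ in range(num_steps):
        grad = grad_fn(theta)                 # "look at the slope around us"
        theta = theta - learning_rate * grad  # step downhill; steeper slope -> bigger step
    return theta

# Toy example: minimize f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
minimum = gradient_descent(lambda t: 2 * (t - 3), theta0=[0.0])
print(minimum)  # converges close to 3
```

In practice the gradient is computed with respect to the model's weights on the training data, but the update rule is the same repeated downhill step shown above.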