As mentioned earlier, we can also use a neural network as the approximating function. In this recipe, we will solve the Mountain Car environment using Q-learning with a neural network for approximation.
The goal of function approximation (FA) is to use a set of features to estimate the Q values via a regression model. Using a neural network as the estimation model increases the regression power in two ways: multiple layers add flexibility, and the non-linear activations in the hidden layers add non-linearity. The rest of the Q-learning model is very similar to the one with linear approximation, and we again use gradient descent to train the network. The ultimate goal of learning is to find the network weights that best approximate the action-value function, Q(s, a), for each possible action. The loss function...
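To make the idea concrete, the following is a minimal sketch of a neural network Q estimator, not the recipe's own implementation. It assumes a single hidden layer with ReLU activation, a squared-error loss on the Q value of the action taken, and plain gradient descent written out by hand in NumPy so that the weight update is visible. The class name `QNetwork`, the layer size, the learning rate, and the sample transition are all hypothetical choices made for illustration.

```python
import numpy as np


class QNetwork:
    """One-hidden-layer network mapping a state to one Q value per action.

    Illustrative sketch only: sizes, initialization, and loss are assumptions.
    """

    def __init__(self, n_state, n_action, n_hidden=16, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.5, (n_state, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, (n_hidden, n_action))
        self.b2 = np.zeros(n_action)
        self.lr = lr

    def predict(self, state):
        """Return the estimated Q values of all actions for one state."""
        hidden = np.maximum(0.0, state @ self.W1 + self.b1)  # ReLU non-linearity
        return hidden @ self.W2 + self.b2

    def update(self, state, action, target):
        """Take one gradient-descent step on 0.5 * (Q(s, a) - target) ** 2.

        Returns the loss measured before the step.
        """
        # Forward pass, keeping intermediate values for backpropagation
        pre = state @ self.W1 + self.b1
        hidden = np.maximum(0.0, pre)
        q_values = hidden @ self.W2 + self.b2
        error = q_values[action] - target

        # Only the output for the action taken contributes to the loss
        grad_q = np.zeros_like(q_values)
        grad_q[action] = error

        # Backward pass through the output layer, then the hidden layer
        grad_W2 = np.outer(hidden, grad_q)
        grad_hidden = self.W2 @ grad_q
        grad_pre = grad_hidden * (pre > 0)  # derivative of ReLU
        grad_W1 = np.outer(state, grad_pre)

        # Gradient-descent update of all weights and biases
        self.W2 -= self.lr * grad_W2
        self.b2 -= self.lr * grad_q
        self.W1 -= self.lr * grad_W1
        self.b1 -= self.lr * grad_pre
        return 0.5 * error ** 2


# Mountain Car has a 2-dimensional state (position, velocity) and 3 actions.
gamma = 0.99  # discount factor (assumed value)
net = QNetwork(n_state=2, n_action=3)

# A made-up transition (s, a, r, s') used only to exercise the update
state = np.array([-0.5, 0.0])
next_state = np.array([-0.501, -0.001])
action, reward = 0, -1.0

# Q-learning target: r + gamma * max_a' Q(s', a')
target = reward + gamma * np.max(net.predict(next_state))
loss_before = net.update(state, action, target)
loss_after = 0.5 * (net.predict(state)[action] - target) ** 2
```

In practice, a deep learning framework would compute the backward pass automatically. The manual version is shown here only to make clear that each update nudges the weights so that Q(s, a) moves toward the Q-learning target.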