In the previous section, we saw how Q-learning allows us to learn the optimal state-action value function, q*, in an environment with discrete states and actions, using iterative updates based on the Bellman equation.
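As a reminder, the standard tabular Q-learning update nudges the stored value toward the observed reward plus the discounted value of the best next action, where α denotes the learning rate and γ the discount factor:

$$
q(s_t, a_t) \leftarrow q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} q(s_{t+1}, a') - q(s_t, a_t) \right]
$$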
In this section, we will adapt the algorithm to environments with continuous states, where we can no longer use the tabular solution that simply fills an array with state-action values. Instead, we will approximate q* with a neural network to build a deep Q-network, adding several refinements that accelerate and stabilize convergence. We will then use the OpenAI Gym to apply the algorithm to the Lunar Lander environment.
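To make the approach concrete, the sketch below outlines one possible deep Q-network training loop for LunarLander-v2 using PyTorch. It is a minimal illustration, not the implementation developed later in this section: the network architecture, hyperparameter values, and episode count are illustrative assumptions, and it assumes the classic Gym API in which reset() returns only the observation and step() returns four values (newer Gym/Gymnasium releases return additional items).

```python
import random
from collections import deque

import gym                      # LunarLander-v2 requires the box2d extra (pip install gym[box2d])
import numpy as np
import torch
import torch.nn as nn

# Illustrative hyperparameters (assumed, not tuned)
GAMMA, LR, BATCH_SIZE = 0.99, 1e-3, 64
EPS_START, EPS_END, EPS_DECAY = 1.0, 0.01, 0.995
TARGET_SYNC = 1_000             # steps between target-network updates

env = gym.make('LunarLander-v2')
n_states = env.observation_space.shape[0]    # 8 continuous state variables
n_actions = env.action_space.n               # 4 discrete actions


def build_net():
    # Small fully connected network mapping a state to one Q-value per action
    return nn.Sequential(nn.Linear(n_states, 64), nn.ReLU(),
                         nn.Linear(64, 64), nn.ReLU(),
                         nn.Linear(64, n_actions))


online_net, target_net = build_net(), build_net()
target_net.load_state_dict(online_net.state_dict())
optimizer = torch.optim.Adam(online_net.parameters(), lr=LR)
buffer = deque(maxlen=100_000)               # experience replay memory

eps, step = EPS_START, 0
for episode in range(500):
    state = env.reset()                      # classic Gym API assumed here
    done = False
    while not done:
        # Epsilon-greedy action selection
        if random.random() < eps:
            action = env.action_space.sample()
        else:
            with torch.no_grad():
                q_values = online_net(torch.as_tensor(state, dtype=torch.float32))
            action = int(q_values.argmax())

        next_state, reward, done, _ = env.step(action)
        buffer.append((state, action, reward, next_state, done))
        state = next_state
        step += 1

        if len(buffer) >= BATCH_SIZE:
            # Sample a minibatch of past transitions to break correlations
            batch = random.sample(buffer, BATCH_SIZE)
            s, a, r, s2, d = map(np.array, zip(*batch))
            s = torch.as_tensor(s, dtype=torch.float32)
            a = torch.as_tensor(a, dtype=torch.int64).unsqueeze(1)
            r = torch.as_tensor(r, dtype=torch.float32)
            s2 = torch.as_tensor(s2, dtype=torch.float32)
            d = torch.as_tensor(d, dtype=torch.float32)

            # TD target from the Bellman equation, using the frozen target network
            with torch.no_grad():
                target = r + GAMMA * target_net(s2).max(1).values * (1 - d)
            q_pred = online_net(s).gather(1, a).squeeze(1)

            loss = nn.functional.mse_loss(q_pred, target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        if step % TARGET_SYNC == 0:
            # Periodically copy online weights into the target network
            target_net.load_state_dict(online_net.state_dict())

    eps = max(EPS_END, eps * EPS_DECAY)      # decay exploration after each episode
```

The replay buffer and the separate, periodically synchronized target network shown here are the two refinements most commonly used to stabilize training; both are discussed in more detail later in this section.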