Deep RL for trading with the OpenAI Gym
In the previous section, we saw how Q-learning allows us to learn the optimal state-action value function q* in an environment with discrete states and discrete actions using iterative updates based on the Bellman equation.
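To make that iterative update concrete, here is a minimal sketch of the tabular Q-learning rule in Python. The state and action counts, learning rate, and discount factor are placeholder values, not settings used elsewhere in this chapter:

```python
import numpy as np

# Tabular Q-learning update (illustrative sketch):
# Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
n_states, n_actions = 10, 4          # hypothetical small, discrete environment
alpha, gamma = 0.1, 0.99             # learning rate and discount factor

Q = np.zeros((n_states, n_actions))  # tabular state-action value estimates

def q_learning_update(state, action, reward, next_state):
    """Apply one Bellman-based update to the Q-table."""
    td_target = reward + gamma * Q[next_state].max()   # bootstrapped target
    td_error = td_target - Q[state, action]            # temporal-difference error
    Q[state, action] += alpha * td_error
```

With continuous states, the rows of such a table would have to enumerate infinitely many states, which is why the next step replaces the table with a function approximator.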
In this section, we will take RL one step closer to the real world and upgrade the algorithm to continuous states (while keeping actions discrete). This implies that we can no longer use a tabular solution that simply fills an array with state-action values. Instead, we will see how to approximate q* using a neural network (NN), which results in a deep Q-network. We will first discuss how deep learning integrates with RL before presenting the deep Q-learning algorithm, as well as various refinements that accelerate its convergence and make it more robust.
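As a preview of what such an approximator looks like, the following sketch defines a small feedforward network that maps a continuous state vector to one Q-value estimate per discrete action. It assumes TensorFlow/Keras; the two 64-unit hidden layers and the hyperparameters are placeholder choices rather than the configuration developed later in this section:

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

def build_q_network(state_dim, n_actions, learning_rate=1e-3):
    """Map a continuous state vector to one Q-value estimate per discrete action."""
    model = Sequential([
        Dense(64, activation='relu', input_shape=(state_dim,)),
        Dense(64, activation='relu'),
        Dense(n_actions)  # linear output: one q(s, a) estimate per action
    ])
    # Trained by regression toward the bootstrapped TD target r + gamma * max_a' q(s', a')
    model.compile(optimizer=Adam(learning_rate=learning_rate), loss='mse')
    return model
```

The network plays the role of the Q-table: instead of looking up a stored value, the agent feeds the current state forward and picks the action with the highest predicted value.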
Continuous states also imply a more complex environment. We will demonstrate how to work with OpenAI Gym, a toolkit for developing and comparing RL algorithms. First...
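The basic agent-environment loop in Gym looks as follows. This sketch assumes the classic Gym API (versions before 0.26); newer releases and the gymnasium fork return `(obs, info)` from `reset()` and a five-tuple from `step()`. The CartPole-v1 environment here is only a stand-in, not the trading environment we build in this section:

```python
import gym

env = gym.make('CartPole-v1')        # any registered Gym environment works the same way
obs = env.reset()                    # continuous observation vector
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()          # random policy as a placeholder
    obs, reward, done, info = env.step(action)  # environment transition
    total_reward += reward
env.close()
print(f'Episode return: {total_reward:.0f}')
```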