Deep Q-learning
The Q-learning method that we have just covered solves the issue of iterating over the full set of states, but it can still struggle in situations where the number of observable states is very large. For example, Atari games can produce a large variety of different screens, so if we decide to treat raw pixels as individual states, we will quickly realize that we have far too many states to track and approximate values for.
In some environments, the number of different observable states can be almost infinite. For example, in CartPole, the environment gives us a state consisting of four floating-point numbers. The number of value combinations is finite (they're represented as bits), but it is extremely large. With 32-bit floats, it is around 2^(4⋅32) = 2^128 ≈ 3.4 ⋅ 10^38. In reality, it is less, as the environment's state values are bounded, so not all bit combinations of four float32 values are possible, but the resulting state space...
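As a quick sanity check on that back-of-the-envelope count, the following sketch computes the upper bound on distinct bit patterns for an observation of four 32-bit floats (this is just the arithmetic from the text, not anything specific to Gym's CartPole implementation):

```python
# Upper bound on distinct CartPole observations, assuming each of the
# four state variables is stored as a 32-bit float.
n_bits = 4 * 32            # total bits in one observation
n_states = 2 ** n_bits     # number of distinct bit patterns

print(n_states)
print(f"{n_states:.2e}")   # roughly 3.4e+38
```

The actual number of reachable states is smaller, since the physics of the environment bounds each variable, but the point stands: the raw state space is far too large to enumerate in a table.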