Implementing deep Q-learning with the fixed targets model
In the previous section, we learned how to leverage deep Q-learning to solve the CartPole environment in Gym. In this section, we will work on the more complicated game of Pong and understand how deep Q-learning, combined with a fixed targets model, can solve it. While working on this use case, you will also learn how to leverage a CNN-based model (in place of the vanilla neural network we used in the previous section) to solve the problem. The theory from the previous section remains largely the same, with one crucial change: a fixed target model. Essentially, we create a copy of the local model, called the target model, and use it to compute the target Q-values that guide the local model's updates. Because the target model's weights stay frozen between synchronizations, the targets do not shift with every training step, which grounds the local model and lets its weights update more smoothly. Every 1,000 steps, we copy the local model's weights into the target model so that the target reflects the learning accumulated over those steps.
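The interaction between the two networks can be sketched in a few lines. The following is a minimal, gradient-free illustration (using a toy linear Q-function in NumPy rather than the CNN discussed here); the sync interval of 1,000 steps comes from the text, while the discount factor, learning rate, and toy dimensions are assumptions for the sketch:

```python
import numpy as np

SYNC_EVERY = 1000  # sync interval from the text
GAMMA = 0.99       # discount factor (assumed for this sketch)
LR = 0.01          # step size (assumed for this sketch)

rng = np.random.default_rng(0)

# Toy linear Q-function: Q(s) = s @ W, with 4 state dims and 2 actions.
local_W = rng.normal(size=(4, 2))
target_W = local_W.copy()  # target model starts as a copy of the local model

def q_values(W, state):
    return state @ W

def td_target(reward, next_state, done):
    # Targets come from the FROZEN target network, not the local one.
    if done:
        return reward
    return reward + GAMMA * q_values(target_W, next_state).max()

syncs = 0
for step in range(1, 2501):
    # Fake transition standing in for an environment step.
    state, next_state = rng.normal(size=4), rng.normal(size=4)
    reward, done, action = 1.0, False, 0
    # Nudge the local Q(s, a) toward the fixed target.
    error = td_target(reward, next_state, done) - q_values(local_W, state)[action]
    local_W[:, action] += LR * error * state
    # Hard sync: every SYNC_EVERY steps, copy local weights into the target.
    if step % SYNC_EVERY == 0:
        target_W = local_W.copy()
        syncs += 1
```

Note that between synchronizations the target network deliberately lags the local one; this lag is what keeps the regression targets stationary while the local network trains.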