Let's look at the results from running this DQN:
We're definitely making some progress. We're scoring points, and our score tends to rise the longer we train, even if slowly. But we don't seem to be moving consistently closer to solving the task: our average score isn't climbing high enough to reach the required level.
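One way to make "average score" concrete is to track a rolling mean over recent episodes. The sketch below is a hypothetical helper: `ScoreTracker` and its `window` and `target` parameters are assumptions for illustration, since the exact solving criterion isn't stated here. It keeps only the last `window` episode scores in a bounded `deque` and reports whether their mean has reached `target`.

```python
from collections import deque


class ScoreTracker:
    """Hypothetical helper: rolling average of the most recent episode scores."""

    def __init__(self, window: int, target: float):
        self.window = window  # assumed: number of episodes to average over
        self.target = target  # assumed: score the rolling average must reach
        self.recent = deque(maxlen=window)  # oldest scores drop off automatically

    def add(self, score: float) -> None:
        self.recent.append(score)

    def average(self) -> float:
        return sum(self.recent) / len(self.recent) if self.recent else 0.0

    def solved(self) -> bool:
        # Only judge the task once a full window of episodes has been recorded.
        return len(self.recent) == self.window and self.average() >= self.target


tracker = ScoreTracker(window=3, target=10.0)
for episode_score in [4, 8, 12, 14]:
    tracker.add(episode_score)
print(tracker.average(), tracker.solved())
```

Requiring a full window before declaring success avoids calling the task solved on the strength of one or two lucky early episodes.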
One issue we might be running into is noise. Because the environment has so many states and the agent receives so much feedback, some of that feedback is likely to be noisy, which can slow down our model's ability to generalize from the data. Remember that we chose a low alpha value to reduce overfitting and keep the model from learning too much from noise.
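To see why a low alpha damps noise, assume alpha is the learning rate in the incremental update `q <- q + alpha * (target - q)`. The sketch below applies that tabular-style update to a stream of noisy targets centered on a hypothetical true value (`TRUE_VALUE`, with Gaussian noise; both are assumptions for illustration). In a DQN the same idea shows up as the optimizer's step size on the network weights rather than this scalar update, so treat this as an analogy, not the DQN's actual code.

```python
import random
import statistics


def track_estimate(alpha, targets, start=0.0):
    """Apply q <- q + alpha * (target - q) to each target in turn.

    Returns the estimate after every step.
    """
    q = start
    history = []
    for target in targets:
        q += alpha * (target - q)  # move a fraction alpha toward the new target
        history.append(q)
    return history


rng = random.Random(0)  # fixed seed so the run is reproducible
TRUE_VALUE = 1.0  # hypothetical "true" value the noisy targets are centered on
noisy_targets = [TRUE_VALUE + rng.gauss(0.0, 1.0) for _ in range(5000)]

for alpha in (0.5, 0.01):
    tail = track_estimate(alpha, noisy_targets)[-1000:]  # skip the warm-up phase
    print(f"alpha={alpha}: mean={statistics.mean(tail):.3f}, "
          f"stdev={statistics.pstdev(tail):.3f}")
```

The smaller alpha produces a much steadier estimate because each noisy target only nudges it slightly. The trade-off is speed: the same small steps also make the estimate slower to move toward the true value, which fits the slow progress we're seeing.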
What changes can we make now to improve our performance? We can tune the hyperparameters to see...