Summary
In this final practical chapter of the book, we built a deep convolutional Q-Learning model for Snake. Before building anything, we had to define what our AI would see. We established that we needed to stack several consecutive frames so that our AI could perceive the continuity of its moves; this stack of frames was the input to our Convolutional Neural Network. The outputs were the Q-values corresponding to each of the four possible moves: up, down, left, and right. We rewarded our AI for eating an apple, punished it for losing, and punished it slightly for every action it took (the living penalty). After 25,000 games of training, our AI was able to eat 10 to 11 apples per game on average.
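The frame-stacking idea described above can be sketched in a few lines. This is a minimal illustration, not the chapter's actual code: the class name, the choice of 4 stacked frames, and the 10x10 board size are all assumptions made for the example.

```python
from collections import deque

import numpy as np


class FrameStacker:
    """Keeps the last n_frames game frames stacked as one CNN input.

    Illustrative sketch only: 4 frames and a 10x10 board are assumed,
    not taken from the chapter's implementation.
    """

    def __init__(self, n_frames=4, frame_shape=(10, 10)):
        self.n_frames = n_frames
        self.frame_shape = frame_shape
        self.frames = deque(maxlen=n_frames)

    def reset(self, first_frame):
        # At the start of a game, fill the stack with copies of the
        # first frame so the input shape is valid from move one.
        self.frames.clear()
        for _ in range(self.n_frames):
            self.frames.append(first_frame)
        return self.state()

    def push(self, frame):
        # After each move, the oldest frame is dropped automatically
        # (deque with maxlen) and the newest is appended.
        self.frames.append(frame)
        return self.state()

    def state(self):
        # Shape (n_frames, height, width): the stacked frames act as
        # input channels, letting the CNN see motion across moves.
        return np.stack(self.frames, axis=0)
```

The network would then map this (4, 10, 10) tensor to four Q-values, one per direction, and the agent picks the move with the highest value.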
I hope you enjoyed it!