Now that the Flappy Bird environment is ready, we can start tackling it by building a DQN model.
As we have seen, the environment returns a screen image at each step after an action is taken. A convolutional neural network (CNN) is well suited to image inputs: its convolutional layers extract features from the image, and these features are passed on to the fully connected layers downstream. In our solution, we will use a CNN with three convolutional layers and one fully connected hidden layer. An example of a CNN architecture is as follows:
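To make the layer structure concrete, the following sketch traces how the size of the feature maps shrinks through three convolutional layers before they are flattened for the fully connected hidden layer. The input size (84 x 84) and the channel, kernel, and stride values are assumptions borrowed from common DQN setups, not values given in the text, and the helper names are hypothetical:

```python
def conv_output_size(size, kernel, stride, padding=0):
    """Side length of a square feature map after one convolution."""
    return (size + 2 * padding - kernel) // stride + 1


# Assumed hyperparameters (illustrative only, not taken from the text):
# an 84 x 84 input image and (out_channels, kernel, stride) per layer.
IMAGE_SIZE = 84
CONV_LAYERS = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]


def trace_shapes(image_size=IMAGE_SIZE, layers=CONV_LAYERS):
    """Return the (channels, height, width) shape after each conv layer."""
    shapes = []
    size = image_size
    for channels, kernel, stride in layers:
        size = conv_output_size(size, kernel, stride)
        shapes.append((channels, size, size))
    return shapes


shapes = trace_shapes()
channels, height, width = shapes[-1]
# Number of features fed into the fully connected hidden layer
flattened = channels * height * width

for i, shape in enumerate(shapes, 1):
    print(f"conv{i}: {shape}")
print("inputs to fully connected layer:", flattened)
```

Under these assumptions, the flattened size determines the input width of the fully connected hidden layer, so it has to be recomputed whenever the image size or any kernel or stride changes.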