As you will learn, a DQN is not that different from the standard feedforward and convolutional networks that we have covered so far. Indeed, all the standard ingredients are present:
- A representation of our data (in this example, the state of our maze and the agent trying to navigate through it)
- Standard layers to process that representation, along with standard operations between those layers, such as the Tanh activation function
- An output layer with a linear activation, which gives you predictions
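To make these ingredients concrete, here is a minimal sketch of such a network using only NumPy. The sizes (a 4x4 maze flattened to a 16-element state vector, 32 hidden units, 4 moves) are illustrative assumptions, not values from this chapter:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical maze setup (illustrative sizes, not from the text):
STATE_SIZE = 16   # e.g. a 4x4 maze flattened into a vector
HIDDEN = 32       # hidden-layer width
N_ACTIONS = 4     # possible moves: up, down, left, right

# Randomly initialized weights: one Tanh hidden layer, one linear output layer.
W1 = rng.normal(0.0, 0.1, (STATE_SIZE, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0.0, 0.1, (HIDDEN, N_ACTIONS))
b2 = np.zeros(N_ACTIONS)

def q_values(state):
    """Predict one value per possible move for the given state."""
    h = np.tanh(state @ W1 + b1)   # hidden layer with Tanh activation
    return h @ W2 + b2             # linear output layer: raw predictions

# A one-hot state vector marking the agent's current cell in the maze.
state = np.zeros(STATE_SIZE)
state[5] = 1.0
print(q_values(state).shape)  # (4,) -- one predicted value per move
```

The linear output layer matters here: the network predicts unbounded real-valued scores for each move rather than probabilities, which is why no squashing function follows it.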
Here, our predictions represent possible moves that change the state of our input. In the case of maze solving, we are trying to predict the moves that produce the maximum cumulative expected reward for our player, ultimately leading to the maze's exit. These predictions occur as part of a training loop, where the learning algorithm uses...
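The "maximum cumulative expected reward" idea can be sketched as a Bellman-style target: the value of a move is its immediate reward plus a discounted estimate of the best reward obtainable afterward. The discount factor, reward values, and helper function below are illustrative assumptions, not this chapter's implementation:

```python
import numpy as np

GAMMA = 0.95  # discount factor: how strongly future rewards count toward the total

def td_target(reward, next_q, done):
    """Target value for a move: immediate reward plus the discounted
    best predicted value of the next state (zero if the episode ended)."""
    if done:
        return reward
    return reward + GAMMA * np.max(next_q)

# Illustrative numbers: ordinary steps yield reward 0.0, reaching the exit yields 1.0.
next_q = np.array([0.2, 0.5, -0.1, 0.0])  # the network's predictions for the next state
print(td_target(0.0, next_q, done=False))  # 0.95 * 0.5 = 0.475
print(td_target(1.0, next_q, done=True))   # terminal step: just the reward, 1.0
```

During training, the network's prediction for the move actually taken is nudged toward this target, which is how reward information from the exit propagates backward through the maze.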