In this chapter, we created, trained, and tested our first Q-learning agent. We saw how a version of the Bellman equation works and translated it into Python, using an argmax function to select the best action and the maximum of the next state's Q-values to update the Q-value of each state-action pair.
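As a reminder of the core idea, here is a minimal sketch of a tabular Q-learning update of the kind described in this chapter. The names (q_table, alpha, gamma) and the function itself are illustrative assumptions, not the chapter's exact code:

```python
import numpy as np

def update_q(q_table, state, action, reward, next_state,
             alpha=0.1, gamma=0.9):
    """Q-learning form of the Bellman equation:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = np.max(q_table[next_state])   # value of the best next action
    td_target = reward + gamma * best_next    # Bellman target
    q_table[state, action] += alpha * (td_target - q_table[state, action])
    return q_table

# Greedy action selection is where argmax comes in:
# action = np.argmax(q_table[state])
```

Note the division of labor: max supplies the value used in the update, while argmax picks the action the agent actually takes.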
We trained and tested our learning agent against our random agent and compared their performances. We saw that the longer the learning agent is trained, the more it learns about its environment and the better it performs at finding an optimal solution.
In the next chapter, we'll explore problems whose state spaces are too large for a Q-table. We'll use neural networks and, later, deep learning architectures called deep Q-networks to approximate Q-values. We'll also survey several Python packages for building neural networks and compare the merits of each.