Q-learning is an algorithm designed to solve control problems that can be modeled as a Markov decision process (MDP). We will go over what MDPs are, how they work, and how Q-learning is designed to solve them. We will then explore some classic reinforcement learning (RL) problems and learn how to develop solutions using Q-learning.
We will cover the following topics in this chapter:
- Understanding what an MDP is and how Q-learning is designed to solve an MDP
- Learning how to define the states an agent can be in and the actions it can take from those states, in the context of the OpenAI Gym Taxi-v2 environment that we will use for our first project
- Becoming familiar with the alpha (learning rate), gamma (discount factor), and epsilon (exploration rate) parameters
- Diving into a classic RL problem, the multi-armed bandit problem (MABP), and putting it into a...
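
As a preview of how the alpha, gamma, and epsilon values listed above enter the algorithm, here is a minimal sketch of the standard tabular Q-learning update together with epsilon-greedy action selection. It is not the chapter's project code: the function names `epsilon_greedy` and `q_update`, the two-state table, and the numeric values are assumptions chosen purely for illustration, and no Gym environment is involved.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))  # explore
    return max(range(len(q_row)), key=lambda a: q_row[a])  # exploit


def q_update(q, state, action, reward, next_state, alpha, gamma):
    """Apply one tabular Q-learning update to q in place."""
    # Target: immediate reward plus the discounted best value of the next state.
    td_target = reward + gamma * max(q[next_state])
    # Move the current estimate a fraction alpha of the way toward the target.
    q[state][action] += alpha * (td_target - q[state][action])


# Hypothetical toy table: 2 states x 2 actions, all estimates start at zero.
q = [[0.0, 0.0], [0.0, 0.0]]
rng = random.Random(0)

# One illustrative transition: in state 0, action 1 yields reward 1.0
# and leads to state 1.
q_update(q, state=0, action=1, reward=1.0, next_state=1, alpha=0.5, gamma=0.9)

# With epsilon set to 0 the agent always exploits its current estimates.
greedy_action = epsilon_greedy(q[0], epsilon=0.0, rng=rng)
```

In this sketch, alpha controls how far each estimate moves toward the new target, gamma controls how much the value of the next state counts, and epsilon controls how often the agent tries a random action instead of the one it currently rates highest.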