Chapter 4: Makings of a Markov Decision Process
In the first chapter, we talked about many applications of Reinforcement Learning (RL), from robotics to finance. Before implementing any RL algorithms for such applications, we first need to model them mathematically. The Markov Decision Process (MDP) is the framework we use to model such sequential decision-making problems. MDPs have special characteristics that make them easier to analyze theoretically. Building on that theory, Dynamic Programming (DP) is the field that proposes solution methods for MDPs. RL, in some sense, is a collection of approximate DP approaches that enable us to obtain good (but not necessarily optimal) solutions to very complex problems that are intractable for exact DP methods.
In this chapter, we build the MDP step by step, explain its characteristics, and lay the mathematical foundation for the RL algorithms covered in later chapters. In an MDP, the actions an agent takes...