The theoretical foundations of RL
Before we jump into some practical examples and start training an RL model, which we will be doing later in this chapter, let’s first understand some of the theoretical foundations of RL. The following sections will begin by first examining the mathematical formulation of Markov decision processes, episodic versus continuing tasks, some key RL terminology, and dynamic programming using the Bellman equation. Let’s start with Markov decision processes.
Markov decision processes
In general, the type of problems that RL deals with are typically formulated as Markov decision processes (MDPs). The standard approach for solving MDP problems is by using dynamic programming, but RL offers some key advantages over dynamic programming.
Dynamic programming
Dynamic programming refers to a set of computer algorithms and programming methods that was developed by Richard Bellman in the 1950s. In a sense, dynamic programming...