The theoretical foundations of RL
In this section, I will introduce you to the mathematical representation and notation of the formalisms (reward, agent, actions, observations, and environment) that we just discussed. Then, using this as a knowledge base, we will explore the second-order notions of the RL language, including state, episode, history, value, and gain (also commonly called the return), which will be used repeatedly to describe different methods later in the book.
Markov decision processes
Before exploring those notions, we will cover Markov decision processes (MDPs), which will be described like a Russian matryoshka doll: we will start from the simplest case, a Markov process (MP), then extend it with rewards, which will turn it into a Markov reward process (MRP). Finally, we will wrap this idea in one more layer by adding actions, which will lead us to the full MDP.
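To make this nesting concrete, here is a minimal sketch in Python (my own illustration, not code from the book) using a hypothetical two-state machine that is either "ok" or "broken". The MP is nothing but states and transition probabilities; the MRP attaches a reward to the same chain; and the MDP adds actions (here, hypothetical "use" and "repair" choices) that influence both transitions and rewards:

```python
import random

# MP: the machine's state tomorrow depends only on its state today
# (the Markov property); transitions[state][next_state] is a probability.
mp_transitions = {
    "ok":     {"ok": 0.9, "broken": 0.1},
    "broken": {"broken": 1.0},
}

# MRP: the same chain, extended with a reward for being in each state.
mrp_rewards = {"ok": 1.0, "broken": 0.0}

# MDP: transitions and rewards now also depend on the chosen action --
# the outermost envelope of the matryoshka.
mdp = {
    # (state, action): (next-state distribution, reward)
    ("ok", "use"):        ({"ok": 0.9, "broken": 0.1},  1.0),
    ("ok", "repair"):     ({"ok": 1.0},                 0.0),
    ("broken", "use"):    ({"broken": 1.0},             0.0),
    ("broken", "repair"): ({"ok": 1.0},                -0.5),
}

def mdp_step(state, action):
    """Sample one transition of the MDP: return (next_state, reward)."""
    dist, reward = mdp[(state, action)]
    next_state = random.choices(list(dist), weights=list(dist.values()))[0]
    return next_state, reward

print(mdp_step("ok", "use"))  # e.g. ('ok', 1.0) or ('broken', 1.0)
```

Notice how each formalism only adds one ingredient to the previous one: rewards on top of the MP, then actions on top of the MRP, which is exactly the matryoshka structure described above.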
MPs and MDPs are widely used in computer science and other engineering fields. So, reading this chapter will be useful for...