Summary
In this chapter, we have covered the mathematical framework in which we model the sequential decision-making problems we face in real life: Markov decision processes. To this end, we started with Markov chains, which do not involve any concept of reward or decision making. Markov chains simply describe stochastic processes in which the transitions depend only on the current state and are independent of the previously visited states. We then added the notion of a reward and began to ask which states are more advantageous to be in, in terms of expected future rewards. This gave rise to the concept of a "value" for a state. We then introduced the concept of a decision, or action, and defined the Markov decision process. With that in place, we finalized the definitions of the state-value and action-value functions. Lastly, we discussed what a partially observable environment is and how it affects an agent's decision-making.
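As a quick reference, the state-value and action-value functions summarized above are conventionally defined as shown below, where G_t denotes the discounted return from time step t and gamma is the discount factor; the notation here is the standard one and may differ slightly from the symbols used earlier in the chapter.

\[
v_\pi(s) = \mathbb{E}_\pi\left[ G_t \mid S_t = s \right], \qquad
q_\pi(s, a) = \mathbb{E}_\pi\left[ G_t \mid S_t = s, A_t = a \right], \qquad
G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}.
\]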
The Bellman equation...