In this chapter, we have discussed the basic assumptions of reinforcement learning and demonstrated how Q-learning works on a simple MDP problem. Reinforcement learning is a powerful technique for situations where we lack complete knowledge of the environment: it can reach the desired result from a small set of definitions modeled naturally from observations of the environment. Although we still have to design the transition function between states carefully, a deterministic transition is a reasonable assumption for an MDP, as our experiment showed.
Q-learning is a widely used algorithm for solving reinforcement learning problems. It iteratively updates the action-value function according to the Bellman equation, and this process is guaranteed to converge, giving us results consistent with our expectations. While the algorithm itself looks quite simple...
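To make the iterative update concrete, the following is a minimal sketch of tabular Q-learning, assuming a deterministic environment exposed through a hypothetical `step(s, a)` callback that returns `(next_state, reward, done)`; the function names, the three-state chain MDP, and all hyperparameter values are illustrative, not taken from the chapter's experiment.

```python
import random

def q_learning(num_states, num_actions, step, episodes=500,
               alpha=0.5, gamma=0.9, epsilon=0.1, max_steps=100):
    """Tabular Q-learning with an epsilon-greedy policy.

    `step(s, a)` is an assumed environment callback returning
    (next_state, reward, done). Episodes are assumed to start in state 0.
    """
    Q = [[0.0] * num_actions for _ in range(num_states)]
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            # Epsilon-greedy: explore with probability epsilon, else act greedily.
            if random.random() < epsilon:
                a = random.randrange(num_actions)
            else:
                a = max(range(num_actions), key=lambda x: Q[s][x])
            s2, r, done = step(s, a)
            # Bellman update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            target = r + (0.0 if done else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
            if done:
                break
    return Q

# Toy deterministic MDP: three states in a chain; action 1 moves right,
# action 0 moves left; reaching state 2 yields reward 1 and ends the episode.
def chain_step(s, a):
    s2 = min(s + 1, 2) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s2 == 2 else 0.0
    return s2, reward, s2 == 2

random.seed(0)  # fixed seed so the sketch is reproducible
Q = q_learning(num_states=3, num_actions=2, step=chain_step)
```

With these settings the learned values settle near the Bellman fixed point: moving right from state 1 is worth about 1.0 (the immediate reward), and moving right from state 0 is worth about gamma times that, roughly 0.9.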