Summary
We started the chapter by setting up the working environment. After that, we studied the fundamentals of reinforcement learning along with a few examples. After exploring the FrozenLake environment, we solved it with two dynamic programming algorithms, value iteration and policy iteration. We talked about Monte Carlo learning and used it for value approximation and control in the Blackjack environment. Lastly, we developed the Q-learning algorithm and solved the Taxi problem.