Performing Monte Carlo learning
Monte Carlo (MC)-based reinforcement learning is a model-free approach, which means it doesn’t need a known transition matrix or reward matrix; it learns from sampled episodes instead. In this section, you will learn about MC policy evaluation on Gymnasium’s Blackjack environment and solve the environment with MC control algorithms. Blackjack is a typical environment with an unknown transition matrix. Let’s first simulate the Blackjack environment.
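To make "model-free" concrete before turning to Blackjack, here is a minimal sketch of the core MC idea: estimating an expected return purely by averaging sampled outcomes. The simulator `sample_return` and its draw-until-17 rule are hypothetical stand-ins invented for this illustration, not Gymnasium's Blackjack environment; the point is only that the estimator never looks at any transition or reward matrix.

```python
import random


def sample_return(rng):
    """Hypothetical episode simulator (an assumption for illustration).

    Draws cards until the running total reaches 17 or more, then
    returns +1 if the total is at most 21 and -1 otherwise.
    """
    total = 0
    while total < 17:
        # Ranks 1-13; face cards (11-13) count as 10, an ace counts as 1 here.
        total += min(rng.randint(1, 13), 10)
    return 1 if total <= 21 else -1


def mc_estimate(n_episodes, seed=0):
    """Estimate the expected return by averaging sampled episode returns."""
    rng = random.Random(seed)
    total_return = 0
    for _ in range(n_episodes):
        total_return += sample_return(rng)
    return total_return / n_episodes


estimate = mc_estimate(10_000)
print(f"Estimated expected return: {estimate:.3f}")
```

The estimator only calls the simulator and averages what comes back, so it works the same way whether or not the underlying probabilities are known. Increasing `n_episodes` should make the estimate more stable.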
Simulating the Blackjack environment
Blackjack is a popular card game. The game has the following rules:
- The player competes against a dealer and wins if the total value of their cards is higher than the dealer’s and doesn’t exceed 21.
- Cards from 2 to 10 have values from 2 to 10.
- The face cards J, Q, and K each have a value of 10.
- The value of an ace can be either 1 or 11; an ace that can count as 11 without pushing the total over 21 is called a “usable” ace.
- At the beginning, both parties are given two random cards, but only one...