In this recipe, we will play Blackjack (also called 21) and evaluate a policy that we think might work well. Working through the Blackjack example will make you more familiar with Monte Carlo prediction and prepare you to search for the optimal policy using Monte Carlo control in the upcoming recipes.
Blackjack is a popular card game in which the goal is to bring the sum of your cards as close to 21 as possible without exceeding it. The J, Q, and K cards are each worth 10 points, and the cards from 2 to 10 are worth their face value. An ace can count as either 1 or 11 points; when it is counted as 11, it is called a usable ace. The player competes against a dealer. At the beginning, each party is dealt two random cards, but only one of the dealer's cards is revealed to the player. The player can request additional cards (called hit) or stop receiving...
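The card values described above can be made concrete with a small scoring helper. The following is only a minimal sketch, not code from this recipe: the function name hand_value and the representation of cards as rank strings are assumptions made for illustration. It counts every ace as 1 first and then promotes one ace to 11 when that does not push the total past 21.

```python
def hand_value(cards):
    """Return (total, usable_ace) for a hand of card ranks.

    `cards` is a list of rank strings such as "A", "2", ..., "10",
    "J", "Q", "K". This representation is an assumption made for
    illustration only.
    """
    total = 0
    has_ace = False
    for card in cards:
        if card == "A":
            total += 1          # count every ace as 1 at first
            has_ace = True
        elif card in ("J", "Q", "K"):
            total += 10         # face cards are worth 10 points
        else:
            total += int(card)  # cards 2 to 10 are worth their face value

    # An ace is usable if counting one ace as 11 (10 extra points)
    # does not take the hand over 21.
    usable_ace = has_ace and total + 10 <= 21
    if usable_ace:
        total += 10
    return total, usable_ace


print(hand_value(["A", "K"]))       # (21, True)
print(hand_value(["A", "9", "5"]))  # (15, False)
print(hand_value(["10", "Q", "2"])) # (22, False)
```

At most one ace can ever count as 11, because two aces at 11 would already total 22, which is why the sketch only needs a single flag rather than a count of aces.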