In this section, we will work out how to choose the right action in a simulated game. This exercise is primarily meant to help you grasp how reinforcement learning works.
The optimal action to take in a simulated game with a non-negative reward
Getting ready
Let's define the environment for this simulated setting.
Two players play a game on a row of three boxes. Player 1 marks a box with 1 and player 2 marks a box with 2. The player who marks two consecutive boxes wins.
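The rules above can be sketched in a few lines of Python. This is only an illustration, not the recipe's own code: the names `EMPTY`, `winner`, and `play` are hypothetical, and the sketch assumes that player 1 moves first and that the players alternate turns, which the rules above do not state explicitly.

```python
from itertools import permutations

EMPTY = 0        # hypothetical marker for an unmarked box
BOARD_SIZE = 3   # three boxes in a row


def winner(board):
    """Return 1 or 2 if that player holds two adjacent boxes, else 0."""
    for left, right in zip(board, board[1:]):
        if left != EMPTY and left == right:
            return left
    return 0


def play(order):
    """Mark boxes in the given order, alternating players.

    Assumption: player 1 moves first. Stops as soon as someone wins.
    Returns (winner, final_board), where winner is 0 if nobody wins.
    """
    board = [EMPTY] * BOARD_SIZE
    for turn, box in enumerate(order):
        board[box] = 1 if turn % 2 == 0 else 2
        result = winner(board)
        if result:
            return result, board
    return 0, board


# Enumerate every order in which the three boxes can be filled.
outcomes = {order: play(order) for order in permutations(range(BOARD_SIZE))}
for order, (result, board) in outcomes.items():
    print(order, board, "winner:", result or "none")
```

Under these assumptions, player 2 marks only one box in any game, so every finished game is either a win for player 1 or has no winner.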
The empty board for this game is a row of three unmarked boxes:

[ ] [ ] [ ]

For the problem we just defined, only player 1 has an opportunity to win the game. The possible scenarios in which player 1 wins are either...