Python for Game AI
An AI game player is nothing but an intelligent agent with a clear goal: to win the game and defeat all the other players. AI experiments have achieved surprising results when it comes to games. Today, no human can defeat an AI in the game of chess.
The game Go was the last game where human players could consistently defeat a computer player. However, in 2017, Google's game-playing AI called AlphaGo defeated the world number 1 ranked Go player.
Intelligent Agents in Games
An intelligent agent plays according to the rules of the game. The agent can sense the current state of the game through its sensors and can evaluate the potential steps. Once the agent finds the best possible step, it performs the action using its actuators. The agent finds the best possible action to reach the goal based on the information it has. Actions are either rewarded or punished. The carrot and stick are excellent examples of rewards and punishment. Imagine a donkey in front of your cart. You put a carrot in front of the eyes of the donkey, so the animal starts walking toward it. As soon as the donkey stops, the rider may apply punishment with a stick. This is not a human way of moving, but rewards and punishment control living organisms to some extent. The same happens to humans at school, at work, and in everyday life as well. Instead of carrots and sticks, we have income and legal punishment to shape our behavior.
In most games, a good sequence of actions results in a reward. When a human player feels rewarded, that makes the human feel happy. Humans tend to act in a way that maximizes their happiness. Intelligent agents, on the other hand, are only interested in their goal, which is to maximize their reward and minimize the punishment that's affecting their performance score.
When modeling games, we must determine their state space. An action causes a state transition. When we explore the consequences of all possible actions, we get a decision tree. This tree goes deeper as we start exploring the possible future actions of all players until the game ends.
The strength of AI is the execution of millions of possible steps each second. Therefore, game AI often boils down to a search exercise. When exploring all of the possible sequences of moves in a game, we get the state tree of a game.
Consider a chess AI. What is the problem with evaluating all possible moves by building a state tree consisting of all of the possible sequences of moves?
Chess is an EXPTIME game complexity-wise. The number of possible moves explodes combinatorically.
White starts with 20 possible moves: the eight pawns may move either one or two steps, and the two knights may move either up-up-left, or up-up-right. Then, black can make any of these 20 moves. There are already 20*20 = 400 possible combinations after just one move per player.
After the second move, we get 8,902 possible board constellations, and this number just keeps on growing. Just take seven moves, and you have to search through 10,921,506 possible constellations.
The average length of a chess game is approximately 40 moves. Some exceptional games take more than 200 moves to finish.
As a consequence, the computer player simply does not have time to explore the whole state space. Therefore, the search activity has to be guided with proper rewards, punishment, and simplifications of the rules.
Breadth First Search and Depth First Search
Creating a game AI is often a search exercise. Therefore, we need to be familiar with the two primary search techniques:
- Breadth First Search (BFS)
- Depth First Search (DFS)
These search techniques are applied on a directed rooted tree.
A tree is a data structure that has nodes, and edges connecting these nodes in such a way that any two nodes of the tree are connected by exactly one path. Have a look at the following figure:
When the tree is rooted, there is a special node in the tree called the root, which is where we begin our traversal. A directed tree is a tree where the edges may only be traversed in one direction. Nodes may be internal nodes or leaves. Internal nodes have at least one edge, through which we can leave the node. A leaf has no edges pointing out from the node.
In AI search, the root of the tree is the starting state. We traverse from this state by generating the successor nodes of the search tree. Search techniques differ, depending on the order in which we visit these successor nodes.
Breadth First Search (BFS)
BFS is a search technique that, starting from the root node (node 1), will start exploring the closest node on the same depth (or level) before moving to the next depth:
In the preceding figure, you can see the search order of the BFS technique. Starting from the root node (1
), BFS will go to the next level and explore the closest node (2
) before looking at the other nodes on the same level (3
and 4
). Then, it will move to the next level and explore 5
and 6
as they are close to each other before going back through to node 3
, finishing on the last node (7
), and so on.
Depth First Search (DFS)
DFS is a search technique that, starting from the root node (node 1), will start exploring the same branch as much as possible before moving to the next closest branch:
In the preceding figure, you can see the search order of the DFS technique. Starting from the root node (1
), DFS will go to the closest node (2
) and explore all the way to the end of the branch (3
) on the third depth before going back to the node (2
) and finish by exploring its second branch (4
). Then, it will move back to the second depth and start the same process with the next branch (6
) before finishing with the last branch (7
).
Now, suppose we have a tree defined by its root and a function that generates all the successor nodes from the root. In the following example, each node has a value and a depth. We start from 1
and may either increase the value by 1
or 2
. Our goal is to reach the value 5
:
Note
The code snippet shown here uses a backslash ( \
) to split the logic across multiple lines. When the code is executed, Python will ignore the backslash, and treat the code on the next line as a direct continuation of the current line.
root = {'value': 1, 'depth': 1} def succ(node): Â Â Â Â if node['value'] == 5: Â Â Â Â Â Â Â Â return [] Â Â Â Â elif node['value'] == 4: Â Â Â Â Â Â Â Â return [{'value': 5,'depth': node['depth']+1}] Â Â Â Â else: Â Â Â Â Â Â Â Â return [{'value': node['value']+1, \ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 'depth':node['depth']+1}, \ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â {'value': node['value']+2, \ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 'depth':node['depth']+1}]
In the preceding code snippet, we have initialized the root node as having a value and depth of 1
. Then, we created a function called succ
that takes a node as input. This function will have 3 different cases:
- If the input node value is
5
, then return nothing as we will have already reached our goal (5
). - If the input node value is
4
, then return the value5
and add1
to the current depth. - If the value is anything else, then add
1
to the depth and create two cases for the value,+1
and+2
.
First, we will perform BFS, as shown here:
def bfs_tree(node): Â Â Â Â nodes_to_visit = [node] Â Â Â Â visited_nodes = [] Â Â Â Â while len(nodes_to_visit) > 0: Â Â Â Â Â Â Â Â current_node = nodes_to_visit.pop(0) Â Â Â Â Â Â Â Â visited_nodes.append(current_node) Â Â Â Â Â Â Â Â nodes_to_visit.extend(succ(current_node)) Â Â Â Â return visited_nodes bfs_tree(root)
In the preceding code snippet, we have implemented the bfs_tree
function by taking a node as input. This function can be broken down into three parts:
The first part is initializing the nodes_to_visit
and visited_nodes
variables.
The second part is where BFS is implemented:
- The
current_node
variable takes away the first element of thenodes_to_visit
variable. - The
visited_nodes
variable adds this element to its list. - The
nodes_to_visit
variable adds the newly generated nodes from the call ofsucc
with thecurrent_node
as input to it.
The preceding three instructions are wrapped into a loop defined by the number of elements inside the nodes_to_visit
variable. As long as nodes_to_visit
has at least one element, then the loop will keep going.
The third part, which is the end of the function, will return the entire list of values from the visited_nodes
variable.
The expected output is this:
[{'value': 1, 'depth': 1}, {'value': 2, 'depth': 2}, {'value': 3, 'depth': 2}, {'value': 3, 'depth': 3}, {'value': 4, 'depth': 3}, {'value': 4, 'depth': 3}, {'value': 5, 'depth': 3}, {'value': 4, 'depth': 4}, {'value': 5, 'depth': 4}, {'value': 5, 'depth': 4}, {'value': 5, 'depth': 4}, {'value': 5, 'depth': 5}]
As you can see, BFS is searching through the values of the same depth before moving to the next level of depth and exploring the values of it. Notice how depth and value are increasing in sequence. This will not be the case in DFS.
If we had to traverse a graph instead of a directed rooted tree, BFS would look different: whenever we visit a node, we would have to check whether the node had been visited before. If the node had been visited before, we would simply ignore it.
In this chapter, we will only use Breadth First Traversal on trees. DFS is surprisingly similar to BFS. The difference between DFS and BFS is the sequence in which you access the nodes. While BFS visits all the children of a node before visiting any other nodes, DFS digs deep into the tree.
Have a look at the following example, where we're implementing DFS:
def dfs_tree(node): Â Â Â Â nodes_to_visit = [node] Â Â Â Â visited_nodes = [] Â Â Â Â while len(nodes_to_visit) > 0: Â Â Â Â Â Â Â Â current_node = nodes_to_visit.pop() Â Â Â Â Â Â Â Â visited_nodes.append(current_node) Â Â Â Â Â Â Â Â nodes_to_visit.extend(succ(current_node)) Â Â Â Â return visited_nodes dfs_tree(root)
In the preceding code snippet, we have implemented the dfs_tree
function by taking a node as input. This function can be broken down into three parts:
The first part is initializing the nodes_to_visit
and visited_nodes
variables.
The second part is where DFS is implemented:
- The
current_node
variable takes away the last element of thenodes_to_visit
variable. - The
visited_nodes
variable adds this element to its list. - The
nodes_to_visit
variable adds the newly generated nodes from the call ofsucc
withcurrent_node
as input to it.
The preceding three instructions are wrapped into a loop defined by the number of elements inside the nodes_to_visit
variable. As long as nodes_to_visit
has at least one element, then the loop will keep going.
At the end, that is at the third part, the function will return the entire list of values from visited_nodes
.
As you can see, the main difference between BFS and DFS is the order in which we took an element out of nodes_to_visit
. For BFS, we take the first element, whereas for DFS, we take the last one.
The expected output is this:
[{'value': 1, 'depth': 1}, {'value': 3, 'depth': 2}, {'value': 5, 'depth': 3}, {'value': 4, 'depth': 3}, {'value': 5, 'depth': 4}, {'value': 2, 'depth': 2}, {'value': 4, 'depth': 3}, {'value': 5, 'depth': 4}, {'value': 3, 'depth': 3}, {'value': 5, 'depth': 4}, {'value': 4, 'depth': 4}, {'value': 5, 'depth': 5}]
Notice how the DFS algorithm digs deep fast (the depth reaches higher values faster than BFS). It does not necessarily find the shortest path first, but it is guaranteed to find a leaf before exploring a second path.
In game AI, the BFS algorithm is often better for the evaluation of game states because DFS may get lost. Imagine starting a chess game, where a DFS algorithm may easily get lost in exploring the options for a move.
Exploring the State Space of a Game
Let's explore the state space of a simple game: tic-tac-toe. A state space is the set of all possible configurations of a game, which, in this case, means all the possible moves.
In tic-tac-toe, a 3x3 game board is given. Two players play this game. One plays with the sign X, while the other plays with the sign O. X starts the game, and each player makes a move after the other. The goal of the game is to get three of your own signs horizontally, vertically, or diagonally.
Let's denote the cells of the tic-tac-toe board, as follows:
In the following example, X
started at position 1
. O
retaliated at position 5
, X
made a move at position 9
, and then O
moved to position 3
:
This was a mistake by the second player, because now X
is forced to place a sign on cell 7
, creating two future scenarios for winning the game. It does not matter whether O
defends by moving to cell 4
or 8
– X
will win the game by selecting the other unoccupied cell.
Note
You can try out the game at http://www.half-real.net/tictactoe/.
For simplicity, we will only explore the state space belonging to the cases when the AI player starts. We will start with an AI player who plays randomly, placing a sign in an empty cell. After playing with this AI player, we will create a complete decision tree. Once we generate all possible game states, you will experience their combinatorial explosion. As our goal is to make these complexities simple, we will use several different techniques to make the AI player smarter and reduce the size of the decision tree.
By the end of this experiment, we will have a decision tree that has fewer than 200 different game endings and, as a bonus, the AI player will never lose a single game.
To make a random move, you will have to know how to choose a random element from a list using Python. We will use the choice
function of the random
library to do so:
from random import choice choice([2, 4, 6, 8])
This time, the output is 6
, but for you, it can be any number from the list.
The output of the choice
function is a random element of the list.
Note
We will use the factorial notation in the following section. A factorial is denoted by the "!" exclamation mark. By definition, 0! = 1, and n! = n*(n-1)!. In our example, 9! = 9* 8! = 9*8*7! = … = 9*8*7*6*5*4*3*2*1.
Estimating the Number of Possible States in a Tic-Tac-Toe Game
Make a rough estimate of the number of possible states on each level of the state space of the tic-tac-toe game:
- In our estimation, we will not stop until all of the cells of the board have been filled. A player might win before the game ends, but for the sake of uniformity, we will continue the game.
- The first player will choose one of the nine cells. The second player will choose one out of the eight remaining cells. The first player can then choose one out of the seven remaining cells. This goes on until either player wins the game, or the first player is forced to make the ninth and last move.
- The number of possible decision sequences is, therefore, 9! = 362,880. A few of these sequences are invalid because a player may win the game in fewer than nine moves. It takes at least five moves to win a game because the first player needs to move three times.
- To calculate the exact size of the state space, we have to calculate the number of games that are won in five, six, seven, and eight steps. This calculation is simple, but due to its brute-force nature, it is beyond our scope. Therefore, we will settle on the magnitude of the state space.
Note
After generating all possible tic-tac-toe games, researchers counted 255,168 possible games. Out of those games, 131,184 were won by the first player, 77,904 were won by the second player, and 46,080 games ended with a draw. Visit http://www.half-real.net/tictactoe/allgamesoftictactoe.zip to download all possible tic-tac-toe games.
Even a simple game such as tic-tac-toe has a lot of states. Just imagine how hard it would be to start exploring all possible chess games. Therefore, we can conclude that brute-force searching is rarely ideal.
Exercise 1.02: Creating an AI with Random Behavior for the Tic-Tac-Toe Game
In this exercise, we'll create a framework for the tic-tac-toe game for experimentation. We will be modeling the game on the assumption that the AI player always starts the game. You will create a function that prints your internal representation, allows your opponent to enter a move randomly, and determines whether a player has won.
Note
To ensure that this happens correctly, you will need to have completed the previous exercise.
The following steps will help you to complete this exercise:
- Begin by opening a new Jupyter Notebook file.
- We will import the
choice
function from therandom
library:from random import choice
- Now, model the nine cells in a simple string for simplicity. A nine-character long Python string stores these cells in the following order: "
123456789
". Let's determine the index triples that must contain matching signs so that a player wins the game:combo_indices = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [0, 3, 6], \ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â [1, 4, 7], [2, 5, 8], [0, 4, 8], [2, 4, 6]]
- Define the sign constants for empty cells, the AI, and the opponent player:
EMPTY_SIGN = '.' AI_SIGN = 'X' OPPONENT_SIGN = 'O'
In the preceding code snippet, we have assigned a different sign for the AI and the player.
- Create a function that prints a board. We will add an empty row before and after the board so that we can easily read the game state:
def print_board(board): Â Â Â Â print(" ") Â Â Â Â print(' '.join(board[:3])) Â Â Â Â print(' '.join(board[3:6])) Â Â Â Â print(' '.join(board[6:])) Â Â Â Â print(" ")
- Describe a move of the human player. The input arguments are the boards, the row numbers from
1
to3
, and the column numbers from1
to3
. The return value of this function is a board containing the new move:def opponent_move(board, row, column): Â Â Â Â index = 3 * (row - 1) + (column - 1) Â Â Â Â if board[index] == EMPTY_SIGN: Â Â Â Â Â Â Â Â return board[:index] + OPPONENT_SIGN + board[index+1:] Â Â Â Â return board
Here, we have defined a function called
opponent_move
that will help us to calculate the index of the board based on the input (row and column). You will be able to see the resulting position on the board. - Now, we need to define a random move on the part of the AI player. We will generate all possible moves with the
all_moves_from_board
function, and then select a random move from the list of possible moves:def all_moves_from_board(board, sign): Â Â Â Â move_list = [] Â Â Â Â for i, v in enumerate(board): Â Â Â Â Â Â Â Â if v == EMPTY_SIGN: Â Â Â Â Â Â Â Â Â Â Â Â move_list.append(board[:i] + sign + board[i+1:]) Â Â Â Â return move_list def ai_move(board): Â Â Â Â return choice(all_moves_from_board(board, AI_SIGN))
In the preceding code snippet, we defined a function called
all_moves_from_board
that goes through all the indexes on the board and checks whether they are empty (v == EMPTY_SIGN
). If that's the case, this means that the move can be played and that the index has been added to a list of moves (move_list
). Finally, we defined theai_move
function in order to randomly let the AI choose an index that is equal to a move in the game. - Determine whether a player has won the game:
def game_won_by(board): Â Â Â Â for index in combo_indices: Â Â Â Â Â Â Â Â if board[index[0]] == board[index[1]] == \ Â Â Â Â Â Â Â Â Â Â Â board[index[2]] != EMPTY_SIGN: Â Â Â Â Â Â Â Â Â Â Â Â return board[index[0]] Â Â Â Â return EMPTY_SIGN
In the preceding code snippet, we have defined the
game_won_by
function, which checks whether the board contains a combo of three identical indexes from thecombo_indices
variable to end the game. - Finally, create a game loop so that we can test the interaction between the computer player and the human player. We will conduct a brute-force search in the following examples:
def game_loop():     board = EMPTY_SIGN * 9     empty_cell_count = 9     is_game_ended = False     while empty_cell_count > 0 and not is_game_ended:         if empty_cell_count % 2 == 1:             board = ai_move(board)         else:             row = int(input('Enter row: '))             col = int(input('Enter column: '))             board = opponent_move(board, row, col)         print_board(board)         is_game_ended = game_won_by(board) != EMPTY_SIGN         empty_cell_count = sum(1 for cell in board \                                if cell == EMPTY_SIGN)     print('Game has been ended.')
In the preceding code snippet, we defined the function, which can be broken down into various parts.
The first part is to initialize the board and fill it with empty signs (
board = EMPTY_SIGN * 9
). Then, we create a counter of the empty cell, which will help us to create a loop and determine the AI's turn.The second part is to create a function for the player and the AI engine to play the game against each other. As soon as one player makes a move, the
empty_cell_count
variable will decrease by 1. The loop will keep going until either thegame_won_by
function finds a winner or there are no more possible moves on the board. - Use the
game_loop
function to run the game:game_loop()
The expected output (partially shown) is this:
Note
To access the source code for this specific section, please refer to https://packt.live/3fUws2l.
You can also run this example online at https://packt.live/3hVzjcT. You must execute the entire Notebook in order to get the desired result.
By completing this exercise, you have seen that even an opponent who is playing randomly may win from time to time if their opponent makes a mistake.
Activity 1.01: Generating All Possible Sequences of Steps in a Tic-Tac-Toe Game
This activity will explore the combinatorial explosion that is possible when two players play randomly. We will be using a program that, building on the previous results, generates all possible sequences of moves between a computer player and a human player.
Let's assume that the human player may make any possible move. In this example, given that the computer player is playing randomly, we will examine the wins, losses, and draws belonging to two randomly playing players:
The following steps will help you to complete this activity:
- Reuse all the function codes of Steps 2–9 from the previous Exercise 1.02, Creating an AI with Random Behavior for the Tic-Tac-Toe Game.
- Create a function that maps the
all_moves_from_board
function on each element of a list of board spaces/squares. This way, we will have all of the nodes of a decision tree. The decision tree starts with[ EMPTY_SIGN * 9 ]
and expands after each move. - Create a
filter_wins
function that takes finished games out of the list of moves and appends them in an array containing the board states won by the AI player and the opponent player. - Create a
count_possibilities
function that prints and returns the number of decision tree leaves that ended with a draw, were won by the first player, and were won by the second player: - We have up to nine steps in each state. In the 0th, 2nd, 4th, 6th, and 8th iterations, the AI player moves. In all other iterations, the opponent moves. We create all possible moves in all steps and take out finished games from the move list.
- Finally, execute the number of possibilities to experience the combinatorial explosion.
The expected output is this:
step 0. Moves: 1 step 1. Moves: 9 step 2. Moves: 72 step 3. Moves: 504 step 4. Moves: 3024 step 5. Moves: 13680 step 6. Moves: 49402 step 7. Moves: 111109 step 8. Moves: 156775 First player wins: 106279 Second player wins: 68644 Draw 91150 Total 266073
Note
The solution for this activity can be found via this link.
So far, we've understood the significance of an intelligent agent. We also examined the game states for a game AI. Now, we will focus on how to create and introduce intelligence to an agent.
We will look at reducing the number of states in the state space, analyze the stages that a game board can undergo, and make the environment work in such a way that we win.
Have a look at the following exercise, where we'll teach an intelligent agent to win.
Exercise 1.03: Teaching the Agent to Win
In this exercise, we will see how the steps needed to win can be reduced. We will be making the agent that we developed in the previous section activity detect situations where it can win a game.
The following steps will help you to complete this exercise:
- Open a new Jupyter Notebook file.
- Reuse the previous code from Steps 2–6 from Activity 1, Generating All Possible Sequences of Steps in a Tic-Tac-Toe Game.
- Define two functions,
ai_move
andall_moves_from_board
.We create
ai_move
so that it returns a move that will consider its own previous moves. If the game can be won in that move,ai_move
will select that move:def ai_move(board):     new_boards = all_moves_from_board(board, AI_SIGN)     for new_board in new_boards:         if game_won_by(new_board) == AI_SIGN:             return new_board     return choice(new_boards)
In the preceding code snippet, we have defined the
ai_move
function, which will make the AI choose a winning move from a list of all the possible moves from the current state of the game if it's applicable. If not, it will still choose a random move. - Next, test the code snippet with a game loop. Whenever the AI has the opportunity to win the game, it will always place the
X
in the correct cell:game_loop()
The expected output is this:
- Now, count all the possible moves. To do this, we must change the
all_moves_from_board
function to include this improvement. We must do this so that, if the game is won byAI_SIGN
, it will return that value:def all_moves_from_board(board, sign): Â Â Â Â move_list = [] Â Â Â Â for i, v in enumerate(board): Â Â Â Â Â Â Â Â if v == EMPTY_SIGN: Â Â Â Â Â Â Â Â Â Â Â Â new_board = board[:i] + sign + board[i+1:] Â Â Â Â Â Â Â Â Â Â Â Â move_list.append(new_board) Â Â Â Â Â Â Â Â Â Â Â Â if game_won_by(new_board) == AI_SIGN: Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â return [new_board] Â Â Â Â return move_list
In the preceding code snippet, we have defined a function to generate all possible moves. As soon as we find a move that wins the game for the AI, we return it. We do not care whether the AI has multiple options to win the game in one move – we just return the first possibility. If the AI cannot win, we return all possible moves. Let's see what this means in terms of counting all of the possibilities at each step.
- Enter the following function to find all the possibilities.
first_player, second_player, \ draw, total = count_possibilities()
The expected output is this:
step 0. Moves: 1 Â step 1. Moves: 9 Â step 2. Moves: 72 Â step 3. Moves: 504 Â step 4. Moves: 3024 Â step 5. Moves: 8525 Â step 6. Moves: 28612 Â step 7. Moves: 42187 Â step 8. Moves: 55888 Â First player wins: 32395 Â Second player wins: 23445 Â Draw 35544 Â Total 91384
Note
To access the source code for this specific section, please refer to https://packt.live/317pyTa.
You can also run this example online at https://packt.live/2YnLpDS. You must execute the entire Notebook in order to get the desired result.
With that, we have seen that the AI is still not winning most of the time. This means that we need to introduce more concepts to the AI to make it stronger. To teach the AI how to win, we need to teach it how to make defensive moves against losses.
Defending the AI against Losses
In the next activity, we will make the AI computer player play better compared to our previous exercise so that we can reduce the state space and the number of losses.
Activity 1.02: Teaching the Agent to Realize Situations When It Defends Against Losses
In this activity, we will force the computer to defend against a loss if the player puts their third sign in a row, column, or diagonal line:
- Reuse all the code from Steps 2–6 from the previous, Exercise 1.03, Teaching the Agent to Win.
- Create a function called
player_can_win
that takes all the moves from the board using theall_moves_from_board
function and iterates over it using a variable callednext_move
. On each iteration, it checks whether the game can be won by the sign, and then it returnstrue
orfalse
. - Extend the AI's move so that it prefers making safe moves. A move is safe if the opponent cannot win the game in the next step.
- Test the new application. You will find that the AI has made the correct move.
- Place this logic in the state space generator and check how well the computer player is doing by generating all possible games.
The expected output is this:
step 0. Moves: 1 step 1. Moves: 9 step 2. Moves: 72 step 3. Moves: 504 step 4. Moves: 3024 step 5. Moves: 5197 step 6. Moves: 18606 step 7. Moves: 19592 step 8. Moves: 30936 First player wins: 20843 Second player wins: 962 Draw 20243 Total 42048
Note
The solution for this activity can be found via this link.
Once we complete this activity, we notice that despite our efforts to make the AI better, it can still lose in 962 ways. We will eliminate all these losses in the next activity.
Activity 1.03: Fixing the First and Second Moves of the AI to Make It Invincible
In this activity, we will be combining our previous activities by teaching the AI how to recognize both a win and a loss so that it can focus on finding moves that are more useful than others. We will be reducing the possible games by hardcoding the first and second moves:
- Reuse the code from Steps 2–4 of the previous, Activity 1.02, Teaching the Agent to Realize Situations When It Defends Against Losses.
- Count the number of empty fields on the board and make a hardcoded move in case there are 9 or 7 empty fields. You can experiment with different hardcoded moves.
- Occupying any corner, and then occupying the opposite corner, leads to no losses. If the opponent occupies the opposite corner, making a move in the middle results in no losses.
- After fixing the first two steps, we only need to deal with 8 possibilities instead of 504. We also need to guide the AI into a state where the hardcoded rules are enough so that it never loses a game.
The expected output is this:
step 0. Moves: 1 step 1. Moves: 1 step 2. Moves: 8 step 3. Moves: 8 step 4. Moves: 48 step 5. Moves: 38 step 6. Moves: 108 step 7. Moves: 76 step 8. Moves: 90 First player wins: 128 Second player wins: 0 Draw 60 Total 188
Note
The solution for this activity can be found via this link.
Let's summarize the important techniques that we applied to reduce the state space so far:
- Empirical simplification: We accepted that the optimal first move is a corner move. We simply hardcoded a move instead of considering alternatives to focus on other aspects of the game. In more complex games, empirical moves are often misleading. The most famous chess AI victories often contain a violation of the common knowledge of chess grand masters.
- Symmetry: After we started with a corner move, we noticed that positions 1, 3, 7, and 9 were equivalent to the perspective of winning the game. Even though we didn't take this idea further, we noticed that we could even rotate the table to reduce the state space even further and consider all four corner moves as the exact same move.
- Reduction in different permutations leading to the same state: Suppose we can make the moves A or B and suppose our opponent makes move X, where X is not equal to either move A or B. If we explore the sequence A, X, B, and we start exploring the sequence B, X, then we don't have to consider the sequence B, X, A. This is because the two sequences lead to the exact same game state, and we have already explored a state containing these three moves before, that is, A, X, and B. The order of the sequence doesn't matter as it leads to the same result. This allows us to significantly reduce the number of possible moves.
- Forced moves for the player: When a player collects two signs horizontally, vertically, or diagonally, and the third cell in the row is empty, we are forced to occupy that empty cell either to win the game or to prevent the opponent from winning the game. Forced moves may imply other forced moves, which reduces the state space even further.
- Forced moves for the opponent: When a move from the opponent is clearly optimal, it does not make sense to consider scenarios where the opponent does not make the optimal move. When the opponent can win the game by occupying a cell, it does not matter whether we go on a long exploration of the cases when the opponent misses the optimal move. We save a lot less by not exploring cases when the opponent fails to prevent us from winning the game. This is because after the opponent makes a mistake, we will simply win the game.
- Random move: When we cannot decide and don't have the capacity to search, we move randomly. Random moves are almost always inferior to a search-based educated guess, but at times, we have no other choice.