Heuristics
In this section, we will formalize informed search techniques by defining and applying heuristics to guide our search. We will be looking at heuristics and creating them in the sections ahead.
Uninformed and Informed Searches
In the tic-tac-toe example, we implemented a greedy algorithm that first focused on winning, and then focused on not losing. When it comes to winning the game immediately, the greedy algorithm is optimal because there is never a better step than winning the game. When it comes to not losing, it matters how we avoid the loss. Our algorithm simply choses a random safe move without considering how many winning opportunities we have created.
BFS and DFS are part of uninformed searching because they consider all possible states in the game, which can be very time-consuming. On the other hand, heuristic informed searches will explore the space of available states intelligently in order to reach the goal faster.
Creating Heuristics
If we want to make better decisions, we apply heuristics to guide the search in the right direction by considering long-term benefits. This way, we can make a more informed decision in the present based on what could happen in the future. This can also help us solve problems faster.
We can construct heuristics as follows:
- In terms of the utility of making a move in the game
- In terms of the utility of a given game state from the perspective of a player
- In terms of the distance from our goal
Heuristics are functions that evaluate a game state or a transition to a new game state based on their utility. Heuristics are the cornerstones of making a search problem informed.
In this book, we will use utility and cost as negated terms. Maximizing utility and minimizing the cost of a move are considered synonyms.
A commonly used example of a heuristic evaluation function occurs in pathfinding problems. Suppose we are looking to reach a destination or a goal. Each step has an associated cost symbolizing the travel distance. Our goal is to minimize the cost of reaching the destination or goal (minimizing the travel distance).
One example of heuristic evaluation for solving this pathfinding problem will be to take the coordinates between the current state (position) and the goal (destination) and calculate the distance between these two points. The distance between two points is the length of the straight line connecting the points. This heuristic is called the Euclidean distance (as shown in the Figure 1.10).
Now, suppose we define our pathfinding problem in a maze, where we can only move up, down, left, or right. There are a few obstacles in the maze that block our moves, so using the Euclidean distance is not ideal. A better heuristic would be to use the Manhattan distance, which can be defined as the sum of the horizontal and vertical distances between the coordinates of the current state and the goal.
Admissible and Non-Admissible Heuristics
The two heuristics we just defined regarding pathfinding problems are called admissible heuristics when they're used on their given problem domain.
Admissible means that we may underestimate the cost of reaching the end state but that we never overestimate it. Later, we will explore an algorithm that finds the shortest path between the current state and the goal state. The optimal nature of this algorithm depends on whether we can define an admissible heuristic function.
An example of a non-admissible heuristic would be the Euclidean distance that's applied to a real-world map.
Imagine that we want to move from point A to point B in the city of Manhattan. Here, the Euclidean distance will be the straight line between the two points, but, as we know, we cannot just go straight in a city such as Manhattan (unless we can fly). In this case, the Euclidean distance is underestimating the cost of reaching the goal. A better heuristic would be the Manhattan distance:
Note
The preceding map of Manhattan is sourced from Google Maps.
Since we overestimated the cost of traveling from the current node to the goal, the Euclidean distance is not admissible when we cannot move diagonally.
Heuristic Evaluation
We can create a heuristic evaluation for our tic-tac-toe game state from the perspective of the starting player by defining the utility of a move.
Heuristic 1: Simple Evaluation of the Endgame
Let's define a simple heuristic by evaluating a board. We can set the utility for the game as one of the following:
- +1, if the state implies that the AI player will win the game
- -1, if the state implies that the AI player will lose the game
- 0, if a draw has been reached or no clear winner can be identified from the current state
This heuristic is simple because anyone can look at a board and analyze whether a player is about to win.
The utility of this heuristic depends on whether we can play many moves in advance. Notice that we cannot even win the game within five steps. In Activity 1.01, Generating All Possible Sequences of Steps in a Tic-Tac-Toe Game, we saw that by the time we reach step five, we have 13,680 possible combinations leading to it. In most of these 13,680 cases, our heuristic returns zero as we can't identify a clear winner yet.
If our algorithm does not look deeper than these five steps, we are completely clueless on how to start the game. Therefore, we should invent a better heuristic.
Heuristic 2: Utility of a Move
Let's change the utility for the game as follows:
- Two AI signs in a row, column, or diagonal, and the third cell is empty: +1000 for the empty cell.
- The opponent has two signs in a row, column, or diagonal, and the third cell is empty: +100 for the empty cell.
- One AI sign in a row, column, or diagonal, and the other two cells are empty: +10 for the empty cells.
- No AI or opponent signs in a row, column, or diagonal: +1 for the empty cells.
- Occupied cells get a value of minus infinity. In practice, due to the nature of the rules, -1 will also do.
Why do we use a multiplicative factor of 10 for the first three rules compared to the fourth one? We do this because there are eight possible ways of making three in a row, column, and diagonal. So, even by knowing nothing about the game, we are certain that a lower-level rule may not accumulate to overriding a higher-level rule. In other words, we will never defend against the opponent's moves if we can win the game.
Note
As the job of our opponent is also to win, we can compute this heuristic from the opponent's point of view. Our task is to maximize this value, too, so that we can defend against the optimal plays of our opponent. This is the idea behind the Minmax algorithm as well, which will be covered later in this chapter. If we wanted to convert this heuristic into a heuristic that describes the current board, we could compute the heuristic value for all open cells and take the maximum of the values for the AI character so that we can maximize our utility.
For each board, we will create a utility matrix.
For example, consider the following board, with O
signs as the player and X
signs as the AI:
From here, we can construct its utility matrix shown in the following figure:
On the second row, the left cell is not beneficial if we were to select it. Note that if we had a more optimal utility function, we would reward blocking the opponent.
The two cells of the third column both get a 10
-point boost for two in a row.
The top-right cell also gets 100
points for defending against the diagonal of the opponent.
From this matrix, evidently, we should choose the top-right move. At any stage of the game, we were able to define the utility of each cell; this was a static evaluation of the heuristic function.
We can use this heuristic to guide us toward an optimal next move or to give a more educated score on the current board by taking the maximum of these values. We have technically used parts of this heuristic in the form of hardcoded rules. Note, though, that the real utility of heuristics is not the static evaluation of a board, but the guidance it provides for limiting the search space.
Exercise 1.04: Tic-Tac-Toe Static Evaluation with a Heuristic Function
In this exercise, you will be performing a static evaluation on the tic-tac-toe game using a heuristic function.
The following steps will help you to complete this exercise:
- Open a new Jupyter Notebook file.
- Reuse the code from Steps 2–6 of Activity 1.01, Generating All Possible Sequences of Steps in a Tic-Tac-Toe Game.
- Create a function that takes the board as input and returns
0
if the cell is empty, and-1
if it's not empty:def init_utility_matrix(board): Â Â Â Â return [0 if cell == EMPTY_SIGN \ Â Â Â Â Â Â Â Â Â Â Â Â else -1 for cell in board]
- Next, create a function that takes the utility vector of possible moves, takes three indices inside the utility vector representing a triple, and returns a function, as shown in the following code snippet:
def generate_add_score(utilities, i, j, k):     def add_score(points):         if utilities[i] >= 0:             utilities[i] += points         if utilities[j] >= 0:             utilities[j] += points         if utilities[k] >= 0:             utilities[k] += points     return add_score
In the preceding code snippet, the returned function will expect a
points
parameter and theutilities
vector as input and will add points to each cell in (i
,j
,k
), as long as the original value of that cell is non-negative (>=0
). In other words, we increased the utility of empty cells only. - Now, create the utility matrix belonging to any board constellation where you will add the
generate_add_score
function defined previously to update the score. You will also implement the rules that we discussed prior to this activity. These rules are as follows:Two AI signs in a row, column, or diagonal, and the third cell is empty: +1000 for the empty cell.
The opponent has two signs in a row, column, or diagonal, and the third cell is empty: +100 for the empty cell.
One AI sign in a row, column, or diagonal, and the other two cells are empty: +10 for the empty cells.
No AI or opponent signs in a row, column, or diagonal: +1 for the empty cells.
Let's create the utility matrix now:
def utility_matrix(board): Â Â Â Â utilities = init_utility_matrix(board) Â Â Â Â for [i, j, k] in combo_indices: Â Â Â Â Â Â Â Â add_score = generate_add_score(utilities, i, j, k) Â Â Â Â Â Â Â Â triple = [board[i], board[j], board[k]] Â Â Â Â Â Â Â Â if triple.count(EMPTY_SIGN) == 1: Â Â Â Â Â Â Â Â Â Â Â Â if triple.count(AI_SIGN) == 2: Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â add_score(1000) Â Â Â Â Â Â Â Â Â Â Â Â elif triple.count(OPPONENT_SIGN) == 2: Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â add_score(100) Â Â Â Â Â Â Â Â elif triple.count(EMPTY_SIGN) == 2 and \ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â triple.count(AI_SIGN) == 1: Â Â Â Â Â Â Â Â Â Â Â Â add_score(10) Â Â Â Â Â Â Â Â elif triple.count(EMPTY_SIGN) == 3: Â Â Â Â Â Â Â Â Â Â Â Â add_score(1) Â Â Â Â return utilities
- Create a function that selects the move with the highest utility value. If multiple moves have the same utility, the function returns both moves:
def best_moves_from_board(board, sign):     move_list = []     utilities = utility_matrix(board)     max_utility = max(utilities)     for i, v in enumerate(board):         if utilities[i] == max_utility:             move_list.append(board[:i] \                              + sign \                              + board[i+1:])     return move_list def all_moves_from_board_list(board_list, sign):     move_list = []     get_moves = best_moves_from_board if sign \                 == AI_SIGN else all_moves_from_board     for board in board_list:         move_list.extend(get_moves(board, sign))     return move_list
- Now, run the application, as shown in the following code snippet:
first_player, second_player, \ draw, total = count_possibilities()
The expected output is this:
step 0. Moves: 1 step 1. Moves: 1 step 2. Moves: 8 step 3. Moves: 24 step 4. Moves: 144 step 5. Moves: 83 step 6. Moves: 214 step 7. Moves: 148 step 8. Moves: 172 First player wins: 504 Second player wins: 12 Draw 91 Total 607
Note
To access the source code for this specific section, please refer to https://packt.live/2VpGyAv.
You can also run this example online at https://packt.live/2YnyO3K. You must execute the entire Notebook in order to get the desired result.
By completing this exercise, we have observed that the AI is underperforming compared to our previous activity, Activity 1.03, Fixing the First and Second Moves of the AI to Make It Invincible. In this situation, hardcoding the first two moves was better than setting up the heuristic, but this is because we haven't set up the heuristic properly.
Using Heuristics for an Informed Search
We have not experienced the real power of heuristics yet as we made moves without the knowledge of the effects of our future moves, thereby effecting reasonable play from our opponents.
Therefore, a more accurate heuristic leads to more losses than simply hardcoding the first two moves in the game. Note that in the previous section, we selected these two moves based on the statistics we generated based on running the game with fixed first moves. This approach is essentially what heuristic search should be all about.
Types of Heuristics
Static evaluation cannot compete with generating hundreds of thousands of future states and selecting a play that maximizes our rewards. This is because our heuristics are not exact and are likely not admissible either.
We saw in the preceding exercise that heuristics are not always optimal. We came up with rules that allowed the AI to always win the game or finish with a draw. These heuristics allowed the AI to win very frequently, at the expense of losing in a few cases. A heuristic is said to be admissible if we underestimate the utility of a game state, but we never overestimate it.
In the tic-tac-toe example, we likely overestimated the utility in a few game states, and why is that? Because we ended up with a loss 12 times. A few of the game states that led to a loss had a maximum heuristic score. To prove that our heuristic is not admissible, all we need to do is find a potentially winning game state that we ignored while choosing a game state that led to a loss.
There are two more features that describe heuristics, that is, optimal and complete:
- Optimal heuristics always find the best possible solution.
- Complete heuristics has two definitions, depending on how we define the problem domain. In a loose sense, a heuristic is said to be complete if it always finds a solution. In a strict sense, a heuristic is said to be complete if it finds all possible solutions. Our tic-tac-toe heuristic is not complete because we ignored many possible winning states on purpose, favoring a losing state.
As you can see, defining an accurate heuristic requires a lot of details and thinking in order to obtain a perfect AI agent. If you are not correctly estimating the utility in the game states, then you can end up with an AI underperforming hardcoded rules.
In the next section, we'll look at a better approach to executing the shortest pathfinding between the current state and the goal state.