The solution
Now that we have learned how to specify the problem as an MDP, the agent needs to formulate a strategy to solve it. In reinforcement learning, this strategy is called a policy.
Policies and value functions
A policy defines the learning agent's way of behaving at a given time and is denoted by the Greek letter π (pi). Formally, a policy is a mapping from states to actions (or to a probability distribution over actions), but it can be as simple as an intuitive rule of thumb.
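In the standard MDP notation, a stochastic policy is written as the probability of choosing action a when the agent is in state s:

$$\pi(a \mid s) = \Pr(A_t = a \mid S_t = s)$$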
Let's take an example. A robot that needs to find its way out of a room might follow any of the following policies (each one can be written as a simple function of the state, as sketched after the list):
- Move randomly
- Follow the walls
- Take the shortest path to the door
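Each of these behaviors is just a function from the current state to an action. The sketch below is only illustrative: the state encoding (a dict holding the robot's position, the door's position, and which neighboring cells are walls) and the four actions are assumptions made for this example, not part of the original problem description.

```python
import random

# Hypothetical encoding: the state is a dict with the robot's grid
# position, the door's position, and the set of walled directions.
ACTIONS = ["up", "down", "left", "right"]

def random_policy(state):
    """Policy 1: ignore the state entirely and move at random."""
    return random.choice(ACTIONS)

def wall_following_policy(state):
    """Policy 2: keep a wall on the robot's right-hand side."""
    if "right" not in state["walls"]:
        return "right"   # right side is open: turn toward it to re-find the wall
    if "up" not in state["walls"]:
        return "up"      # wall on the right: keep moving forward
    return "left"        # blocked ahead and to the right: turn away

def shortest_path_policy(state):
    """Policy 3: greedily step toward the door (ignoring obstacles, for illustration)."""
    dx = state["door"][0] - state["pos"][0]
    dy = state["door"][1] - state["pos"][1]
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "up" if dy > 0 else "down"
```

All three functions have the same signature, which is the point: a policy is any rule that, given the current state, returns an action.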
To mathematically compare states and decide which action to take, we need a function. Let's define a function that takes in the current state and outputs a number signifying how valuable it is to be in that state. For example, if you want to cross a river, a position near the bridge is more valuable than one far from it. This function is called the value function.
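Written formally (assuming the standard discounted-return MDP setup, with discount factor γ and rewards R), the value of a state s under a policy π is the expected return when starting in s and following π thereafter:

$$v_\pi(s) = \mathbb{E}_\pi\!\left[\, \sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \;\middle|\; S_t = s \right]$$

In the room example, states near the door would receive higher values under a sensible policy than states far from it.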