Any reinforcement learning problem can be viewed as a Markov decision process, which we briefly looked at in Chapter 1, Foundations of Artificial Intelligence Based Systems. We will now look at it in more detail. In a Markov decision process, we have an agent interacting with an environment. At any given time instance t, the agent is exposed to one of many states: (s(t) = s) ∈ S. Based on the agent's action (a(t) = a) ∈ A in the state s(t), the agent is presented with a new state (s(t+1) = s′) ∈ S. Here, S denotes the set of all states the agent may be exposed to, while A denotes the set of possible actions the agent can take.
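The interaction loop described above can be sketched in a few lines of Python. This is only an illustrative toy, not a definitive implementation: the three-state line world, the `left`/`right` actions, and the names `STATES`, `ACTIONS`, `step`, and `rollout` are all assumptions made for this example. The transition is deterministic to keep the sketch short, whereas in a general Markov decision process the next state may be drawn from a probability distribution. The agent here simply picks actions uniformly at random as a placeholder.

```python
import random

# Hypothetical toy MDP: three states on a line and two actions.
STATES = [0, 1, 2]            # the set S
ACTIONS = ["left", "right"]   # the set A


def step(state, action):
    """Return the next state s' reached by taking `action` in `state`."""
    if action == "left":
        return max(state - 1, STATES[0])    # cannot move past the left edge
    if action == "right":
        return min(state + 1, STATES[-1])   # cannot move past the right edge
    raise ValueError(f"unknown action: {action}")


def rollout(start_state, num_steps, seed=0):
    """Run the agent-environment loop with a uniformly random agent."""
    rng = random.Random(seed)
    state = start_state
    trajectory = []
    for t in range(num_steps):
        action = rng.choice(ACTIONS)        # a(t) = a, drawn from A
        next_state = step(state, action)    # s(t+1) = s', an element of S
        trajectory.append((t, state, action, next_state))
        state = next_state
    return trajectory


if __name__ == "__main__":
    for t, s, a, s_next in rollout(start_state=1, num_steps=5):
        print(f"t={t}: s={s}, a={a}, s'={s_next}")
```

Each tuple in the returned trajectory records one time instance t together with the state s(t), the chosen action a(t), and the resulting state s(t+1), so the new state of one step is always the starting state of the next.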
You may now wonder how an agent chooses its actions. Should the choice be random or based on heuristics? It depends on how much the agent has interacted with the environment in question. In the...