The problem
Now we are ready to formulate the reinforcement learning problem mathematically, so let's get right into it.
In the preceding diagram, you can see the setup of any reinforcement learning problem. In general, a reinforcement learning problem is characterized by an agent trying to learn things about its environment, as stated earlier.
Assuming that time evolves in discrete time steps, at time step 0, the agent looks at the environment. You can think of this observation as the situation the environment presents to the agent. It is also known as observing the state of the environment. Then the agent must select an appropriate action for that particular state. Next, the environment presents a new situation to the agent in response to the action taken by it. In the same time step, the environment gives the agent a reward, which gives some indication of whether the agent has responded appropriately or not. Then the process...