Solving a Reinforcement Learning problem involves estimating an evaluation function during the learning process. This function must be able to assess, through the sum of the rewards, how convenient a policy is or is not. The basic idea of Q-learning is that the algorithm learns the optimal evaluation function over the whole space of states and actions (S × A).
The so-called Q-function defines a mapping of the form Q: S × A → V, where V is the value of the future rewards of an action a ∈ A executed in the state s ∈ S.
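In the standard formulation, this value can be written as the expected discounted sum of future rewards obtained by taking a in s and then acting optimally; the discount factor γ is part of that standard formulation and is an assumption here, not a value given in the text:

$$ Q(s, a) = \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_{t} \;\middle|\; s_{0} = s,\; a_{0} = a \right] $$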
Once it has learned the optimal function, Q, the agent will of course be able to recognize which action will lead to the highest future reward in a given state s.
One of the most common ways to implement the Q-learning algorithm involves the use of a table. Each cell of the table is a value, Q(s, a) = V, initialized to 0.
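As a minimal sketch of this idea, the table can be implemented as a two-dimensional array indexed by states and actions. The state and action counts, learning rate, and discount factor below are illustrative assumptions, not values taken from the text:

```python
import numpy as np

# Hypothetical sizes for illustration: 6 states, 4 actions.
n_states, n_actions = 6, 4

# Q-table: one cell per (state, action) pair, initialized to 0.
Q = np.zeros((n_states, n_actions))

# Assumed hyperparameters: learning rate and discount factor.
alpha, gamma = 0.1, 0.9

def update(state, action, reward, next_state):
    """Standard tabular Q-learning update: move Q(s, a) toward the
    observed reward plus the best estimated value of the next state."""
    best_next = np.max(Q[next_state])
    Q[state, action] += alpha * (reward + gamma * best_next - Q[state, action])

def greedy_action(state):
    """Once Q approximates the optimal function, the action with the
    highest future reward in a state is the column with the largest value."""
    return int(np.argmax(Q[state]))
```

Each call to `update` refines one cell of the table, so after enough interaction the greedy lookup in `greedy_action` realizes the behavior described above: choosing the action with the highest estimated future reward.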
The agent can perform any action a ∈ A, where A is...