Value of action
To make our life slightly easier, we can define another quantity in addition to the value of state V(s): the value of action Q(s, a). Basically, it equals the total reward we can get by executing action a in state s, and it can be defined via V(s). Although it is a much less fundamental entity than V(s), this quantity gave its name to the whole family of methods called "Q-learning", because it is slightly more convenient in practice. In these methods, our primary objective is to get values of Q for every pair of state and action.
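Q for this state s and action a equals the expected immediate reward plus the discounted long-term reward of the destination state:
$$Q(s, a) = \mathbb{E}_{s'}\left[ r(s, a) + \gamma V(s') \right] = \sum_{s'} p(s' \mid s, a) \left( r(s, a) + \gamma V(s') \right)$$
Here s' is the destination state, p(s' | s, a) is the probability of reaching it, and \gamma is the discount factor. We can also define V(s) via Q(s, a):
$$V(s) = \max_{a \in A} Q(s, a)$$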
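To make the two relations concrete, here is a minimal Python sketch on a made-up two-action problem. The state names, probabilities, rewards, state values, and the discount factor 0.9 are all illustrative assumptions, not taken from the text; the helper names action_value and state_value are likewise hypothetical.

```python
GAMMA = 0.9  # discount factor (assumed value for illustration)

# Hypothetical toy problem: (state, action) -> list of (probability, next_state)
TRANSITIONS = {
    ("s0", "left"): [(1.0, "s1")],
    ("s0", "right"): [(0.5, "s1"), (0.5, "s2")],
}
# Immediate reward r(s, a) for each (state, action) pair (made-up numbers)
REWARDS = {("s0", "left"): 1.0, ("s0", "right"): 2.0}
# State values V(s') of the destination states, assumed to be already known
VALUES = {"s1": 10.0, "s2": 0.0}


def action_value(state, action, transitions, rewards, values, gamma=GAMMA):
    """Q(s, a): expectation over destination states of r(s, a) + gamma * V(s')."""
    reward = rewards[(state, action)]
    return sum(
        prob * (reward + gamma * values[next_state])
        for prob, next_state in transitions[(state, action)]
    )


def state_value(state, actions, transitions, rewards, values, gamma=GAMMA):
    """V(s): the largest Q(s, a) over the actions available in the state."""
    return max(
        action_value(state, a, transitions, rewards, values, gamma)
        for a in actions
    )


if __name__ == "__main__":
    for a in ("left", "right"):
        print(a, action_value("s0", a, TRANSITIONS, REWARDS, VALUES))
    print("V(s0) =", state_value("s0", ("left", "right"), TRANSITIONS, REWARDS, VALUES))
```

In this toy setup, the action with the smaller immediate reward ends up with the larger Q, because it always leads to the more valuable destination state, and V(s0) simply picks that larger Q.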
Defining V(s) as the maximum of Q(s, a) over actions just means that the value of a state equals the value of the best action we can execute from this state. The value of action may look very close to the value of state, but there is still a difference, which is important to understand: Q(s, a) fixes the first action a, whereas V(s) assumes the best action is taken. Finally, we can express Q(s, a) via itself, which will be used in the next chapter's topic of Q...