Value of action
To make our life slightly easier, we can define another quantity in addition to the value of state V(s): the value of action Q(s, a). Basically, it equals the total reward we can get by executing action a in state s, and it can be defined via V(s). Although it is a much less fundamental entity than V(s), this quantity gave its name to the whole family of methods called "Q-learning", because it is slightly more convenient in practice. In these methods, our primary objective is to get values of Q for every pair of state and action.
![Value of action](https://static.packt-cdn.com/products/9781788834247/graphics/graphics/B09471_05_65.jpg)
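As a rough illustration, the definition can be sketched in Python. This assumes the equation has the usual form, Q(s, a) = sum over destination states s' of p(s' | s, a) * (r + gamma * V(s')). The toy transition table, the state values, and all names (`transitions`, `state_values`, `action_value`, `GAMMA`) are hypothetical and exist only for this example.

```python
# Hypothetical toy model: every name and number here is an illustrative assumption.
GAMMA = 0.9  # discount factor

# transitions[(state, action)] -> list of (probability, reward, next_state)
transitions = {
    ("s0", "left"): [(1.0, 1.0, "s1")],
    ("s0", "right"): [(0.5, 2.0, "s1"), (0.5, 0.0, "s2")],
}

# Assumed, already-known values V(s') of the destination states.
state_values = {"s1": 10.0, "s2": 0.0}


def action_value(state, action):
    """Q(s, a): expected immediate reward plus discounted value of the destination."""
    return sum(
        prob * (reward + GAMMA * state_values[next_state])
        for prob, reward, next_state in transitions[(state, action)]
    )


q_left = action_value("s0", "left")
q_right = action_value("s0", "right")
print(q_left, q_right)
```

In this made-up model, "left" is deterministic, while "right" averages over two possible destination states weighted by their probabilities.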
Q for state s and action a equals the expected immediate reward plus the discounted long-term value of the destination state. We can also define V(s) via Q(s, a):
![Value of action](https://static.packt-cdn.com/products/9781788834247/graphics/graphics/B09471_05_68.jpg)
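A minimal sketch of this relation, assuming it reads V(s) = max over actions a of Q(s, a). The table `q_table` and the helpers `state_value` and `best_action` are hypothetical names, and the numbers are made up for illustration.

```python
# Hypothetical Q-table: keys are (state, action) pairs, values are made-up numbers.
q_table = {
    ("s0", "left"): 10.0,
    ("s0", "right"): 5.5,
    ("s1", "left"): 0.0,
    ("s1", "right"): 3.0,
}


def state_value(state):
    """V(s): the largest Q(s, a) over all actions available in this state."""
    return max(q for (s, _action), q in q_table.items() if s == state)


def best_action(state):
    """The action whose Q(s, a) attains that maximum."""
    candidates = {a: q for (s, a), q in q_table.items() if s == state}
    return max(candidates, key=candidates.get)


print(state_value("s0"), best_action("s0"))
```

Once a table like this is known, both the state value and the greedy action fall out of a single lookup over the actions, which is one reason Q is convenient in practice.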
This equation just means that the value of a state equals the value of the best action we can execute from that state, that is, the action with the maximum Q. The value of action may look very close to the value of state, but there is still a difference, which is important to understand. Finally, we can express Q(s, a) via itself, which will be used in the...