Value iteration in practice
The complete example is in Chapter05/01_frozenlake_v_iteration.py
. The central data structures in this example are as follows:
- Reward table: A dictionary with the composite key "source state" + "action" + "target state". The value is obtained from the immediate reward.
- Transitions table: A dictionary keeping counters of the experienced transitions. The key is the composite "state" + "action", and the value is another dictionary that maps the target state into a count of times that we have seen it. For example, if in state 0 we execute action 1 ten times, after three times it will lead us to state 4 and after seven times to state 5.
The entry with the key
(0, 1)
in this table will be adict
with contents{4: 3, 5: 7}
. We can use this table to estimate the probabilities of our transitions. - Value table: A dictionary that maps a state into the calculated value of this state.
The overall...