Value iteration in practice
In this section, we will look at how the value iteration method will work for FrozenLake. The complete example is in Chapter05/01_frozenlake_v_iteration.py. The central data structures in this example are as follows:
-
Reward table: A dictionary with the composite key “source state” + “action” + “target state.” The value is obtained from the immediate reward.
-
Transitions table: A dictionary keeping counters of the experienced transitions. The key is the composite “state” + “action,” and the value is another dictionary that maps the “target state” into a count of times that we have seen it.
For example, if in state 0 we execute action 1 ten times, after three times, it will lead us to state 4 and after seven times to state 5. Then entry with the key (0, 1) in this table will be a dict with the contents {4: 3...