Q-iteration for FrozenLake
The whole example is in the Chapter05/02_frozenlake_q_iteration.py file, and the differences from the value iteration version are really minor:
- The most obvious change is to our value table. In the previous example, we kept the value of the state, so the key in the dictionary was just a state. Now we need to store values of the Q-function, which has two parameters, state and action, so the key in the value table is now a composite (state, action) pair (a sketch of this follows the list).
- The second difference is in our calc_action_value() function. We simply don't need it anymore, as our action values are now stored directly in the value table (see the second sketch after the list).
- Finally, the most important change in the code is in the agent's value_iteration() method. Before, it was just a wrapper around the calc_action_value() call, which did the job of the Bellman approximation. Now, as that function has gone and been replaced by the value table, we need to do this approximation inside value_iteration() itself (see the third sketch after the list).
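To make the first difference concrete, here is a minimal sketch of the new value table, assuming the same defaultdict-based layout as in the previous example. The environment id and the transits/rewards structures are carried over as assumptions, not quoted from the file itself:

```python
import collections
import gym

class Agent:
    def __init__(self):
        self.env = gym.make("FrozenLake-v0")  # assumed env id
        # Before: self.values[state] held V(s).
        # Now: self.values[(state, action)] holds Q(s, a).
        self.values = collections.defaultdict(float)
        # Assumed, as in the previous example: transition counts
        # (state, action) -> Counter(target_state) and rewards
        # keyed by (state, action, target_state).
        self.transits = collections.defaultdict(collections.Counter)
        self.rewards = collections.defaultdict(float)
```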
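The second difference shows up wherever we used to call calc_action_value() to compare actions: we can now read Q-values straight from the table. A sketch of action selection under that assumption, as a method of the Agent class above:

```python
    def select_action(self, state):
        # No Bellman computation here anymore: just pick the
        # action with the largest stored Q(s, a).
        best_action, best_value = None, None
        for action in range(self.env.action_space.n):
            action_value = self.values[(state, action)]
            if best_value is None or best_value < action_value:
                best_value = action_value
                best_action = action
        return best_action
```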
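And for the third difference, a sketch of what the Bellman approximation might look like when it is done over Q-values inside value_iteration(). GAMMA and the transits/rewards layout are the same assumptions as above: for every (state, action) pair, we average r + GAMMA * max Q(s', a') over the observed target states, weighted by how often each transition occurred:

```python
GAMMA = 0.9  # assumed discount factor

    def value_iteration(self):
        for state in range(self.env.observation_space.n):
            for action in range(self.env.action_space.n):
                action_value = 0.0
                target_counts = self.transits[(state, action)]
                total = sum(target_counts.values())
                for tgt_state, count in target_counts.items():
                    reward = self.rewards[(state, action, tgt_state)]
                    # Best Q-value obtainable from the target state
                    best_action = self.select_action(tgt_state)
                    bellman_val = reward + GAMMA * \
                        self.values[(tgt_state, best_action)]
                    action_value += (count / total) * bellman_val
                self.values[(state, action)] = action_value
```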