Q-learning for FrozenLake
The whole example is in the Chapter05/02_frozenlake_q_iteration.py file, and the differences from the previous version are really minor. The most obvious change is to our value table. In the previous example, we kept the value of every state, so the key in the dictionary was just a state. Now we need to store values of the Q-function, which has two parameters, state and action, so the key in the value table is now a composite (state, action) pair.
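For illustration, here is a minimal sketch of that change, assuming the table is a collections.defaultdict as in the previous example; the variable names and the sample entries are hypothetical, not the exact code from the repository:

```python
import collections

# Previous example: the table stores V(s), keyed by the state alone
state_values = collections.defaultdict(float)
state_values[0] = 0.1                 # hypothetical entry: V(state=0)

# This example: the table stores Q(s, a), keyed by a (state, action) pair
q_values = collections.defaultdict(float)
q_values[(0, 1)] = 0.5                # hypothetical entry: Q(state=0, action=1)
```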
The second difference is our calc_action_value() function: we simply don't need it anymore, as our action values are now stored directly in the value table.
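As a rough sketch of what that means in practice (the function name and signature below are assumptions made for illustration, not the book's exact code), picking an action now amounts to scanning the Q-values already stored for the current state rather than recomputing them:

```python
import collections

def select_action(values, n_actions, state):
    """Pick the action with the largest stored Q(s, a) for the given state."""
    best_action, best_value = None, None
    for action in range(n_actions):
        action_value = values[(state, action)]   # read from the table, not recomputed
        if best_value is None or best_value < action_value:
            best_value = action_value
            best_action = action
    return best_action

# Toy usage with a hypothetical table
values = collections.defaultdict(float)
values[(0, 2)] = 0.7
print(select_action(values, n_actions=4, state=0))   # -> 2
```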
Finally, the most important change in the code is in the agent's value_iteration() method. Previously, it was just a wrapper around the calc_action_value() call, which did the job of the Bellman approximation. Now, as that function is gone and has been replaced by the value table, we need to perform this approximation in the value_iteration() method itself.
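Here is a minimal sketch of that idea, assuming the agent still keeps transition counts and observed rewards in dictionaries as in the previous example; the names q_value_iteration, transits, rewards, and GAMMA are assumptions for illustration, not the book's exact code:

```python
GAMMA = 0.9  # discount factor; assumed to match the rest of the chapter

def q_value_iteration(values, transits, rewards, n_states, n_actions):
    """One sweep of the Bellman approximation over every (state, action) pair.

    values:   dict (state, action) -> Q(s, a), updated in place
    transits: dict (state, action) -> {target_state: visit count}
    rewards:  dict (state, action, target_state) -> observed immediate reward
    """
    for state in range(n_states):
        for action in range(n_actions):
            target_counts = transits.get((state, action), {})
            total = sum(target_counts.values())
            if total == 0:
                continue  # nothing observed for this pair yet
            action_value = 0.0
            for tgt_state, count in target_counts.items():
                reward = rewards[(state, action, tgt_state)]
                # best Q-value of the target state, read straight from the table
                best_q = max(values.get((tgt_state, a), 0.0)
                             for a in range(n_actions))
                action_value += (count / total) * (reward + GAMMA * best_q)
            values[(state, action)] = action_value
```

The key point is the last line: the Bellman approximation now writes directly into the Q-table, so no separate calc_action_value() step is needed.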
Let's look at the code. As it's almost the same, I will jump...