One of the available environments is Frozen Lake. The goal of this environment is simple: we want to cross a frozen lake divided into square blocks while avoiding the holes (H). We can walk on the frozen blocks (F) without any problem and can move in up to four directions: up, down, left, and right.
A visualization of the frozen lake problem
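To make the layout concrete, here is a minimal sketch of such a grid in plain Python. It is not the environment's actual implementation: the 4x4 map, the start (S) and goal (G) tiles, the reward of 1.0 for reaching the goal, the deterministic (non-slippery) movement, and the names `GRID`, `MOVES`, and `step` are all assumptions made for illustration.

```python
# Hypothetical 4x4 layout: S = start, F = frozen, H = hole, G = goal.
# Only F and H come from the text; S, G, and the map itself are assumptions.
GRID = [
    "SFFF",
    "FHFH",
    "FFFH",
    "HFFG",
]

# Each of the four actions maps to a (row, column) offset.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(position, action):
    """Move one block and return (new_position, reward, done)."""
    row, col = position
    d_row, d_col = MOVES[action]
    # Clamp to the grid so walking into an edge leaves the agent in place.
    row = min(max(row + d_row, 0), len(GRID) - 1)
    col = min(max(col + d_col, 0), len(GRID[0]) - 1)
    tile = GRID[row][col]
    done = tile in "HG"  # the episode ends in a hole or at the goal
    reward = 1.0 if tile == "G" else 0.0  # assumed reward scheme
    return (row, col), reward, done
```

For example, `step((0, 0), "right")` moves the agent onto a frozen block and the episode continues, whereas `step((0, 1), "down")` lands on a hole and ends the episode with no reward.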
The Q-learning algorithm needs the following parameters:
- Step size: 𝛼 ∈ (0, 1]
- Small 𝜀 > 0
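As a rough illustration of what these two parameters control, the sketch below shows one common way they are used: 𝜀 sets how often a random action is taken instead of the currently best one, and 𝛼 sets how far an estimate moves toward a new target. The function names `epsilon_greedy` and `move_toward` are hypothetical, and the target is left abstract because it depends on the update rule of the algorithm.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    best = max(q_values)
    # Break ties at random among all actions sharing the best value.
    return rng.choice([a for a, q in enumerate(q_values) if q == best])


def move_toward(estimate, target, alpha):
    """Shift an estimate a fraction alpha of the way toward a target."""
    return estimate + alpha * (target - estimate)
```

With `epsilon=0.0` the choice is purely greedy, and with `alpha=1.0` the old estimate is replaced by the target outright. Smaller values of 𝛼 make the estimates change more slowly.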
Then, the algorithm works as follows:
- Initialize Q(s, a) for all s ∈ S+ and a ∈ A(s) arbitrarily, except that Q(terminal, ·) = 0.
- Loop for each episode:
  - Initialize S.
  - Choose A from S using the policy derived from Q (for example, 𝜀-greedy).
  - Loop for each step of the episode, as follows:
    - Choose A' from S' using the...