Q-Learning in Python
The environment and the Q-Learning algorithm discussed in the previous section can be implemented in Python. Since the policy is just a simple table, there is no need for Keras at this point. Listing 9.3.1 shows q-learning-9.3.1.py, the implementation of the simple deterministic world (environment, agent, action, and Q-Table algorithms) using the QWorld class. For conciseness, the functions dealing with the user interface are not shown.
In this example, the environment dynamics are represented by self.transition_table. At every action, self.transition_table determines the next state. The reward for executing an action is stored in self.reward_table. The two tables are consulted every time an action is executed by the step() function. The Q-Learning algorithm is implemented by the update_q_table() function. Every time the agent needs to decide which action to take, it calls the act() function. The action may be randomly drawn or decided by the policy using the Q-Table...
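To make the roles of the two tables and the three functions concrete, the following is a minimal sketch and not the QWorld class from the listing. The class name TinyQWorld, the train() helper, the reset() method, the four-state corridor layout, and the gamma, epsilon, and seed parameters are all assumptions chosen for illustration. The update rule, Q(s, a) = r + gamma * max Q(s', a'), is the usual form for a deterministic environment and is assumed here rather than copied from the listing.

```python
import random


class TinyQWorld:
    """Hypothetical 4-state corridor; state 3 is the terminal goal."""

    def __init__(self, gamma=0.9, epsilon=0.1, seed=0):
        self.n_states, self.n_actions = 4, 2  # actions: 0 = left, 1 = right
        self.gamma = gamma        # discount factor (assumed value)
        self.epsilon = epsilon    # exploration probability (assumed value)
        self.rng = random.Random(seed)
        # transition_table[state][action] -> next state (deterministic dynamics)
        self.transition_table = [[0, 1], [0, 2], [1, 3], [3, 3]]
        # reward_table[state][action] -> reward; only stepping into the goal pays
        self.reward_table = [[0.0, 0.0], [0.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
        self.q_table = [[0.0] * self.n_actions for _ in range(self.n_states)]
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Consult both tables for the current state and the chosen action.
        reward = self.reward_table[self.state][action]
        next_state = self.transition_table[self.state][action]
        done = next_state == 3
        self.state = next_state
        return next_state, reward, done

    def act(self):
        # Explore: draw a random action with probability epsilon.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        # Exploit: follow the Q-Table, breaking ties at random.
        row = self.q_table[self.state]
        best = max(row)
        return self.rng.choice([a for a, q in enumerate(row) if q == best])

    def update_q_table(self, state, action, reward, next_state):
        # Deterministic-world update: Q(s, a) = r + gamma * max_a' Q(s', a')
        self.q_table[state][action] = (
            reward + self.gamma * max(self.q_table[next_state])
        )


def train(episodes=50, max_steps=1000):
    world = TinyQWorld()
    for _ in range(episodes):
        state = world.reset()
        for _ in range(max_steps):
            action = world.act()
            next_state, reward, done = world.step(action)
            world.update_q_table(state, action, reward, next_state)
            state = next_state
            if done:
                break
    return world


world = train()
for state, row in enumerate(world.q_table):
    print(state, [round(q, 3) for q in row])
```

Each printed row holds the learned values for the left and right actions in one state. The terminal state is never updated because every episode ends on arrival, so its row stays at zero.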