AI solution
Let's start by reviewing the whole deep Q-learning model, adapted to this case study, so that you don't have to scroll or flip back through the previous chapters. Repetition is never bad; it helps the knowledge stick. Here's the deep Q-learning algorithm again:
Initialization:
- The memory of the experience replay is initialized to an empty list, called memory in the code (the dqn.py Python file in the Chapter 11 folder of the GitHub repo).
- We choose a maximum size for the memory, called max_memory in the code (the dqn.py Python file in the Chapter 11 folder of the GitHub repo).
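The two initialization steps above can be sketched in a few lines of Python. This is a minimal illustration, not the contents of dqn.py: the class name ReplayMemory, the remember method, and the default size of 100 are assumptions made for the example. Only the names memory and max_memory come from the text.

```python
class ReplayMemory:
    """Hypothetical container for the experience replay memory."""

    def __init__(self, max_memory=100):  # default of 100 is an assumption
        # The memory starts out as an empty list.
        self.memory = []
        # Maximum number of transitions we are willing to keep.
        self.max_memory = max_memory

    def remember(self, transition):
        # Assumed policy: append the new transition, then drop the oldest
        # one if the memory has grown past its maximum size.
        self.memory.append(transition)
        if len(self.memory) > self.max_memory:
            del self.memory[0]


replay = ReplayMemory(max_memory=3)
for step in range(5):
    replay.remember(step)
print(replay.memory)
```

Dropping the oldest entry first is one common way to enforce a maximum size. A collections.deque with maxlen would behave the same way, but a plain list matches the description above more directly.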
At each time t (each minute), we repeat the following process, until the end of the epoch:
- We predict the Q-values of the current state s_t. Since five actions can be performed (0 == Cooling 3°C, 1 == Cooling 1.5°C, 2 == No Heat Transfer, 3 == Heating 1.5°C, 4 == Heating...