AI solution refresher
Let's refresh our memory of the deep Q-learning process by walking through its steps, adapted to our self-driving car application.
Initialization:
- The memory of the experience replay is initialized to an empty list, called memory in the code.
- The maximum size of the memory is set, called capacity in the code.
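The initialization step can be sketched in a few lines of Python. This is a minimal illustration, not the book's implementation: the names `memory` and `capacity` come from the text above, while the `ReplayMemory` class name and the `push` helper are assumptions introduced here to show how the maximum size might be enforced.

```python
class ReplayMemory:
    """Fixed-capacity experience replay buffer (illustrative sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity  # maximum number of stored transitions
        self.memory = []          # the memory starts out as an empty list

    def push(self, transition):
        # Hypothetical helper: store the newest transition, then drop the
        # oldest one if the buffer has grown past its capacity.
        self.memory.append(transition)
        if len(self.memory) > self.capacity:
            del self.memory[0]


# Usage: with capacity 3, only the three most recent transitions are kept.
replay = ReplayMemory(capacity=3)
for step in range(5):
    replay.push(("transition", step))
```

Dropping the oldest entry first is one simple choice for keeping the buffer bounded; other eviction policies are possible.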
At each time t, the AI repeats the following process, until the end of the epoch:
- The AI predicts the Q-values of the current state S_t. Since three actions can be played (0 <-> 0°, 1 <-> 20°, or 2 <-> -20°), it gets three predicted Q-values, one per action.
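- The AI performs an action selected by the Softmax method (see Chapter 5, Your First AI Model – Beware the Bandits!), which samples the action from a probability distribution computed from the three predicted Q-values.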
- The AI receives a reward, which is one of -1, -0.2, or +0.1.
- The AI reaches the next state, which is composed of the next three signals from the three sensors, plus the orientation of...