Results
Now that we’ve implemented them, let’s compare the performance of our two models, starting with the feed-forward variant.
The feed-forward model
During training, the average reward obtained by the agent grew slowly but consistently. After 300k episodes, the growth slowed down. The following charts (Figure 10.3) show the raw reward during training and the same data smoothed with a simple moving average over the last 15 values:
Figure 10.3: Reward during the training. Raw values (left) and smoothed (right)
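The smoothing applied to the right-hand chart can be reproduced with a short helper. This is a minimal sketch (the function name `smooth` is my own, not from the book's code); it averages each value with up to 14 preceding ones, matching the 15-value window mentioned above:

```python
from collections import deque

def smooth(values, window=15):
    """Smooth a reward series with a simple moving average over a
    sliding window of the last `window` values. Early entries are
    averaged over however many values are available so far."""
    buf = deque(maxlen=window)
    result = []
    for v in values:
        buf.append(v)                      # drop the oldest value once full
        result.append(sum(buf) / len(buf))  # mean of the current window
    return result
```

The same effect can be obtained with `pandas.Series.rolling(15).mean()`, but the hand-rolled version avoids the extra dependency.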
Another pair of charts (Figure 10.4) shows the reward obtained from testing on the same training data, but with random actions disabled (𝜖 = 0):
Figure 10.4: Reward from the tests. Raw values (left) and smoothed (right)
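Setting 𝜖 = 0 during testing turns the ε-greedy exploration policy into a purely greedy one: the agent always takes the action with the highest Q-value. A minimal sketch of such a selector (the function name `select_action` is illustrative, not the book's API):

```python
import random

def select_action(q_values, epsilon=0.0, rng=random):
    """Epsilon-greedy action selection over a list of Q-values.
    With epsilon=0 (the testing setup), this always returns the
    argmax action, so the agent's behavior is fully deterministic."""
    if rng.random() < epsilon:                 # explore: random action
        return rng.randrange(len(q_values))
    # exploit: action with the highest estimated Q-value
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Removing exploration noise is what makes the test curves a cleaner measure of the learned policy than the training curves.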
Both the training and testing reward charts show that the agent is learning...