Results
Let's now take a look at the results.
The feed-forward model
Convergence on one year of Yandex data requires about 10M training steps, which can take a while: at the 230-250 steps per second that a GTX 1080 Ti achieves, that is roughly 11 to 12 hours.
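To get a feeling for what this means in wall-clock time, we can do the arithmetic directly. The following is a minimal sketch that assumes the training speed stays constant for the whole run; the function name training_hours is made up for this illustration and is not part of the example code.

```python
def training_hours(total_steps: int, steps_per_second: float) -> float:
    """Wall-clock hours needed to run total_steps at a constant speed."""
    return total_steps / steps_per_second / 3600.0


# "About 10M training steps" and "230-250 steps per second" come from the text;
# a constant speed over the whole run is an assumption.
TOTAL_STEPS = 10_000_000

for speed in (230, 250):
    hours = training_hours(TOTAL_STEPS, speed)
    print(f"{speed} steps/s -> {hours:.1f} hours")
```

On slower hardware, the same calculation scales linearly: half the speed means twice the time.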
During the training, we have several charts in TensorBoard showing us what's going on.
Figure 10.3: The reward for episodes during the training
Figure 10.4: The reward for test episodes
The two preceding charts show the reward for episodes played during the training and the reward obtained from testing (which is done on the same quotes, but with epsilon=0). From them, we see that our agent is learning how to increase the profit from its actions over time.
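To make the difference between training and test episodes concrete, here is a minimal sketch of epsilon-greedy action selection. It is an illustration only, not the action selector used in the example code: the function select_action and the sample Q-values are hypothetical. With epsilon=0, the random branch is never taken, so the agent always picks the action with the highest predicted value.

```python
import random


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy choice over a list of action values (hypothetical helper)."""
    # rng.random() returns a float in [0.0, 1.0), so this branch is
    # never taken when epsilon is 0.
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest predicted value.
    return max(range(len(q_values)), key=q_values.__getitem__)


# Made-up Q-values for three actions, used only for demonstration.
q = [0.1, 0.7, 0.2]
print(select_action(q, epsilon=0.0))  # greedy: always the index of the largest value
print(select_action(q, epsilon=1.0))  # fully random: any valid index
```

Because the test episodes have no random exploration, their reward reflects only what the network has learned so far.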
Figure 10.5: The lengths of played episodes
Figure 10.6: The values predicted by the network on a subset of states
The lengths of episodes also increased after 1M training iterations. The values predicted by the network are growing as well.
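A chart like the one in Figure 10.6 can be produced by evaluating the network on a fixed set of states from time to time. The sketch below shows one way to do this, under the assumption that the tracked quantity is the mean of the best action value over those states. The names mean_best_value and toy_q_function are hypothetical, and the toy function merely stands in for the trained network.

```python
from statistics import mean


def mean_best_value(states, q_function):
    """Average, over the given states, of the largest predicted action value."""
    return mean(max(q_function(state)) for state in states)


def toy_q_function(state):
    """Stand-in for the network: returns made-up values for three actions."""
    return [state, 2.0 * state, -state]


# A fixed subset of states, kept unchanged during training so that
# the values stay comparable between measurements.
held_out_states = [1.0, 2.0, 3.0]
print(mean_best_value(held_out_states, toy_q_function))
```

Keeping the states fixed matters: any change in the resulting number then comes from the network and not from the data it is evaluated on.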
...