Results
Let’s now take a look at the results.
The feed-forward model
Convergence on one year of Yandex data requires about 10M training steps, which can take a while (a GTX 1080 Ti trains at 230-250 steps per second). During training, several charts in TensorBoard show us what's going on.
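To put those numbers in perspective, here is a quick back-of-the-envelope estimate of the wall-clock time. It is only a sketch: it assumes the training speed stays constant for the whole run, and the helper name `training_hours` is made up for this illustration.

```python
# Rough wall-clock estimate from the figures quoted in the text.
TOTAL_STEPS = 10_000_000   # about 10M training steps until convergence
SPEEDS = (230, 250)        # steps per second reported for a GTX 1080 Ti


def training_hours(total_steps: int, steps_per_second: float) -> float:
    """Hours needed for total_steps at a constant speed (an assumption)."""
    return total_steps / steps_per_second / 3600


for speed in SPEEDS:
    print(f"{speed} steps/s -> {training_hours(TOTAL_STEPS, speed):.1f} h")
# 230 steps/s -> 12.1 h
# 250 steps/s -> 11.1 h
```

So, under that assumption, a full run takes roughly 11 to 12 hours.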
The following two charts, reward_100 and steps_100, show the average reward (in percent) and the average episode length over the last 100 episodes, respectively.
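For readers wondering how a "last 100 episodes" metric of this kind can be computed, the following is a minimal sketch using a fixed-size window. It is not the training code behind the charts: the `WindowedMean` class and the episode values are hypothetical and exist only to illustrate the averaging.

```python
from collections import deque


class WindowedMean:
    """Mean of the most recent `size` values (fewer until the window fills)."""

    def __init__(self, size: int = 100):
        # A deque with maxlen drops the oldest value automatically.
        self.values = deque(maxlen=size)

    def add(self, value: float) -> float:
        """Record a new value and return the current windowed mean."""
        self.values.append(value)
        return sum(self.values) / len(self.values)


reward_100 = WindowedMean(100)
steps_100 = WindowedMean(100)

# Made-up finished episodes: (reward in percent, episode length in bars).
episodes = [(-0.3, 5), (0.1, 10), (0.5, 15)]
for reward_pct, length in episodes:
    mean_reward = reward_100.add(reward_pct)
    mean_steps = steps_100.add(length)
```

In a real training loop, the two means would be written to TensorBoard each time an episode ends, which gives smoothed curves instead of noisy per-episode values.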
The charts show us two good things:
- Our agent was able to figure out when to buy and sell the share to get a positive reward. Because we pay a commission of 0.1% both when opening and when closing a position, every round trip costs 0.2%, so random actions would average a reward of about -0.2%.
- Over the course of training, the episode length increased from 7 bars to 25 and continues to grow slowly, which means that the agent is holding the share longer and...