Summary
In this chapter, we saw a practical example of RL and implemented a trading agent along with a custom Gym environment. We tried two different architectures: a feed-forward network that takes the price history as input and a 1D convolutional network. Both architectures used the DQN method, with some of the extensions described in Chapter 7, DQN Extensions.
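As a quick reminder of the two architecture ideas, here is a minimal sketch, not the book's exact models: a feed-forward DQN over a flattened price-history window and a 1D convolutional DQN over the same window. The class names, layer sizes, and input shape used below are illustrative assumptions.

```python
import torch
import torch.nn as nn


class SimpleFFDQN(nn.Module):
    """Feed-forward DQN: flattened price history -> Q-values per action."""
    def __init__(self, obs_len: int, actions_n: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_len, 512),
            nn.ReLU(),
            nn.Linear(512, actions_n),
        )

    def forward(self, x):
        return self.net(x)


class Conv1DDQN(nn.Module):
    """1D convolutional DQN over a (channels, bars) price-history window."""
    def __init__(self, shape, actions_n: int):
        # shape is assumed to be (channels, bars), e.g. OHLC + volume over N bars
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(shape[0], 128, kernel_size=5),
            nn.ReLU(),
            nn.Conv1d(128, 128, kernel_size=5),
            nn.ReLU(),
        )
        # Infer the flattened size of the convolutional output
        conv_out = self.conv(torch.zeros(1, *shape)).view(1, -1).size(1)
        self.fc = nn.Sequential(
            nn.Linear(conv_out, 512),
            nn.ReLU(),
            nn.Linear(512, actions_n),
        )

    def forward(self, x):
        c = self.conv(x).view(x.size(0), -1)
        return self.fc(c)
```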
This is the last chapter in part two of the book. In part three, we’ll talk about a different family of RL methods: policy gradients.