Summary
In this chapter, we saw a practical example of RL: we implemented a trading agent and a custom Gym environment. We tried two different architectures: a feed-forward network that takes the price history as input, and a 1D convolutional network. Both architectures used the DQN method, with some of the extensions described in Chapter 8.
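As a compact reminder of what a custom environment boils down to, here is a minimal sketch of a Gym-style trading environment in plain Python. It is not the chapter's implementation: the class name `ToyTradingEnv`, the three-action set, the observation layout, and the percentage reward are all assumptions made for illustration, and the class only mimics the classic `reset()`/`step()` interface without depending on the Gym library.

```python
from enum import IntEnum


class Action(IntEnum):
    # Hypothetical action set chosen for this sketch.
    SKIP = 0
    BUY = 1
    CLOSE = 2


class ToyTradingEnv:
    """Gym-style reset()/step() interface over a list of closing prices."""

    def __init__(self, prices, window=3):
        if len(prices) <= window + 1:
            raise ValueError("need more prices than window + 1")
        self.prices = list(prices)
        self.window = window
        self.reset()

    def _observation(self):
        # Past `window` prices relative to the current one, plus a flag
        # telling the agent whether a position is currently open.
        last = self.prices[self.t]
        history = [
            self.prices[self.t - i] / last - 1.0
            for i in range(self.window, 0, -1)
        ]
        return history + [1.0 if self.open_price is not None else 0.0]

    def reset(self):
        self.t = self.window
        self.open_price = None
        return self._observation()

    def step(self, action):
        action = Action(action)
        reward = 0.0
        if action == Action.BUY and self.open_price is None:
            self.open_price = self.prices[self.t]
        elif action == Action.CLOSE and self.open_price is not None:
            # Reward is the profit in percent, paid when the position closes.
            reward = 100.0 * (self.prices[self.t] / self.open_price - 1.0)
            self.open_price = None
        self.t += 1
        done = self.t >= len(self.prices) - 1
        return self._observation(), reward, done, {}
```

An agent would call `reset()` once per episode and then `step()` with an action index until `done` is true. The flat observation list suits a feed-forward network; a 1D convolutional network would instead expect the price history arranged as channels over time.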
This was the last chapter in Part 2 of this book. In Part 3, we will talk about a different family of RL methods: policy gradients. We have already touched on this approach briefly, but in the upcoming chapters, we will go much deeper into the subject, covering the REINFORCE method and the best method in the family: Asynchronous Advantage Actor-Critic, also known as A3C.