Summary
In this chapter, we implemented the AlphaGo Zero method, which was created by DeepMind to play board games at a high level. The central idea of this method is to let agents improve their strength via self-play, without any prior knowledge from human games or other data sources.
In the next chapter, we will discuss another direction of practical RL: discrete optimization, which plays an important role in many real-life tasks, from schedule optimization to protein folding.