Summary
In this chapter, we covered an important class of algorithms: policy-based methods. Unlike the value-based methods of the previous chapter, these methods directly optimize a policy network, which gives them a stronger theoretical foundation. In addition, they can be applied to continuous action spaces. With this, we have completed our detailed coverage of model-free approaches. In the next chapter, we move on to model-based methods, which aim to learn the dynamics of the environment the agent operates in.
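As a reminder of how direct policy optimization accommodates continuous actions, the following is a minimal sketch of a REINFORCE-style update with a Gaussian policy in PyTorch. The network sizes, dimensions, learning rate, and the randomly generated batch are illustrative assumptions standing in for real rollout data, not the exact implementations from this chapter.

```python
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    """Policy network that outputs a Gaussian over continuous actions."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.mean = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim)
        )
        self.log_std = nn.Parameter(torch.zeros(act_dim))  # state-independent std

    def dist(self, obs):
        return torch.distributions.Normal(self.mean(obs), self.log_std.exp())

policy = GaussianPolicy(obs_dim=3, act_dim=1)            # dims are placeholders
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

# One REINFORCE update on a hypothetical batch of transitions.
obs = torch.randn(32, 3)      # stand-in for observations collected during rollouts
actions = torch.randn(32, 1)  # actions that were sampled from the policy
returns = torch.randn(32)     # discounted returns observed after each action

log_probs = policy.dist(obs).log_prob(actions).sum(dim=-1)
loss = -(log_probs * returns).mean()  # gradient ascent on expected return

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The key point is that the gradient flows through the log-probability of the sampled actions, so no maximization over a discrete set of actions is required, which is what makes continuous action spaces straightforward for policy-based methods.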