Reinforcement Learning with Human Feedback
In this chapter, we’ll take a look at a relatively recent method that addresses situations when the desired behavior is hard to define via an explicit reward function: reinforcement learning with human feedback (RLHF). The method is also related to the exploration problem we discussed in Chapter 18, as it allows humans to push learning in a new direction. Surprisingly, the method, initially developed for a very specific subproblem in the RL domain, turned out to be enormously successful for large language models (LLMs). Nowadays, RLHF is at the core of modern LLM training pipelines, and without it, the fascinating recent progress wouldn’t have been possible.
As this book is not about LLMs and modern chatbots, we will focus purely on the original paper from OpenAI and DeepMind by Christiano et al., Deep reinforcement learning from human preferences [Chr+17], which describes the RLHF method...