Reinforcement learning from human feedback
At least two things are undeniable about ChatGPT. First, its launch was incredibly buzzy. If you follow ML topics on social and general media, you probably remember being overloaded with content about people using it for everything from writing new recipes to drafting start-up growth plans, and from website code to Python data analysis tips. Second, there’s a good reason for the buzz. It performs much better than any prompt-based NLP solution the world had seen before. It establishes a new state of the art in question answering, text generation, classification, and many other domains. It’s so good that in some cases it’s even better than a basic Google search! How did OpenAI do this? Reinforcement learning from human feedback (RLHF) is the answer!
While RLHF is not a new concept in and of itself, its most obviously successful application in the large language model domain is ChatGPT. The predecessor to ChatGPT was...