Constrained generation and eliciting trustworthy outcomes
In practice, it is possible to constrain model generation and guide it toward factual and equitable outcomes. As discussed, this guidance can be applied either during continued training and fine-tuning or at inference time. For example, methodologies such as reinforcement learning from human feedback (RLHF) and direct preference optimization (DPO) refine model outputs to align them with human judgment. Additionally, as discussed in Chapter 7, various grounding techniques help ensure that model outputs reflect verified data, continuously guiding the model toward responsible and accurate content generation.
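To make the DPO idea concrete, the following is a minimal sketch of the DPO objective in PyTorch. It assumes you have already computed per-sequence log-probabilities for the preferred ("chosen") and dispreferred ("rejected") responses under both the policy being trained and a frozen reference model; the function name, argument names, and batch size are illustrative rather than taken from any particular library.

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct preference optimization loss.

    Each argument is a 1-D tensor of summed log-probabilities, one entry
    per (prompt, response) pair in the batch. beta controls how strongly
    the policy is penalized for drifting from the reference model.
    """
    # Log-ratios of policy vs. reference for the chosen and rejected responses
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps

    # The policy is rewarded for widening the margin between chosen and rejected
    logits = beta * (chosen_logratios - rejected_logratios)
    return -F.logsigmoid(logits).mean()

# Illustrative call with random log-probabilities for a batch of 4 preference pairs
loss = dpo_loss(torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4))
print(loss.item())

The key design point is that DPO needs no separate reward model or reinforcement-learning loop: the preference signal is folded directly into a supervised-style loss over pairs of responses.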
Constrained generation with fine-tuning
Refinement strategies such as RLHF integrate human judgments into the model training process, steering the AI toward behavior that aligns with ethical and truthful standards. By incorporating human feedback loops, RLHF ensures...
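In a typical RLHF pipeline, those human judgments are first distilled into a reward model trained on pairwise comparisons, which is then used to score and optimize the policy. The following is a minimal sketch of that pairwise (Bradley-Terry style) reward-model loss, assuming pooled hidden states stand in for a full language-model backbone; the class and variable names are illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardHead(nn.Module):
    """Toy scalar reward head over pooled hidden states (a stand-in for
    the final layer of a full language-model-based reward model)."""
    def __init__(self, hidden_size):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, pooled_hidden):
        # One scalar reward per (prompt, response) pair in the batch
        return self.score(pooled_hidden).squeeze(-1)

def preference_loss(reward_chosen, reward_rejected):
    # Bradley-Terry objective: the human-preferred response should out-score the rejected one
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Illustrative training step on random features for a batch of 8 human comparisons
reward_model = RewardHead(hidden_size=16)
chosen, rejected = torch.randn(8, 16), torch.randn(8, 16)
loss = preference_loss(reward_model(chosen), reward_model(rejected))
loss.backward()

Once trained, the reward model's scores serve as the optimization target for the policy (for example, via PPO), closing the human feedback loop described above.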