
RLHF (Reinforcement Learning from Human Feedback)

A training technique in which human evaluators rank model outputs, and these rankings are used to train a reward model. The language model is then optimized via reinforcement learning to produce outputs that the reward model scores highly. RLHF is a key alignment method.
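
A minimal sketch of the first stage, fitting a reward model to ranked pairs, assuming PyTorch; the RewardModel class, preference_loss function, and the toy pooled embeddings standing in for encoded prompt/response pairs are illustrative assumptions, not part of any particular library:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Toy reward model: maps a pooled response embedding to a scalar score."""
    def __init__(self, hidden_size=128):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, pooled_embedding):
        return self.score(pooled_embedding).squeeze(-1)

def preference_loss(reward_model, chosen_emb, rejected_emb):
    """Bradley-Terry-style pairwise loss: push the human-preferred (chosen)
    response to score higher than the rejected one."""
    r_chosen = reward_model(chosen_emb)
    r_rejected = reward_model(rejected_emb)
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy usage: random embeddings stand in for encoded chosen/rejected responses.
rm = RewardModel()
chosen = torch.randn(4, 128)
rejected = torch.randn(4, 128)
loss = preference_loss(rm, chosen, rejected)
loss.backward()
```

In the second stage, the trained reward model scores the language model's sampled outputs, and a reinforcement learning algorithm such as PPO updates the language model to raise those scores, typically with a KL penalty against the original model to keep its outputs from drifting too far.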

Related terms

Reinforcement Learning (RL)
Alignment
DPO (Direct Preference Optimization)
Preference Optimization