
Reinforcement Learning from Human Feedback (RLHF)

Reinforcement Learning from Human Feedback (RLHF) is an alternative fine-tuning method for Large Language Models, introduced by Christiano et al. (2017) and later refined for language models by Stiennon et al. (2020). It addresses the LLM alignment challenge by framing it as a reinforcement learning problem. The core idea is to collect human comparisons between different model outputs, train a reward model on those comparisons, and then use the reward model's signal to optimize the LLM's policy so that its outputs better reflect human preferences.
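
To make this concrete, the sketch below shows only the reward-modelling step in PyTorch: a scalar reward head is trained on pairs of preferred and rejected responses with a Bradley-Terry (log-sigmoid) preference loss. This is a minimal illustration under stated assumptions, not a reference implementation; the class and variable names (RewardModel, preference_loss, the random stand-in embeddings) are illustrative only.

```python
# A minimal sketch of the RLHF reward-modelling stage, assuming pairwise
# human comparisons are available; class and variable names are illustrative.
import torch
import torch.nn.functional as F


class RewardModel(torch.nn.Module):
    """Scores a pooled (prompt + response) embedding with a single scalar reward."""

    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.scorer = torch.nn.Linear(hidden_size, 1)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # embeddings: (batch, hidden_size) representation of prompt + response
        return self.scorer(embeddings).squeeze(-1)


def preference_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry objective: the human-preferred response should score higher."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()


# One illustrative training step on a batch of human comparisons.
reward_model = RewardModel()
optimizer = torch.optim.AdamW(reward_model.parameters(), lr=1e-5)

chosen_emb = torch.randn(8, 768)    # stand-in embeddings of preferred responses
rejected_emb = torch.randn(8, 768)  # stand-in embeddings of rejected responses

loss = preference_loss(reward_model(chosen_emb), reward_model(rejected_emb))
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"reward-model loss: {loss.item():.4f}")
```

In a full RLHF pipeline, the trained reward model then scores the LLM's generations while a reinforcement learning algorithm such as PPO updates the policy, typically with a KL penalty that keeps it close to the supervised fine-tuned starting model.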


