Learning from Human Feedback
Learning from human feedback is an alignment method applied after pre-training and supervised fine-tuning to reduce the risk that a model generates inaccurate, biased, or harmful content. The process collects human evaluations of the model's responses to a variety of inputs, where evaluators judge the outputs against human preferences and values. This feedback is then used to further train the model, improving its alignment with user expectations.
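One common way to turn such human evaluations into a training signal is a pairwise preference (Bradley-Terry style) loss, as used when fitting a reward model in RLHF. The sketch below is illustrative only; the function name and scores are assumptions, not from the source.

```python
import math

def preference_loss(reward_chosen, reward_rejected):
    # Bradley-Terry style pairwise loss: -log sigmoid(r_chosen - r_rejected).
    # Training on this pushes the reward model to score the
    # human-preferred response higher than the rejected one.
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Hypothetical scores a reward model might assign to two responses.
loss_ordered = preference_loss(2.0, 0.5)      # preferred response scored higher
loss_misordered = preference_loss(0.5, 2.0)   # preferred response scored lower
print(loss_ordered < loss_misordered)         # mis-ordered pairs incur a larger loss
```

Note that when the two scores are equal the loss is log 2, so any correct ordering strictly reduces it.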
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Related
Learning from Human Feedback
A development team trains a large language model on a vast dataset of high-quality, curated instruction-and-response pairs to create a helpful chatbot. After this training, they observe that while the model answers most questions correctly, it occasionally generates responses that are subtly biased or confidently presents outdated, incorrect information when faced with novel or ambiguous user queries. Which of the following statements best analyzes the fundamental limitation demonstrated by the model's behavior?
Evaluating a Chatbot's Training Limitations
Analyzing Model Behavior After Instruction-Based Training
Learn After
Reinforcement Learning from Human Feedback (RLHF)
A development team is working on an AI assistant. After its initial training, they find that while the assistant's answers are factually accurate, they are often perceived as blunt or unhelpful. To address this, the team decides to use a process where human evaluators are shown a user's prompt followed by two or more different responses generated by the assistant. Which of the following tasks, given to the human evaluators, would be most effective for refining the model's helpfulness and tone?
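The evaluator task described above, ranking two or more candidate responses, is typically converted into pairwise comparison records for reward-model training. A minimal sketch, assuming evaluators return responses ordered best-first (the function name and example strings are hypothetical):

```python
from itertools import combinations

def ranking_to_pairs(prompt, ranked_responses):
    # Convert a human evaluator's ranking (best first) into
    # (prompt, chosen, rejected) pairs: every higher-ranked response
    # is "chosen" relative to every response ranked below it.
    return [(prompt, ranked_responses[i], ranked_responses[j])
            for i, j in combinations(range(len(ranked_responses)), 2)]

pairs = ranking_to_pairs(
    "How do I reset my password?",
    ["Polite step-by-step answer",
     "Terse but correct answer",
     "Dismissive answer"],
)
# A ranking of 3 responses yields 3 pairwise comparisons.
```

This is why ranking is more effective than asking evaluators for absolute scores: a single ranking of n responses yields n(n-1)/2 consistent comparisons.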
Addressing Post-Tuning Model Flaws
An AI development team wants to improve a pre-trained model's alignment by making its responses more helpful and less likely to be harmful. Arrange the core steps of the process for incorporating human evaluations into this refinement stage.
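The step-ordering the card above asks for can be sketched as follows. This is the canonical RLHF ordering; the step wording here is a summary, not the card's answer key.

```python
def rlhf_steps():
    # Canonical ordering of the human-feedback refinement stage.
    return [
        "1. Sample the model's responses to a set of prompts",
        "2. Collect human rankings/preferences over those responses",
        "3. Train a reward model on the preference data",
        "4. Optimize the LLM against the reward model with RL (e.g. PPO)",
    ]

for step in rlhf_steps():
    print(step)
```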