Learn Before

Differing Motivations of Instruction and Human Preference Alignment

Multiple Choice

A research team is refining a language model using two distinct methods. In Method A, they train the model on a large dataset of specific commands paired with ideal, human-written responses that perfectly execute those commands (e.g., Command: 'List three benefits of solar power.' Ideal Response: A list of exactly three benefits). In Method B, they show human raters two different model-generated responses to the same open-ended prompt (e.g., 'Write a short, encouraging note') and ask the raters

Updated 2025-09-28
