Learn Before
Contrasting Data Sourcing Methods in Model Training
A language model is being refined through a process where, for each training instance, an input prompt is selected from a collection. The model then generates a corresponding output based on its current state, and this input-output pair is immediately used for that training step. Contrast this method of obtaining the output portion of a training sample with an approach that uses a fixed, pre-written set of ideal outputs for each prompt. What is a primary advantage of the model generating its own outputs for training?
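To make the contrast concrete, here is a minimal toy sketch (all names hypothetical, no real model or reward signal) of the two data-sourcing strategies the question describes: a fixed, pre-written output per prompt versus an output sampled from the model's current state, which changes as the model is updated.

```python
import random

def fixed_dataset(prompts):
    # Offline approach: ideal outputs are written once, before training,
    # and never change as the model's parameters change.
    return {p: f"ideal answer to {p}" for p in prompts}

class ToyModel:
    """Stand-in for a language model; `version` abstracts its parameters."""
    def __init__(self):
        self.version = 0

    def generate(self, prompt):
        # On-policy approach: the response reflects the model's CURRENT state.
        return f"v{self.version} answer to {prompt}"

    def update(self):
        # One training step, abstracted away.
        self.version += 1

prompts = ["p1", "p2"]
offline = fixed_dataset(prompts)
model = ToyModel()

for step in range(3):
    prompt = random.choice(prompts)
    on_policy_pair = (prompt, model.generate(prompt))  # tracks current model
    offline_pair = (prompt, offline[prompt])           # stale after updates
    model.update()
```

The sketch illustrates the advantage hinted at by the question: on-policy pairs always reflect the behavior the current model actually produces, so the training signal is applied to the model's own output distribution rather than to stale, pre-written targets.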
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Formulating the Loss Function for Policy Learning in RLHF
A team is refining a language model using a method where, for each training step, a prompt is selected and the model itself generates a response. This prompt-response pair is then used in that training step's update calculation. Based on this description, what is the most accurate analysis of the function of the model-generated response in this specific training phase?
Policy Learning in RLHF
Comparing Data Sourcing Strategies
Contrasting Data Sourcing Methods in Model Training
Optimal Parameters Formula in RL Fine-Tuning