Learn Before
Comparing Data Sourcing Strategies
Two teams are fine-tuning a large language model.
- Team Alpha uses a fixed dataset in which each input prompt is paired with a single, pre-written 'gold standard' response authored by a human expert. The model is trained exclusively on these static pairs.
- Team Beta starts with a large collection of input prompts but no pre-written responses. At each step of training, they take a prompt and have the current version of their model generate a response. This newly generated input-output pair is then used for that training step.
Analyze the fundamental difference in how the output portion of the training data is constructed for Team Beta compared to Team Alpha. What is the primary advantage of Team Beta's approach in terms of the model's potential to generate novel responses?
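The contrast between the two strategies can be sketched in code. This is a minimal, hypothetical illustration, not an actual training loop: the `Model` class, its `generate` and `update` methods, and the sample prompts are all invented stand-ins. The point it demonstrates is that Team Alpha's output side is frozen before training, while Team Beta's output side is sampled from the current model and therefore changes as the model's parameters change.

```python
# Toy sketch of the two data-sourcing strategies.
# All names here (Model, prompts, gold_responses) are illustrative
# assumptions, not part of any real training framework.

prompts = ["Explain RLHF.", "Summarize the article."]
gold_responses = ["RLHF is ...", "The article says ..."]


class Model:
    """Stand-in for a language model whose behavior shifts as it trains."""

    def __init__(self):
        self.version = 0

    def generate(self, prompt):
        # The response depends on the model's *current* parameters,
        # so it changes after every update.
        return f"response-v{self.version} to: {prompt}"

    def update(self, prompt, response):
        # Placeholder for a gradient step on the (prompt, response) pair.
        self.version += 1


# Team Alpha: the output side is fixed before training ever starts.
alpha_model = Model()
for prompt, response in zip(prompts, gold_responses):
    alpha_model.update(prompt, response)  # responses never change

# Team Beta: the output side is sampled from the current model at
# each step (on-policy), so the training pairs evolve with the model.
beta_model = Model()
beta_pairs = []
for prompt in prompts:
    response = beta_model.generate(prompt)  # model-generated output
    beta_pairs.append((prompt, response))
    beta_model.update(prompt, response)
```

Note how `beta_pairs` records a different "version" of the model for each step: the second pair is generated by a model that has already been updated once, which is exactly the distinction the question asks you to analyze.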
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Formulating the Loss Function for Policy Learning in RLHF
A team is refining a language model using a method where, for each training step, a prompt is selected and the model itself generates a response. This prompt-response pair is then used as part of the input for that training step's update calculation. Based on this description, what is the most accurate analysis of the function of the model-generated response in this specific training phase?
Policy Learning in RLHF
Comparing Data Sourcing Strategies
Contrasting Data Sourcing Methods in Model Training
Optimal Parameters Formula in RL Fine-Tuning