Diagnosing a Faulty Language Model Training Process
Based on the principles of training language models on concatenated input-output sequences, what is the most probable cause of the observed issue in the engineer's training setup? Explain your reasoning.
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A language model is being trained on instruction-following data. For one specific training instance, the model processes the full tokenized sequence:
['User:', 'What', 'is', '2+2?', 'Assistant:', '4']. The goal is to train the model to provide the correct response ('4') when given the user's prompt. During the backpropagation step for this single instance, on which token(s) is the predictive loss calculated to update the model's weights?

Diagnosing a Faulty Language Model Training Process
A machine learning engineer is training a language model for a question-answering task. The training data consists of concatenated [question, answer] sequences. Due to a configuration error, the training loss is calculated across all tokens in the sequence (both question and answer), instead of only on the answer tokens. What is the most likely and significant negative consequence of this misconfiguration on the model's behavior?

Loss Masking via Forward and Backward Passes in SFT
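A minimal sketch of the loss-masking idea these questions probe. The token log-probabilities and the `masked_nll` helper are purely illustrative (made-up numbers, not any library's API); the point is that only positions flagged as answer tokens contribute to the training loss:

```python
import math

# Hypothetical log-probabilities the model assigns to each token of the
# example sequence ['User:', 'What', 'is', '2+2?', 'Assistant:', '4'].
# The numeric values are invented for illustration only.
token_logprobs = [-2.3, -1.9, -2.1, -1.5, -0.7, -0.1]

# Loss mask: 0 for prompt tokens (excluded from the loss), 1 for answer
# tokens. Here only the final token, the assistant's answer '4', is kept.
loss_mask = [0, 0, 0, 0, 0, 1]

def masked_nll(logprobs, mask):
    """Average negative log-likelihood over unmasked (answer) tokens only."""
    kept = [-lp for lp, m in zip(logprobs, mask) if m == 1]
    return sum(kept) / len(kept)

# Correct SFT setup: loss comes only from the answer token.
print(masked_nll(token_logprobs, loss_mask))

# The misconfigured setup in the second question is equivalent to an
# all-ones mask, so the model is also trained to reproduce the question.
print(masked_nll(token_logprobs, [1] * len(token_logprobs)))
```

With the all-ones mask, gradient updates spend capacity on modeling the prompt text itself, which is the behavior the misconfiguration question asks about.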