Diagnosing a Faulty Language Model Training Process
Based on the principles of training language models on concatenated input-output sequences, what is the most probable cause of the observed issue in the engineer's training setup? Explain your reasoning.
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A language model is being trained on instruction-following data. For one specific training instance, the model processes the full tokenized sequence:
['User:', 'What', 'is', '2+2?', 'Assistant:', '4']. The goal is to train the model to provide the correct response ('4') when given the user's prompt. During the backpropagation step for this single instance, on which token(s) is the predictive loss calculated to update the model's weights?

Diagnosing a Faulty Language Model Training Process
A machine learning engineer is training a language model for a question-answering task. The training data consists of concatenated [question, answer] sequences. Due to a configuration error, the training loss is calculated across all tokens in the sequence (both question and answer), instead of only on the answer tokens. What is the most likely and significant negative consequence of this misconfiguration on the model's behavior?

Loss Masking via Forward and Backward Passes in SFT
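A minimal sketch of the loss-masking idea these questions probe. The token log-probabilities and the `masked_nll` helper are purely illustrative (made-up numbers, not any library's API); the point is that only positions flagged as answer tokens contribute to the training loss:

```python
import math

# Hypothetical log-probabilities the model assigns to each token of the
# example sequence ['User:', 'What', 'is', '2+2?', 'Assistant:', '4'].
# The numeric values are invented for illustration only.
token_logprobs = [-2.3, -1.9, -2.1, -1.5, -0.7, -0.1]

# Loss mask: 0 for prompt tokens (excluded from the loss), 1 for answer
# tokens. Here only the final token, the assistant's answer '4', is kept.
loss_mask = [0, 0, 0, 0, 0, 1]

def masked_nll(logprobs, mask):
    """Average negative log-likelihood over unmasked (answer) tokens only."""
    kept = [-lp for lp, m in zip(logprobs, mask) if m == 1]
    return sum(kept) / len(kept)

# Correct SFT setup: loss comes only from the answer token.
print(masked_nll(token_logprobs, loss_mask))

# The misconfigured setup in the second question is equivalent to an
# all-ones mask, so the model is also trained to reproduce the question.
print(masked_nll(token_logprobs, [1] * len(token_logprobs)))
```

With the all-ones mask, gradient updates spend capacity on modeling the prompt text itself, which is the behavior the misconfiguration question asks about.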