Analyzing a Fine-Tuning Training Objective
Based on the per-token loss values provided in this training step, is the model being optimized correctly to perform the summarization task? Explain your reasoning.
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Related
A machine learning engineer is fine-tuning a pre-trained language model to function as a helpful assistant. The training data consists of pairs of instructions and desired responses. For each pair, the instruction and response are combined into a single sequence, and the model is trained to predict the next token at each position. However, due to a configuration error, the training loss is calculated across the entire combined sequence (both the instruction and the response tokens), instead of only on the response tokens. What is the most likely undesirable outcome of this training setup?
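The misconfiguration described above can be illustrated with a minimal sketch of loss masking in supervised fine-tuning. This is an illustrative example, not the engineer's actual pipeline: the `IGNORE_INDEX` convention, the helper names, and the example token IDs and loss values are all assumptions. The idea is that instruction positions are excluded from the loss so the model is optimized only to reproduce the response.

```python
# Sketch of loss masking in supervised fine-tuning (SFT).
# IGNORE_INDEX follows the common framework convention of marking
# positions that should contribute no loss.
IGNORE_INDEX = -100

def build_labels(input_ids, instruction_len):
    """Copy input_ids, masking instruction positions so they are skipped in the loss."""
    labels = list(input_ids)
    for i in range(instruction_len):
        labels[i] = IGNORE_INDEX
    return labels

def masked_mean_loss(per_token_losses, labels):
    """Average the loss only over positions whose label is not masked."""
    kept = [l for l, y in zip(per_token_losses, labels) if y != IGNORE_INDEX]
    return sum(kept) / len(kept)

# Hypothetical 6-token sequence: the first 4 tokens are the instruction,
# the last 2 are the desired response.
input_ids = [11, 12, 13, 14, 201, 202]
labels = build_labels(input_ids, instruction_len=4)
per_token_losses = [2.0, 1.5, 1.8, 1.2, 0.9, 0.7]

# Misconfigured setup: averaging over ALL tokens also trains the model
# to predict the instruction text itself.
unmasked_loss = sum(per_token_losses) / len(per_token_losses)

# Correct setup: only response tokens contribute to the gradient signal.
masked_loss = masked_mean_loss(per_token_losses, labels)
```

Under the broken configuration, capacity and gradient signal are spent on modeling the instructions, which is what makes the unintended outcome in the question above likely.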
Applying Loss Masking in SFT