During supervised fine-tuning, if a model is trained on concatenated [input, output] sequences and the training loss is computed across the entire sequence (both input and output tokens), the model is still being optimized primarily to improve its conditional generation of the output given the input: under the causal attention mask, the output tokens are always predicted conditioned on the input prefix, so the extra loss on the input tokens mainly adds a language-modeling term over the prompt.
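A minimal sketch of the two loss conventions (loss over the full concatenated sequence vs. loss on completion tokens only), assuming a Hugging Face-style causal language model and tokenizer; the names `model`, `tok`, and `sft_loss` are illustrative placeholders, not names from the source:

```python
import torch
import torch.nn.functional as F

def sft_loss(model, tok, prompt, completion, mask_prompt=True):
    """Cross-entropy loss on the concatenated [prompt, completion] sequence.

    mask_prompt=True  -> loss only on completion-token predictions
    mask_prompt=False -> loss over the entire sequence (prompt + completion)
    """
    # Build the concatenated [input, output] sequence.
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    completion_ids = tok(completion, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, completion_ids], dim=1)

    labels = input_ids.clone()
    if mask_prompt:
        # Ignore prompt positions (-100 is the standard ignore index).
        labels[:, : prompt_ids.shape[1]] = -100

    logits = model(input_ids).logits
    # Standard next-token shift: the prediction at position t targets token t+1.
    shift_logits = logits[:, :-1, :]
    shift_labels = labels[:, 1:]
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        ignore_index=-100,
    )
```

In either setting the completion tokens are predicted from the input prefix, which is why conditional generation improves in both cases; masking simply concentrates all of the gradient signal on the completion.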
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
SFT Objective as Maximizing Joint Log-Probability of Concatenated Sequences
In a common fine-tuning strategy, a prompt and its desired completion are concatenated into a single sequence (e.g., [prompt_tokens, completion_tokens]). The language model is then trained on this full sequence, but the training loss is calculated only for the model's predictions on the completion tokens. What is the most accurate analysis of the primary purpose of this specific loss calculation method?
Diagnosing a Faulty Fine-Tuning Process
Loss Masking via Forward and Backward Passes in SFT