Implications of Selective Gradient Propagation
A language model is being fine-tuned on a task where each training instance is a sequence formed by concatenating a 'prompt' and a 'completion'. The training loss is computed only on the model's predictions for the 'completion' part. Analyze what happens, during a single backpropagation step, to the model's parameters that process the 'prompt' part of the sequence. Explain the reasoning behind this behavior.
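For concreteness, here is a minimal sketch of how such a completion-only objective is typically wired up. It assumes a PyTorch causal-LM setup; the function name `completion_only_loss` and the `prompt_len` argument are illustrative, while `-100` is the real default `ignore_index` of `torch.nn.functional.cross_entropy`:

```python
import torch
import torch.nn.functional as F

def completion_only_loss(logits: torch.Tensor,
                         input_ids: torch.Tensor,
                         prompt_len: int) -> torch.Tensor:
    """Cross-entropy over completion predictions only.

    logits:     (batch, seq_len, vocab_size) output of a causal LM
    input_ids:  (batch, seq_len) concatenated prompt + completion tokens
    prompt_len: number of prompt tokens at the start of each sequence
    """
    # Standard next-token shift: the logits at position t are the
    # model's prediction for the token at position t + 1.
    shift_logits = logits[:, :-1, :]
    shift_labels = input_ids[:, 1:].clone()

    # Mask every label that corresponds to a prompt token so those
    # positions contribute zero loss terms. (max(...) guards the
    # degenerate prompt_len == 0 case.)
    shift_labels[:, : max(prompt_len - 1, 0)] = -100

    # Even with the prompt labels masked out, the backward pass still
    # reaches the computations over the prompt positions: their hidden
    # states feed the completion predictions through attention, and
    # all parameters are shared across positions.
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        ignore_index=-100,
    )
```

Libraries such as Hugging Face Transformers follow the same convention: label positions set to -100 are simply skipped when the loss is averaged, which is how prompt masking is usually implemented in practice.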
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Example of Context and Prediction Sub-sequences
A developer is fine-tuning a language model on a dataset where each entry consists of a context and a desired completion. For training, the context and completion are concatenated into a single input sequence. The training objective is configured so that the loss is calculated only on the model's predictions for the completion part of the sequence. Given this setup, which statement accurately describes how the model's parameters are updated during the backward pass for a single training step?
Debugging a Fine-Tuning Gradient Flow