Example of Context and Prediction Sub-sequences
A simple question-answering task illustrates how a sequence is divided into a context (input) sub-sequence and a prediction (output) sub-sequence. For instance, the sequence ⟨s⟩ Square this number . 2 . serves as the input context, followed by The result is 4 . as the output prediction. In this setup, the loss is computed, and gradients are back-propagated, only for the prediction part of the sequence; the context tokens do not contribute to the loss.
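This selective loss computation is commonly implemented as label masking: context positions receive a sentinel label so they are skipped when averaging the per-token loss. The sketch below is framework-free and illustrative; `build_labels` and `masked_nll` are hypothetical names, and `IGNORE_INDEX = -100` follows the convention used by PyTorch's `CrossEntropyLoss` default `ignore_index`.

```python
import math

# -100 mirrors the default ignore_index of PyTorch's CrossEntropyLoss,
# a common convention for "this position contributes no loss".
IGNORE_INDEX = -100

def build_labels(input_ids, context_len):
    """Return training labels: context positions are masked out,
    prediction positions keep their token ids."""
    return [IGNORE_INDEX if i < context_len else tok
            for i, tok in enumerate(input_ids)]

def masked_nll(log_probs, labels):
    """Average negative log-likelihood over unmasked positions only.
    `log_probs[i]` maps a token id to the model's log-probability
    of predicting that token at position i."""
    terms = [-lp[tok] for lp, tok in zip(log_probs, labels)
             if tok != IGNORE_INDEX]
    return sum(terms) / len(terms)

# Toy sequence: 3 context tokens followed by 2 prediction tokens
# (analogous to "Square this number . 2 ." -> "The result is 4 .").
input_ids = [11, 12, 13, 21, 22]
labels = build_labels(input_ids, context_len=3)
# labels == [-100, -100, -100, 21, 22]: only the last two positions
# enter the loss, so gradients flow only from the prediction part.
```

Because the masked positions are excluded from the average, the backward pass updates parameters using error signals from the prediction tokens alone, which is exactly the behavior described above.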
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Example of Context and Prediction Sub-sequences
A developer is fine-tuning a language model on a dataset where each entry consists of a context and a desired completion. For training, the context and completion are concatenated into a single input sequence. The training objective is configured so that the loss is calculated only on the model's predictions for the completion part of the sequence. Given this setup, which statement accurately describes how the model's parameters are updated during the backward pass for a single training step?
Debugging a Fine-Tuning Gradient Flow
Implications of Selective Gradient Propagation
Learn After
A language model is being trained on the sequence:
⟨s⟩ Translate to Spanish: The cat sat. El gato se sentó. ⟨/s⟩. To effectively teach the model how to perform the translation, on which part of the sequence should the training loss be calculated?
Debugging a Chatbot Training Process
Rationale for Sub-sequence Loss Calculation