Evaluating a Training Objective for a Base Model
An engineer is training a large language model from scratch on a massive dataset of books. The goal is to create a 'base model' that has a general understanding of language structure and can be used for various downstream tasks later. For each book, the model processes it as a single, long sequence of tokens. The training objective is to accurately predict each token in the book, given all the preceding tokens. Therefore, the training error (loss) is calculated for every token and summed across the entire book.
Evaluate whether this training objective is appropriate for creating a versatile 'base model'. Justify your reasoning by explaining the main benefit of calculating the loss over the entire sequence.
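The objective described in the question can be sketched in a few lines. This is a toy illustration only: the "model" below is a random stand-in for a trained network's softmax output, and the token sequence is invented, not taken from the exercise.

```python
import numpy as np

# Toy stand-in for a language model over a 4-token vocabulary.
# In real training, these probabilities come from the network's softmax.
rng = np.random.default_rng(0)
vocab_size = 4
tokens = [2, 0, 3, 1]  # an invented "book" as a short token sequence

def toy_next_token_probs(prefix):
    """Return a probability distribution over the vocabulary given the
    preceding tokens (here just random, to keep the sketch self-contained)."""
    logits = rng.normal(size=vocab_size)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

# Autoregressive objective: sum the negative log-probability of each token
# given all preceding tokens. Every position in the book contributes a
# training signal -- this is the main benefit the question asks about.
loss = 0.0
for t in range(1, len(tokens)):
    probs = toy_next_token_probs(tokens[:t])
    loss += -np.log(probs[tokens[t]])

print(f"total loss over {len(tokens) - 1} predicted tokens: {loss:.3f}")
```

Because the loss covers every token rather than a single labeled target, one book of N tokens yields roughly N training examples, which is what makes the objective suitable for learning general language structure.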
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Evaluation in Bloom's Taxonomy
Related
Conditional vs. Joint Probability Objectives in Language Modeling
A language model is being trained with the objective of modeling the joint probability of an input sequence x and an output sequence y, which are treated as a single, concatenated sequence. During a single training step for this combined sequence, how is the model's performance error (loss) calculated?
A language model is being trained with the objective of modeling the joint probability of a combined sequence [x, y]. For this objective, the model's parameters are updated based only on its ability to correctly predict the tokens in the output sequence y.
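The distinction between the two related questions above can be made concrete with a loss mask. The per-token loss values below are invented for illustration; only the masking pattern matters.

```python
import numpy as np

# Hypothetical per-token losses for the concatenated sequence [x, y],
# where x has 3 tokens and y has 2 (values invented for illustration).
per_token_loss = np.array([0.9, 1.2, 0.7, 0.4, 0.6])
len_x = 3

# Joint objective p(x, y): every position in [x, y] contributes.
joint_loss = per_token_loss.sum()

# Conditional objective p(y | x): mask out the x positions so that only
# predictions of y tokens update the parameters.
mask = np.array([0.0, 0.0, 0.0, 1.0, 1.0])
conditional_loss = (per_token_loss * mask).sum()

print(f"joint: {joint_loss:.1f}, conditional: {conditional_loss:.1f}")
```

Under a genuine joint objective, no mask is applied and the x tokens also contribute to the loss; restricting updates to the y tokens, as the statement describes, corresponds to the conditional objective instead.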