Evaluating a Training Objective for a Base Model
An engineer is training a large language model from scratch on a massive dataset of books. The goal is to create a 'base model' that has a general understanding of language structure and can be used for various downstream tasks later. For each book, the model processes it as a single, long sequence of tokens. The training objective is to accurately predict each token in the book, given all the preceding tokens. Therefore, the training error (loss) is calculated for every token and summed across the entire book.
Evaluate whether this training objective is appropriate for creating a versatile 'base model'. Justify your reasoning by explaining the main benefit of calculating the loss over the entire sequence.
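The objective described in the question can be sketched in a few lines. This is a toy illustration only: the "model" below is a random stand-in for a trained network's softmax output, and the token sequence is invented, not taken from the exercise.

```python
import numpy as np

# Toy stand-in for a language model over a 4-token vocabulary.
# In real training, these probabilities come from the network's softmax.
rng = np.random.default_rng(0)
vocab_size = 4
tokens = [2, 0, 3, 1]  # an invented "book" as a short token sequence

def toy_next_token_probs(prefix):
    """Return a probability distribution over the vocabulary given the
    preceding tokens (here just random, to keep the sketch self-contained)."""
    logits = rng.normal(size=vocab_size)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

# Autoregressive objective: sum the negative log-probability of each token
# given all preceding tokens. Every position in the book contributes a
# training signal -- this is the main benefit the question asks about.
loss = 0.0
for t in range(1, len(tokens)):
    probs = toy_next_token_probs(tokens[:t])
    loss += -np.log(probs[tokens[t]])

print(f"total loss over {len(tokens) - 1} predicted tokens: {loss:.3f}")
```

Because the loss covers every token rather than a single labeled target, one book of N tokens yields roughly N training examples, which is what makes the objective suitable for learning general language structure.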
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Evaluation in Bloom's Taxonomy
Related
Conditional vs. Joint Probability Objectives in Language Modeling
A language model is being trained with the objective of modeling the joint probability of an input sequence x and an output sequence y, which are treated as a single, concatenated sequence. During a single training step for this combined sequence, how is the model's performance error (loss) calculated?
A language model is being trained with the objective of modeling the joint probability of a combined sequence [x, y]. For this objective, the model's parameters are updated based only on its ability to correctly predict the tokens in the output sequence y.
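The distinction between the two related questions above can be made concrete with a loss mask. The per-token loss values below are invented for illustration; only the masking pattern matters.

```python
import numpy as np

# Hypothetical per-token losses for the concatenated sequence [x, y],
# where x has 3 tokens and y has 2 (values invented for illustration).
per_token_loss = np.array([0.9, 1.2, 0.7, 0.4, 0.6])
len_x = 3

# Joint objective p(x, y): every position in [x, y] contributes.
joint_loss = per_token_loss.sum()

# Conditional objective p(y | x): mask out the x positions so that only
# predictions of y tokens update the parameters.
mask = np.array([0.0, 0.0, 0.0, 1.0, 1.0])
conditional_loss = (per_token_loss * mask).sum()

print(f"joint: {joint_loss:.1f}, conditional: {conditional_loss:.1f}")
```

Under a genuine joint objective, no mask is applied and the x tokens also contribute to the loss; restricting updates to the y tokens, as the statement describes, corresponds to the conditional objective instead.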