Case Study

Evaluating a Training Objective for a Base Model

An engineer is training a large language model from scratch on a massive dataset of books. The goal is to create a 'base model' with a general understanding of language structure that can be reused for various downstream tasks. The model processes each book as a single, long sequence of tokens. The training objective is to predict each token in the book given all the preceding tokens, so the training error (loss) is computed for every token and summed across the entire book.

Evaluate whether this training objective is appropriate for creating a versatile 'base model'. Justify your reasoning by explaining the main benefit of calculating the loss over the entire sequence.
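To make the objective concrete, here is a minimal NumPy sketch of next-token-prediction loss summed over a full sequence. The token IDs are a toy stand-in for a tokenized book, and the random logits stand in for a real model's outputs; only the input/target shifting and the per-token summed cross-entropy reflect the objective described above.

```python
import numpy as np

# Toy "book": a short sequence of token IDs (hypothetical vocabulary of 5).
vocab_size = 5
tokens = np.array([0, 2, 3, 1, 4, 2, 3])

# Stand-in for model output: one row of logits per position, where
# position t is the model's prediction of token t+1 given tokens 0..t.
rng = np.random.default_rng(0)
logits = rng.normal(size=(len(tokens) - 1, vocab_size))

# Shifted targets: the label at position t is the NEXT token, t+1.
targets = tokens[1:]

# Per-token cross-entropy via log-softmax over the vocabulary.
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
per_token_loss = -log_probs[np.arange(len(targets)), targets]

# The training loss is the sum over every token position in the book.
total_loss = per_token_loss.sum()
```

Note that every position contributes a loss term, so a single forward pass over the book yields one training signal per token rather than one per sequence.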

Updated 2025-10-03

Tags

Ch.4 Alignment - Foundations of Large Language Models

Evaluation in Bloom's Taxonomy