Concept

Pre-training Objective for Language Models

The loss function initially defined for a single token sequence can be extended to an entire set of sequences, denoted as $\mathcal{D}$. In this context, the primary objective of the pre-training process is to find the model parameters that minimize the total loss over all sequences in $\mathcal{D}$.
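Under the usual formulation, this objective can be written as follows, where $\mathrm{Loss}_{\theta}(x)$ stands for the per-sequence loss defined earlier and $\hat{\theta}$ for the resulting parameters (the exact symbols may differ from those used in the source):

```latex
\hat{\theta} \;=\; \arg\min_{\theta} \sum_{x \in \mathcal{D}} \mathrm{Loss}_{\theta}(x)
```

That is, training searches over parameter settings $\theta$ and selects the one whose summed loss across every sequence in $\mathcal{D}$ is smallest.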

Updated 2026-04-15

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences