Concept

Pre-training Objective for Language Models

The loss function initially defined for a single token sequence can be extended to an entire set of sequences, denoted as $\mathcal{D}$. In this context, the primary objective of the pre-training process is to find the model parameters that minimize the total loss over all sequences in $\mathcal{D}$.
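Under the usual formulation, this objective can be written as follows, where $\mathrm{Loss}_{\theta}(x)$ stands for the per-sequence loss defined earlier and $\hat{\theta}$ for the resulting parameters (the exact symbols may differ from those used in the source):

```latex
\hat{\theta} \;=\; \arg\min_{\theta} \sum_{x \in \mathcal{D}} \mathrm{Loss}_{\theta}(x)
```

That is, training searches over parameter settings $\theta$ and selects the one whose summed loss across every sequence in $\mathcal{D}$ is smallest.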

Updated 2026-04-15

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences