Pre-training Objective Formula

The objective of pre-training a model on a set of sequences $\mathcal{D}$ is formally expressed as finding the optimal parameters $\hat{\theta}$ that minimize the total loss across the dataset:

$$\hat{\theta} = \arg\min_{\theta} \sum_{\mathbf{x} \in \mathcal{D}} \mathrm{Loss}_{\theta}(\mathbf{x})$$

where $\mathrm{Loss}_{\theta}(\mathbf{x})$ is the loss evaluated on a single sequence $\mathbf{x}$.
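As a concrete illustration (not from the source), the sum-over-sequences objective can be sketched in Python. Here $\mathrm{Loss}_{\theta}(\mathbf{x})$ is assumed to be the negative log-likelihood of the sequence, and $\theta$ is modeled by a hypothetical unigram probability table; real pre-training uses a neural model and gradient-based minimization instead.

```python
import math

def sequence_loss(theta, x):
    # Loss_theta(x): negative log-likelihood of one sequence x
    # under the (assumed) per-token probabilities theta.
    return -sum(math.log(theta[token]) for token in x)

def total_loss(theta, dataset):
    # The quantity minimized in pre-training:
    # sum over all x in the dataset D of Loss_theta(x).
    return sum(sequence_loss(theta, x) for x in dataset)

# Hypothetical parameters theta: unigram probabilities over a tiny vocabulary.
theta = {"a": 0.5, "b": 0.3, "c": 0.2}

# Toy dataset D of two token sequences.
D = [["a", "b"], ["a", "c", "a"]]

print(total_loss(theta, D))
```

Finding $\hat{\theta}$ then means searching over parameter settings for the one with the smallest `total_loss`, which in practice is done by stochastic gradient descent rather than enumeration.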

Updated 2026-04-15

Ch.1 Pre-training - Foundations of Large Language Models