Pre-training Objective Formula

The objective of pre-training a model on a set of sequences $\mathcal{D}$ is formally expressed as finding the optimal parameters $\hat{\theta}$ that minimize the total loss across the dataset:

$$\hat{\theta} = \arg\min_{\theta} \sum_{\mathbf{x} \in \mathcal{D}} \mathrm{Loss}_{\theta}(\mathbf{x})$$

where $\mathrm{Loss}_{\theta}(\mathbf{x})$ is the loss evaluated on a single sequence $\mathbf{x}$.
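As a concrete illustration (not from the source), the sum-over-sequences objective can be sketched in Python. Here $\mathrm{Loss}_{\theta}(\mathbf{x})$ is assumed to be the negative log-likelihood of the sequence, and $\theta$ is modeled by a hypothetical unigram probability table; real pre-training uses a neural model and gradient-based minimization instead.

```python
import math

def sequence_loss(theta, x):
    # Loss_theta(x): negative log-likelihood of one sequence x
    # under the (assumed) per-token probabilities theta.
    return -sum(math.log(theta[token]) for token in x)

def total_loss(theta, dataset):
    # The quantity minimized in pre-training:
    # sum over all x in the dataset D of Loss_theta(x).
    return sum(sequence_loss(theta, x) for x in dataset)

# Hypothetical parameters theta: unigram probabilities over a tiny vocabulary.
theta = {"a": 0.5, "b": 0.3, "c": 0.2}

# Toy dataset D of two token sequences.
D = [["a", "b"], ["a", "c", "a"]]

print(total_loss(theta, D))
```

Finding $\hat{\theta}$ then means searching over parameter settings for the one with the smallest `total_loss`, which in practice is done by stochastic gradient descent rather than enumeration.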

Updated 2026-04-15

Ch.1 Pre-training - Foundations of Large Language Models