Formula for Integrating a Prediction Network with a Pre-trained BERT Model

To adapt a pre-trained model like BERT to a specific downstream task, it must be combined with a task-specific predictor, or prediction network, that maps the model's output to the form required by the task. Let $\mathrm{BERT}_{\hat{\theta}}(\cdot)$ be a BERT model with pre-trained parameters $\hat{\theta}$, and let $\mathrm{Predict}_{\omega}(\cdot)$ be a prediction network with parameters $\omega$. For an input $\mathbf{x}$, the final prediction $\mathbf{y}$ is generated using the formula $\mathbf{y} = \mathrm{Predict}_{\omega}(\mathrm{BERT}_{\hat{\theta}}(\mathbf{x}))$. During fine-tuning, the model receives tuples $(\mathbf{x}, \mathbf{y}_{\mathrm{gold}})$ of inputs paired with their corresponding desired outputs. The BERT parameters, denoted $\hat{\theta}^{+}$ once they are allowed to be updated, are initialized with the pre-trained values $\hat{\theta}$, and the model's output is written $\mathbf{y}_{\omega,\hat{\theta}^{+}}$ to make its dependence on both parameter sets explicit. The complete model is then optimized by minimizing the loss over the tuning samples.
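Spelled out, and assuming the standard sum-of-losses objective (the symbol $\mathrm{Loss}$ and the estimate $\hat{\omega}$ are notation introduced here for illustration), minimizing the loss over the tuning samples amounts to solving:

$$(\hat{\omega}, \hat{\theta}^{+}) \;=\; \operatorname*{arg\,min}_{\omega,\;\theta^{+}} \sum_{(\mathbf{x},\, \mathbf{y}_{\mathrm{gold}})} \mathrm{Loss}\!\left(\mathbf{y}_{\omega,\theta^{+}},\, \mathbf{y}_{\mathrm{gold}}\right)$$

where $\mathbf{y}_{\omega,\theta^{+}} = \mathrm{Predict}_{\omega}(\mathrm{BERT}_{\theta^{+}}(\mathbf{x}))$ is the model's output under the current parameter values.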
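The pipeline $\mathbf{y} = \mathrm{Predict}_{\omega}(\mathrm{BERT}_{\hat{\theta}}(\mathbf{x}))$ can be sketched in a few lines of numpy. Everything below is a toy stand-in, not a real BERT: `W_enc` plays the role of the pre-trained parameters $\hat{\theta}$, a mean-pooled embedding lookup plays the role of the encoder, and for simplicity only the prediction-network parameters $\omega$ are updated.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d, n_classes = 16, 8, 3

# "Pre-trained" encoder parameters theta_hat (a stand-in for real BERT weights).
W_enc = rng.normal(size=(vocab, d))

def bert(x):
    """Stand-in encoder: map a token-id sequence to one d-dimensional vector."""
    return W_enc[x].mean(axis=0)

# Prediction-network parameters omega: a linear softmax classifier on top.
W_pred = np.zeros((d, n_classes))

def predict(h, W):
    """The prediction network Predict_omega: linear layer + softmax."""
    z = h @ W
    e = np.exp(z - z.max())
    return e / e.sum()

# One tuning sample (x, y_gold) for a 3-way classification task.
x = np.array([3, 7, 1])
y_gold = 2

# Gradient descent on the cross-entropy loss; here only omega is tuned
# (full fine-tuning would also update the encoder parameters theta^+).
h = bert(x)
for _ in range(50):
    p = predict(h, W_pred)
    grad = np.outer(h, p - np.eye(n_classes)[y_gold])  # dLoss/dW_pred
    W_pred -= 0.5 * grad

print(predict(h, W_pred).argmax())  # → 2 (the gold label)
```

After a few dozen steps the classifier assigns almost all probability mass to the gold class, which is exactly what minimizing the loss over the tuning samples means for this single example.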

Updated 2026-04-18
