1Cademy - Formula for Integrating a Prediction Network with a Pre-trained BERT Model

Learn Before

Fine-Tuning Pre-trained Models for Downstream Tasks

Formula

Formula for Integrating a Prediction Network with a Pre-trained BERT Model

To adapt a pre-trained model such as BERT to a specific downstream task, it must be combined with a task-specific predictor, or prediction network, that maps the model's output to the form the task requires. Let $\mathrm{BERT}_{\hat{\theta}}(\cdot)$ be a BERT model with pre-trained parameters $\hat{\theta}$, and let $\mathrm{Predict}_{\omega}(\cdot)$ be a prediction network with parameters $\omega$. For an input $\mathbf{x}$, the final prediction $\mathbf{y}$ is computed as $\mathbf{y} = \mathrm{Predict}_{\omega}(\mathrm{BERT}_{\hat{\theta}}(\mathbf{x}))$. During tuning, the model receives tuples $(\mathbf{x}, \mathbf{y}_{\mathrm{gold}})$, each pairing an input with its corresponding gold output. The BERT parameters being tuned, denoted $\hat{\theta}^{+}$, are initialized to the pre-trained values $\hat{\theta}$, and the model's output is $\mathbf{y}_{\omega,\hat{\theta}^{+}} = \mathrm{Predict}_{\omega}(\mathrm{BERT}_{\hat{\theta}^{+}}(\mathbf{x}))$. The complete model, that is both $\omega$ and $\hat{\theta}^{+}$, is then optimized by minimizing the loss between $\mathbf{y}_{\omega,\hat{\theta}^{+}}$ and $\mathbf{y}_{\mathrm{gold}}$ over the tuning samples.
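The composition and tuning loop described above can be illustrated with a minimal sketch in plain Python. Everything here is an assumption made for illustration: `encode` is a one-layer tanh stand-in for BERT (not a real BERT model), `predict` is a logistic-regression stand-in for the prediction network, the loss is binary cross-entropy, and the optimizer is plain stochastic gradient descent. The source fixes none of these choices, and the names `encode`, `predict`, `fine_tune`, and the toy data are hypothetical.

```python
import copy
import math

def encode(theta, x):
    """Toy stand-in for BERT_theta(x): a single tanh layer (assumption)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in theta]

def predict(omega, h):
    """Toy stand-in for Predict_omega(h): logistic regression on h (assumption)."""
    w, b = omega
    z = sum(wj * hj for wj, hj in zip(w, h)) + b
    return 1.0 / (1.0 + math.exp(-z))

def model(omega, theta, x):
    """y = Predict_omega(BERT_theta(x))."""
    return predict(omega, encode(theta, x))

def loss(y, y_gold):
    """Binary cross-entropy; the source does not name a specific loss."""
    eps = 1e-12
    return -(y_gold * math.log(y + eps) + (1 - y_gold) * math.log(1 - y + eps))

def mean_loss(omega, theta, samples):
    return sum(loss(model(omega, theta, x), g) for x, g in samples) / len(samples)

def fine_tune(theta_hat, samples, lr=0.5, epochs=200):
    """Jointly tune omega and theta_plus by SGD on the tuning samples."""
    theta_plus = copy.deepcopy(theta_hat)  # theta^+ starts at the pre-trained values
    w = [0.0] * len(theta_hat)             # omega starts from zeros (assumption)
    b = 0.0
    for _ in range(epochs):
        for x, g in samples:
            h = encode(theta_plus, x)
            y = predict((w, b), h)
            dz = y - g                     # dLoss/dz for sigmoid + cross-entropy
            dh = [dz * wj for wj in w]     # gradient flowing back into the encoder
            for j in range(len(w)):
                da = dh[j] * (1.0 - h[j] ** 2)  # back through tanh
                for k in range(len(x)):
                    theta_plus[j][k] -= lr * da * x[k]
                w[j] -= lr * dz * h[j]
            b -= lr * dz
    return (w, b), theta_plus

# Hypothetical "pre-trained" parameters and tuning samples (x, y_gold).
theta_hat = [[0.5, -0.3], [0.2, 0.8]]
samples = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([1.0, 1.0], 1), ([-1.0, 0.5], 0)]

loss_before = mean_loss(([0.0, 0.0], 0.0), theta_hat, samples)
omega, theta_plus = fine_tune(theta_hat, samples)
loss_after = mean_loss(omega, theta_plus, samples)
```

Note that `theta_hat` is copied rather than modified in place, so the pre-trained parameters remain available while `theta_plus` and `omega` are updated together. Freezing the encoder and training only `omega` would be a different design choice from the joint optimization described here.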

Updated 2026-06-20
