Formula

Parameterized Prediction Function using a BERT model

After fine-tuning, the complete BERT-based architecture for a downstream task can be written as $\mathrm{Predict}_{\tilde{\omega}}(\mathrm{BERT}_{\tilde{\theta}}(\cdot))$. This denotes that the model is applied to new data using the optimized, fine-tuned parameters $\tilde{\theta}$ for the BERT encoder and $\tilde{\omega}$ for the prediction network. The downstream task dictates both the input and output formats of the model, as well as the architecture of the prediction network layered on top of the BERT encoder.
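The composition above can be sketched in code. This is a minimal toy illustration, not a real BERT: `bert_encode` is a stand-in for the encoder $\mathrm{BERT}_{\tilde{\theta}}$ (here just an embedding lookup with mean pooling), `predict` is a stand-in for the prediction network $\mathrm{Predict}_{\tilde{\omega}}$ (a linear layer plus softmax), and the parameter dictionaries `theta` and `omega` are hypothetical placeholders for the fine-tuned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the fine-tuned parameters theta~ and omega~.
theta = {"embed": rng.normal(size=(100, 16))}              # toy "encoder" embedding table
omega = {"W": rng.normal(size=(16, 3)), "b": np.zeros(3)}  # toy classification head

def bert_encode(token_ids, theta):
    """Toy stand-in for BERT_theta: embed tokens, then mean-pool into one vector."""
    return theta["embed"][token_ids].mean(axis=0)

def predict(h, omega):
    """Toy stand-in for Predict_omega: linear layer followed by softmax."""
    logits = h @ omega["W"] + omega["b"]
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

# Predict_omega(BERT_theta(x)): the fine-tuned model applied to a new input.
x = np.array([4, 8, 15, 16, 23, 42])   # a new token-id sequence
probs = predict(bert_encode(x, theta), omega)
```

The key point is the composition: once $\tilde{\theta}$ and $\tilde{\omega}$ are fixed by fine-tuning, inference is simply the prediction network applied to the encoder's output, and swapping the downstream task means swapping `predict` (and the shape of its output), not the encoder.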

Updated 2026-04-18
