Similarities Between BERT and GPT in Fine-Tuning
During supervised fine-tuning on a downstream task, BERT resembles GPT in two respects. First, the contextual representations produced by the pre-trained Transformer (an encoder in BERT, a decoder in GPT) are fed into an added output layer, so the core architecture needs only minimal changes whatever the nature of the task (e.g., predicting a label for every token versus one label for the entire sequence). Second, all parameters of the pre-trained model are fine-tuned, while the parameters of the added output layer are trained from scratch.
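
The two points can be illustrated with a dependency-free toy sketch. It is not BERT or GPT code: `encode` is a hypothetical stand-in that only looks up one vector per token (a real model would produce contextual representations and load its weights from a checkpoint), and the names `params`, `add_output_layer`, `predict_sequence`, `predict_tokens`, and `trainable_parameters` are assumptions made for illustration. Reading the sequence-level prediction from the first position mirrors BERT's use of a special classification token at the start of the input.

```python
import random

random.seed(0)
HIDDEN = 4   # toy hidden size (assumption)
VOCAB = 10   # toy vocabulary size (assumption)


def init_matrix(rows, cols):
    """Return a rows x cols matrix of small random weights."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def linear(weight, vector):
    """Matrix-vector product: one score per row of `weight`."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weight]


# Stand-in for the pre-trained model. In practice these values would be
# loaded from a checkpoint; here they are random placeholders.
params = {"encoder.embed": init_matrix(VOCAB, HIDDEN)}


def encode(token_ids):
    """Hypothetical encoder: one HIDDEN-sized vector per input token."""
    return [params["encoder.embed"][t] for t in token_ids]


def add_output_layer(num_classes):
    """The only architectural change: a new, randomly initialized head."""
    params["head.weight"] = init_matrix(num_classes, HIDDEN)


def predict_sequence(token_ids):
    """Sequence-level task: apply the head to the first position only."""
    return linear(params["head.weight"], encode(token_ids)[0])


def predict_tokens(token_ids):
    """Token-level task: apply the same kind of head at every position."""
    return [linear(params["head.weight"], rep) for rep in encode(token_ids)]


def trainable_parameters():
    """Nothing is frozen: pre-trained weights and the new head are all updated."""
    return sorted(params)


add_output_layer(num_classes=3)
token_ids = [0, 5, 2, 7]
sequence_scores = predict_sequence(token_ids)  # 3 scores for the whole input
token_scores = predict_tokens(token_ids)       # 3 scores for each of 4 tokens
trainable = trainable_parameters()
```

Only `add_output_layer` differs between the two kinds of task; the encoder is untouched, and the list returned by `trainable_parameters` contains both the pre-trained weights and the new head, since fine-tuning updates all of them.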

Tags
What is BERT?
Data Science
D2L
Dive into Deep Learning @ D2L
Computing Sciences
Related
BERT Input Representation: Single and Paired Sentences
BERT's Contributions and Impact
Training Objective of the Standard BERT Model
Comparison of ELMo, GPT, and BERT
BERT Performance Improvements on NLP Tasks
A foundational generative language model introduced in 2018 significantly improved the ability to capture relationships between words far apart in a text, a major challenge for previous sequential models. Which of the following best analyzes the core architectural innovation responsible for this leap in performance?
Critique of an Early Transformer-Based Language Model
Training Objective of an Early Transformer Model
GPT-2
Autoregressive Limitation of GPT