Comparison of ELMo and GPT on Downstream Adaptation
ELMo and GPT take fundamentally different approaches to adapting a pretrained model to downstream natural language processing tasks. ELMo supplies context-sensitive representations, but each target task still needs its own customized, task-specific architecture, and ELMo's pretrained parameters stay frozen during supervised learning. GPT, by contrast, is task-agnostic: it uses a single Transformer decoder architecture, accommodates a downstream task by adding only a linear output layer, and fine-tunes all of its pretrained parameters during downstream training instead of freezing them.
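The contrast can be made concrete with a minimal sketch in plain Python. It only records which parameter groups would be updated during supervised training under each approach; it is not an implementation of either model. All names here (`ParamGroup`, `elmo_style`, `gpt_style`, `updated_by_training`, and the group labels) are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ParamGroup:
    """A named block of parameters (hypothetical bookkeeping, not a real API)."""
    name: str
    pretrained: bool        # True if the weights come from pretraining
    trainable: bool = True  # False means frozen during supervised training


def elmo_style(task_layers):
    """Frozen pretrained encoder feeding a custom, task-specific architecture."""
    groups = [ParamGroup("pretrained_elmo_encoder", pretrained=True, trainable=False)]
    # The task-specific layers are designed per task and trained from scratch.
    groups += [ParamGroup(name, pretrained=False) for name in task_layers]
    return groups


def gpt_style():
    """One Transformer decoder for every task, plus a single linear output layer."""
    return [
        # All pretrained parameters are fine-tuned, so nothing is frozen.
        ParamGroup("pretrained_transformer_decoder", pretrained=True),
        ParamGroup("linear_output_layer", pretrained=False),
    ]


def updated_by_training(groups):
    """Names of the groups that would receive gradient updates."""
    return [g.name for g in groups if g.trainable]


elmo_groups = elmo_style(["task_specific_rnn", "task_specific_classifier"])
gpt_groups = gpt_style()
print(updated_by_training(elmo_groups))
print(updated_by_training(gpt_groups))
```

In this sketch the ELMo-style setup changes its list of task layers for every new task while the pretrained encoder never appears among the updated groups, whereas the GPT-style setup keeps the same two groups for every task and updates both.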
Tags
D2L
Dive into Deep Learning @ D2L
Related
A foundational generative language model introduced in 2018 significantly improved the ability to capture relationships between words far apart in a text, a major challenge for previous sequential models. Which of the following best analyzes the core architectural innovation responsible for this leap in performance?
Critique of an Early Transformer-Based Language Model
Training Objective of an Early Transformer Model
GPT-2
Similarities Between BERT and GPT in Fine-Tuning
Autoregressive Limitation of GPT
Comparison of ELMo and GPT on Downstream Adaptation
Comparison of ELMo, GPT, and BERT