Self-Supervised Pre-training of Language Models
Large-scale research on pre-training in natural language processing began with the development of self-supervised language models, such as BERT and GPT. These models share the principle that general language understanding and generation abilities can be acquired by training a model to predict missing words (masked tokens in the case of BERT, the next token in the case of GPT) over massive amounts of unlabeled text. Despite the simplicity of this approach, the resulting models demonstrate a remarkable ability to capture linguistic structure without being explicitly trained for it. The generality of these self-supervised pre-training tasks leads to systems that perform strongly across a wide variety of problems, often outperforming carefully engineered supervised systems.
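As a rough illustration of the masked-word objective mentioned above, the sketch below masks a fraction of input tokens and trains a tiny model to recover them. The vocabulary, two-sentence corpus, and TinyMLM model are toy stand-ins chosen for this example, not BERT's actual architecture or data; only the shape of the self-supervised loss is the point.

```python
# Minimal sketch of BERT-style masked-token prediction (toy setup, not real BERT).
import torch
import torch.nn as nn

torch.manual_seed(0)

vocab = ["[PAD]", "[MASK]", "the", "cat", "sat", "on", "mat", "dog", "ran"]
tok2id = {t: i for i, t in enumerate(vocab)}
MASK_ID = tok2id["[MASK]"]

# Toy "corpus": already-tokenized sentences of equal length.
corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "ran", "on", "the", "mat"],
]
batch = torch.tensor([[tok2id[t] for t in sent] for sent in corpus])

class TinyMLM(nn.Module):
    """A tiny stand-in for a Transformer encoder: embed, transform, predict a token per position."""
    def __init__(self, vocab_size, dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.ff = nn.Linear(dim, dim)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, ids):
        h = torch.relu(self.ff(self.emb(ids)))
        return self.out(h)  # (batch, seq_len, vocab_size) logits

model = TinyMLM(len(vocab))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss(ignore_index=-100)  # -100 marks positions with no label

for step in range(200):
    # Self-supervision: randomly mask ~15% of tokens; the labels are the
    # original tokens at masked positions, all other positions are ignored.
    mask = torch.rand(batch.shape) < 0.15
    if not mask.any():
        continue  # nothing masked this step, skip
    inputs = batch.masked_fill(mask, MASK_ID)
    labels = batch.masked_fill(~mask, -100)

    logits = model(inputs)
    loss = loss_fn(logits.view(-1, len(vocab)), labels.view(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print("final loss:", loss.item())
```

A GPT-style variant of the same idea would drop the masking and instead predict each token from the tokens before it; the self-supervised recipe (derive labels from the raw text itself) is otherwise unchanged.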
Tags
Foundations of Large Language Models
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
A key development leading to modern large-scale language models was the adoption of a pre-training paradigm. How did the influential pre-training approach that became standard in computer vision fundamentally differ from the self-supervised approach that now dominates natural language processing?
Arrange the following key developments in the history of pre-training into the correct chronological and influential sequence, from the earliest concept to the modern paradigm.
Converging Histories of Pre-training
Divergent Pre-training Paradigms in NLP and Computer Vision