Learn Before
Emergent Linguistic Capabilities from Pre-training
A remarkable outcome of large-scale pre-training is that models develop a sophisticated capacity to model linguistic structure. This ability is an emergent property: it arises from simple self-supervised objectives, such as predicting words in text, rather than from explicit training on specific linguistic analysis tasks.
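This emergent ability can be probed directly. Below is a minimal sketch, assuming the Hugging Face transformers library and the publicly released GPT-2 checkpoint (a model trained only on next-word prediction). It scores a grammatical/ungrammatical minimal pair by average per-token negative log-likelihood; a model that has implicitly learned subject-verb agreement should assign a better score to the grammatical variant. The example sentences are illustrative, not from the text.

```python
# Minimal sketch: probing an emergent grammaticality capability by
# comparing language-model scores on a minimal pair. Assumes the
# Hugging Face `transformers` library and the pre-trained GPT-2 model,
# which was trained only on next-word prediction.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def sentence_nll(sentence: str) -> float:
    """Average per-token negative log-likelihood under the model."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids, the model returns the mean
        # cross-entropy loss over its next-token predictions.
        loss = model(ids, labels=ids).loss
    return loss.item()

# Illustrative minimal pair (subject-verb agreement).
grammatical = "The keys to the cabinet are on the table."
ungrammatical = "The keys to the cabinet is on the table."

# A model with an emergent grasp of agreement should give the
# grammatical sentence a lower (better) negative log-likelihood,
# despite never being trained on grammaticality judgments.
print(sentence_nll(grammatical), sentence_nll(ungrammatical))
```

This is the same phenomenon examined in the Learn After scenario below: a model trained purely on word prediction behaving as if it had been taught grammaticality judgment.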
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Comparison of Self-Supervised Pre-training and Self-Training
Architectural Categories of Pre-trained Transformers
Self-Supervised Classification Tasks for Encoder Training
Prefix Language Modeling (PrefixLM)
Mask-Predict Framework
Discriminative Training
Learning World Knowledge from Unlabeled Data
Architectural Approaches to Self-Supervised Pre-training
Self-Supervised Pre-training of Encoders via Masked Language Modeling
Word Prediction as a Core Self-Supervised Task
Learning World Knowledge from Unlabeled Data via Self-Supervision
A research team has a massive collection of unlabeled historical texts. Their goal is to pre-train a language model that understands the specific vocabulary and sentence structures within these documents, but they have no budget for manual data annotation. Which of the following approaches is the most effective and feasible for their pre-training task?
Analysis of Supervision Signal Generation
A team is developing a pre-training strategy for a new language model using a large corpus of unlabeled text. Which of the following proposed tasks best exemplifies the principles of self-supervised learning?
Prevalence of Self-Supervised Pre-training in NLP
Learn After
A research team develops a large language model trained exclusively on a single, simple task: predicting the next word in a sentence, using a vast corpus of text from the internet. During evaluation, they discover the model is highly effective at identifying grammatically incorrect sentences, a task it was never explicitly trained to perform. Which of the following statements provides the best analysis of this outcome?
A research team wants to build a model that is highly proficient at identifying the grammatical structure of sentences. Based on the principles of how large models acquire linguistic skills, their most effective primary strategy would be to train the model from scratch exclusively on a dataset of sentences that have been manually annotated with their grammatical structures.
Explaining Untaught Abilities