Learn Before
Mask-Predict Framework
The mask-predict framework is a general self-supervised learning strategy. It systematically obscures portions of an input sequence and then tasks the model with reconstructing the hidden content, using the visible parts of the sequence as context. Because the reconstruction targets come from the data itself, no manual annotation is required.
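The corruption step described above can be sketched in a few lines. This is a minimal, hypothetical helper (the name `mask_sequence` and the `[MASK]` placeholder are illustrative assumptions, not an API from any specific library): it hides a random subset of tokens and records the originals as reconstruction targets, showing how an unlabeled sequence supplies both the input and the supervision signal.

```python
import random

MASK = "[MASK]"

def mask_sequence(tokens, mask_prob=0.15, rng=None):
    """Obscure a random subset of tokens for a mask-predict task.

    Returns (corrupted, targets) where `corrupted` is the sequence with
    some tokens replaced by MASK, and `targets` maps each masked position
    back to its original token -- the labels the model must reconstruct.
    """
    rng = rng or random.Random(0)  # seeded for a reproducible sketch
    corrupted, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            corrupted.append(MASK)
            targets[i] = tok  # supervision comes from the data itself
        else:
            corrupted.append(tok)
    return corrupted, targets

tokens = "the model learns from unlabeled text".split()
corrupted, targets = mask_sequence(tokens, mask_prob=0.5)
```

During pre-training, the model would see `corrupted` as input and be optimized to predict each entry of `targets` from the surrounding visible tokens.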
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Comparison of Self-Supervised Pre-training and Self-Training
Architectural Categories of Pre-trained Transformers
Self-Supervised Classification Tasks for Encoder Training
Prefix Language Modeling (PrefixLM)
Mask-Predict Framework
Discriminative Training
Learning World Knowledge from Unlabeled Data
Emergent Linguistic Capabilities from Pre-training
Architectural Approaches to Self-Supervised Pre-training
Self-Supervised Pre-training of Encoders via Masked Language Modeling
Word Prediction as a Core Self-Supervised Task
Learning World Knowledge from Unlabeled Data via Self-Supervision
A research team has a massive collection of unlabeled historical texts. Their goal is to pre-train a language model that understands the specific vocabulary and sentence structures within these documents, but they have no budget for manual data annotation. Which of the following approaches is the most effective and feasible for their pre-training task?
Analysis of Supervision Signal Generation
A team is developing a pre-training strategy for a new language model using a large corpus of unlabeled text. Which of the following proposed tasks best exemplifies the principles of self-supervised learning?
Prevalence of Self-Supervised Pre-training in NLP
Learn After
Masked Language Modeling (MLM)
A researcher is developing a model to understand patterns in unlabeled time-series data from weather sensors. The data for each day is a sequence of 24 hourly temperature readings. The researcher's training strategy involves taking a sequence, randomly hiding the temperature reading for a single hour, and then training the model to estimate the hidden temperature value by looking at the readings from the other 23 hours. Which fundamental training strategy does this approach best exemplify?
Dual Role of Data in a Self-Supervised Task
Analyzing Self-Supervised Training Procedures