Designing a Self-Supervised Text Classification Task
Imagine you need to train a model to understand the nuances of a language using a vast library of unlabeled books. Your challenge is to create a learning task for the model without manually creating any labels. Propose a novel binary classification task that can be automatically generated from the raw text. Your proposal must describe:
- The process for creating 'positive' and 'negative' examples from the text.
- What the model will predict for any given example.
- A brief justification for why successfully performing this task would lead to a better understanding of language.
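One possible answer (offered only as an illustrative sketch, not the unique intended solution) is a corruption-detection task: positives are sentences taken verbatim from the books, negatives are the same sentences with two randomly chosen words swapped, and the model predicts intact vs. corrupted. The function name `make_example` and the swap scheme are assumptions for illustration:

```python
import random

def make_example(sentence, rng=None):
    """Generate one (text, label) pair with no human annotation.

    Positive (label 1): the sentence exactly as written.
    Negative (label 0): the sentence with two random word positions
    swapped, which usually breaks grammatical or semantic coherence.
    """
    rng = rng or random.Random()
    if rng.random() < 0.5:
        return sentence, 1
    words = sentence.split()
    if len(words) < 2:
        return sentence, 1  # too short to corrupt meaningfully
    i, j = rng.sample(range(len(words)), 2)
    words[i], words[j] = words[j], words[i]
    return " ".join(words), 0
```

Distinguishing corrupted from intact sentences requires sensitivity to word order, agreement, and local semantics, which is the justification such a proposal would give.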
Tags
- Ch.1 Pre-training - Foundations of Large Language Models
- Computing Sciences
- Creation in Bloom's Taxonomy
- Cognitive Psychology
Related
Next Sentence Prediction (NSP)
Per-Token Classification for Encoder Training
Designing a Self-Supervised Text Classification Task
A researcher aims to pre-train a text encoder on a large corpus of unlabeled articles. They propose the following self-supervised classification task: For each training instance, a paragraph is extracted. With 50% probability, the sentences within that paragraph are randomly reordered. The model's task is to predict a binary label: 'Original Order' or 'Shuffled Order'. Which statement best evaluates the potential effectiveness of this task for its intended purpose?
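The researcher's instance-generation procedure can be sketched as follows. This is a minimal illustration of the described scheme; the function name `make_instance` and the naive period-based sentence split are assumptions (a real pipeline would use a proper sentence tokenizer):

```python
import random

def make_instance(paragraph, rng=None):
    """Create one self-supervised training instance from a paragraph.

    With 50% probability the sentences are left as written
    (label 1, 'Original Order'); otherwise they are randomly
    reordered (label 0, 'Shuffled Order').
    """
    rng = rng or random.Random()
    # Naive split on '.' for illustration only.
    sentences = [s.strip() for s in paragraph.split(".") if s.strip()]
    if rng.random() < 0.5:
        return ". ".join(sentences) + ".", 1  # Original Order
    shuffled = sentences[:]
    rng.shuffle(shuffled)
    return ". ".join(shuffled) + ".", 0       # Shuffled Order
```

Note one caveat relevant to evaluating the task: for short paragraphs a random shuffle can reproduce the original order, so some 'Shuffled Order' labels are noisy.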
A key aspect of training text encoders with self-supervision is designing a classification task that forces the model to learn a useful property of language. Match each proposed self-supervised classification task with the primary linguistic property it is designed to teach the model.