Learn Before
Evaluating Language Model Training Objectives
An AI research team is pre-training a language model to be highly proficient at grammatical structure and subtle word choice. They are debating between two self-supervised training objectives, both using a large corpus of 500-word articles:
Objective A: The model reads an entire 500-word article and predicts its primary topic from a list of 10 categories (e.g., 'Sports', 'Technology', 'Politics').
Objective B: The model reads a 500-word article in which a few words have been subtly replaced with incorrect but grammatically plausible alternatives. The model's task is to examine every word and decide whether it was part of the original text or a replacement.
Based on the team's goal, which training objective is likely to be more efficient for learning the desired skills? Justify your choice by comparing the nature and density of the learning signals provided by each objective.
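The difference in signal density can be sketched with a small back-of-the-envelope calculation. This is a hypothetical illustration using only the numbers from the scenario above; the bit counts are loose upper bounds on supervision per article, not a claim about what either model actually learns:

```python
import math

# Compare the density of learning signals per 500-word article
# for the two candidate objectives (numbers from the scenario).

ARTICLE_LENGTH = 500   # words per article
NUM_TOPICS = 10        # Objective A: one label from 10 categories

# Objective A: one topic classification per article -> one learning signal.
signals_a = 1
bits_a = math.log2(NUM_TOPICS)          # at most ~3.32 bits per article

# Objective B: one original-vs-replaced decision per word -> 500 signals.
signals_b = ARTICLE_LENGTH
bits_b = ARTICLE_LENGTH * math.log2(2)  # at most 1 bit per word = 500 bits

print(f"Objective A: {signals_a} signal/article (~{bits_a:.2f} bits)")
print(f"Objective B: {signals_b} signals/article (up to {bits_b:.0f} bits)")
```

Under these assumptions, Objective B supplies roughly 500 times as many per-example training signals, each tied to a specific word position, which is the intuition behind its efficiency for word-level skills.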
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Replaced Token Detection as a Self-Supervised Task
Imagine two language models are being trained on the same large text corpus. Model A's task is to read an entire sentence and predict a single label for it (e.g., 'positive sentiment' or 'negative sentiment'). Model B's task is to read the same sentence, but for every individual word, it must predict whether that word has been artificially replaced with a different, plausible-sounding word. Which statement best analyzes the fundamental difference in the learning signals these two models receive?
Choosing a Training Objective for Error Detection