Evaluating Pre-training Objectives for a Multi-Task Model
A research team is developing a new encoder-decoder model for natural language understanding. Their goal is to create a single pre-trained model that can be effectively fine-tuned for two distinct tasks: (1) generating concise summaries of long articles, and (2) restoring missing phrases in damaged historical texts. They are considering two different pre-training objectives based on masking input text.
- Objective A: Randomly replace 15% of the individual words in the input text with a special [MASK] token. The model must predict the original words at these masked positions.
- Objective B: Replace contiguous spans of text, totaling approximately 35% of the input, with a single [MASK] token. The model must generate the entire missing text span.
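For concreteness, here is a minimal sketch of the two corruption procedures in Python, assuming a simple whitespace tokenizer. The function names and the single-span simplification for Objective B are illustrative, not part of the question.

```python
import random

MASK = "[MASK]"

def corrupt_objective_a(tokens, mask_rate=0.15, rng=random):
    """Objective A: independently replace ~15% of tokens with [MASK];
    the targets are the original tokens at the masked positions."""
    corrupted, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            corrupted.append(MASK)
            targets[i] = tok  # predict each token in place
        else:
            corrupted.append(tok)
    return corrupted, targets

def corrupt_objective_b(tokens, mask_rate=0.35, rng=random):
    """Objective B: replace one contiguous span (~35% of the input)
    with a single [MASK]; the target is the entire missing span."""
    span_len = max(1, round(len(tokens) * mask_rate))
    start = rng.randrange(len(tokens) - span_len + 1)
    corrupted = tokens[:start] + [MASK] + tokens[start + span_len:]
    target = tokens[start:start + span_len]  # generate the full span
    return corrupted, target

tokens = "the cat sat on the mat by the old oak door".split()
print(corrupt_objective_a(tokens))
print(corrupt_objective_b(tokens))
```

Note the shape of the targets: Objective A yields per-position, single-token prediction targets, while Objective B yields a whole sequence to be generated, which is the crux of the evaluation asked for below.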
Evaluate which of these two pre-training objectives is more suitable for achieving the team's dual goals. Justify your choice by explaining how each objective would likely influence the model's capabilities for the target tasks.
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Example of Full Sequence Generation via 100% Masking
A research team is pre-training two separate encoder-decoder models using different variations of a masked language modeling objective.
- Model A is trained by masking 15% of the input tokens, with each mask covering only a single token. The model's objective is to predict the original token for each masked position.
- Model B is trained by masking 50% of the input tokens, with masks covering contiguous spans of up to 10 tokens. The model's objective is to predict the entire original text span.
Which of the following statements most accurately analyzes the likely capabilities these two models will develop based on their pre-training objectives?
Evaluating Pre-training Objectives for a Multi-Task Model
Match each masked language modeling (MLM) pre-training strategy for an encoder-decoder model with the primary capability it is designed to develop.