Case Study

Evaluating Pre-training Objectives for a Multi-Task Model

A research team is developing a new encoder-decoder model for natural language understanding. Their goal is to create a single pre-trained model that can be effectively fine-tuned for two distinct tasks: (1) generating concise summaries of long articles, and (2) restoring missing phrases in damaged historical texts. They are considering two different pre-training objectives based on masking input text.

  • Objective A: Randomly replace 15% of the individual words in the input text with a special [MASK] token. The model must predict the original words at these masked positions.
  • Objective B: Replace contiguous spans of text, totaling approximately 35% of the input, each with a single [MASK] token. The model must generate each missing span in full (a minimal sketch of both objectives follows this list).
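
For concreteness, the sketch below shows one way the two corruption schemes could be implemented. It is a minimal illustration under stated assumptions, not a production recipe: it assumes whitespace-delimited tokens, an average span length of three tokens for Objective B, and the hypothetical helper names objective_a and objective_b. Real span-corruption pipelines (e.g., T5-style) typically use a distinct sentinel token per span rather than a shared [MASK].

```python
import random

MASK = "[MASK]"


def objective_a(tokens, mask_rate=0.15, rng=None):
    """Objective A: replace ~15% of individual tokens with [MASK].

    Returns the corrupted sequence plus the {position: original token}
    targets the model is trained to predict. Assumes a non-empty input.
    """
    rng = rng or random.Random(0)
    n_mask = max(1, round(len(tokens) * mask_rate))
    corrupted = list(tokens)
    targets = {}
    for pos in rng.sample(range(len(tokens)), n_mask):
        targets[pos] = corrupted[pos]
        corrupted[pos] = MASK
    return corrupted, targets


def objective_b(tokens, corrupt_rate=0.35, mean_span=3, rng=None):
    """Objective B: collapse contiguous spans (~35% of tokens in total)
    to a single [MASK] each; the model must regenerate every span in full.
    """
    rng = rng or random.Random(0)
    budget = round(len(tokens) * corrupt_rate)  # tokens left to corrupt
    corrupted, spans, i = [], [], 0
    while i < len(tokens):
        # Start a span with modest probability; the running budget caps
        # total corruption near corrupt_rate.
        if budget > 0 and rng.random() < 1.0 / (2 * mean_span):
            length = min(budget, rng.randint(1, 2 * mean_span - 1))
            spans.append(tokens[i:i + length])
            corrupted.append(MASK)  # one [MASK] stands in for the whole span
            budget -= length
            i += length
        else:
            corrupted.append(tokens[i])
            i += 1
    return corrupted, spans


if __name__ == "__main__":
    tokens = ("a damaged historical text with several missing phrases "
              "to restore during pre-training").split()
    print(objective_a(tokens))
    print(objective_b(tokens))
```

The contrast to notice is the supervision signal each scheme produces: Objective A yields per-position, single-word classification targets, whereas Objective B yields whole spans that the decoder must generate as full sequences.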

Evaluate which of these two pre-training objectives is more suitable for achieving the team's dual goals. Justify your choice by explaining how each objective would likely influence the model's capabilities for the target tasks.
