Case Study

Selecting a Pre-training Strategy for a Summarization Model

A research team is pre-training an encoder-decoder model that will later be fine-tuned for abstractive text summarization. The team must decide on the most effective input corruption strategy for this pre-training phase. They are considering two primary methods, both illustrated in the code sketch after the list:

  1. Token Masking: Randomly replacing 15% of the input tokens with a special [MASK] token and training the model to predict the original tokens.
  2. Sentence Shuffling: Randomly reordering the sentences within a document and training the model to reconstruct the original sentence order.

Analyze these two options. Which strategy is likely to be more beneficial for preparing a model for abstractive summarization, and why? Justify your answer by connecting the skills learned during pre-training to the requirements of the downstream summarization task.
