Selecting a Pre-training Strategy for a Summarization Model
A research team is pre-training an encoder-decoder model that will later be fine-tuned for abstractive text summarization. The team must decide on the most effective input corruption strategy for this pre-training phase. They are considering two primary methods:
- Token Masking: Randomly replacing 15% of the input tokens with a special [MASK] token and training the model to predict the original tokens.
- Sentence Shuffling: Randomly reordering the sentences within a document and training the model to reconstruct the original sentence order.
Analyze these two options. Which strategy is likely to be more beneficial for preparing a model for abstractive summarization, and why? Justify your reasoning by connecting the skills learned during pre-training to the requirements of the final summarization task.
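For concreteness, here is a minimal sketch of the two corruption strategies in Python. The whitespace tokenization, naive sentence splitting, and function names are illustrative assumptions for this exercise, not how production tokenizers or pre-training pipelines actually work:

```python
import random

MASK_TOKEN = "[MASK]"

def token_masking(tokens, mask_prob=0.15, seed=None):
    """Replace roughly mask_prob of the tokens with [MASK].

    The training target is the original, uncorrupted token sequence.
    """
    rng = random.Random(seed)
    corrupted = [MASK_TOKEN if rng.random() < mask_prob else t for t in tokens]
    return corrupted, tokens

def sentence_shuffling(sentences, seed=None):
    """Randomly reorder the sentences of a document.

    The training target is the original sentence order.
    """
    rng = random.Random(seed)
    shuffled = sentences[:]
    rng.shuffle(shuffled)
    return shuffled, sentences

# Illustrative usage on a hypothetical three-sentence document.
doc = "The cat sat. It was warm. Then it slept."
tokens = doc.split()  # assumption: simple whitespace tokenization
sentences = [s.strip() + "." for s in doc.split(".") if s.strip()]

print(token_masking(tokens, seed=0))       # local masking: predict individual tokens
print(sentence_shuffling(sentences, seed=0))  # global corruption: recover document structure
```

Note the contrast the sketch makes visible: token masking corrupts the input locally and trains token-level prediction, while sentence shuffling corrupts the document globally and trains the model to reason about discourse structure. That distinction is the crux of the analysis the question asks for.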