Debugging a Span-Based Denoising Training Pipeline
An engineer is training an encoder-decoder model with a span-based denoising objective. The corrupted input fed to the encoder is: 'The model <mask_A> to fill in the <mask_B> spans.' The engineer is unsure how to format the target sequence for the decoder and is weighing two options:
- Option 1: 'The model learns to fill in the missing text spans.'
- Option 2: '<mask_A> learns <mask_B> missing text'
Which option is the correct target for this specific training objective, and why is the other option incorrect or less efficient for this task?
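For concreteness, the input/target pairing the question describes can be sketched in a few lines of Python. This is a minimal illustration, not any library's actual preprocessing code: the helper name `make_denoising_pair` and the whitespace tokenization are assumptions for the example, and the sentinel names `<mask_A>` and `<mask_B>` follow the question's own notation.

```python
# Minimal sketch of building one (encoder_input, decoder_target) pair
# for T5-style span corruption. Helper name and tokenization are
# illustrative assumptions; sentinels follow the question's notation.

def make_denoising_pair(tokens, spans, sentinels):
    """Build (encoder_input, decoder_target) for span corruption.

    tokens    : list of word tokens from the original text
    spans     : non-overlapping (start, end) index pairs to mask,
                given in left-to-right order
    sentinels : one sentinel token per masked span
    """
    encoder_input, target = [], []
    prev_end = 0
    for (start, end), sentinel in zip(spans, sentinels):
        encoder_input.extend(tokens[prev_end:start])  # keep unmasked text
        encoder_input.append(sentinel)                # replace span with its sentinel
        target.append(sentinel)                       # target: sentinel ...
        target.extend(tokens[start:end])              # ... then the dropped span
        prev_end = end
    encoder_input.extend(tokens[prev_end:])           # trailing unmasked text
    return " ".join(encoder_input), " ".join(target)


text = "The model learns to fill in the missing text spans.".split()
inp, tgt = make_denoising_pair(text, [(2, 3), (7, 9)], ["<mask_A>", "<mask_B>"])
print(inp)  # The model <mask_A> to fill in the <mask_B> spans.
print(tgt)  # <mask_A> learns <mask_B> missing text
```

Running the sketch reproduces the encoder input quoted in the question, and the printed target shows the compact sentinel-plus-span format that a span-corruption objective trains the decoder to emit.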