Short Answer

Debugging a Span-Based Denoising Training Pipeline

An engineer is training an encoder-decoder model with a span-based denoising objective. The input to the model's encoder is: 'The model <mask_A> to fill in the <mask_B> spans.' The engineer is unsure how to format the target sequence for the decoder. They consider two options:

  • Option 1: 'The model learns to fill in the missing text spans.'
  • Option 2: '<mask_A> learns <mask_B> missing text'

Which option is the correct target for this specific training objective, and why is the other option incorrect or less efficient for this task?
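
To make the two candidates concrete, the following is a minimal sketch of how the encoder input and both candidate target formats could be derived from the original sentence. It assumes whitespace tokenization and hand-picked span positions, and the function name `corrupt` is hypothetical. It is an illustration of the setup in the question, not a reference implementation of any particular library.

```python
def corrupt(tokens, spans, sentinels):
    """Build the encoder input and two candidate decoder targets.

    tokens:    list of string tokens (whitespace-split here, an assumption).
    spans:     sorted, non-overlapping (start, end) index pairs, end exclusive.
    sentinels: one placeholder token per span, e.g. "<mask_A>".
    Returns (encoder_input, full_target, spans_only_target) as token lists.
    """
    encoder_input, spans_only = [], []
    cursor = 0
    for (start, end), sentinel in zip(spans, sentinels):
        # Keep the uncorrupted text before the span, then replace the span
        # with a single sentinel token in the encoder input.
        encoder_input.extend(tokens[cursor:start])
        encoder_input.append(sentinel)
        # The spans-only candidate lists each sentinel followed by the
        # tokens it replaced.
        spans_only.append(sentinel)
        spans_only.extend(tokens[start:end])
        cursor = end
    encoder_input.extend(tokens[cursor:])
    # The full-sentence candidate is simply the original token sequence.
    return encoder_input, list(tokens), spans_only


tokens = "The model learns to fill in the missing text spans.".split()
spans = [(2, 3), (7, 9)]  # "learns" and "missing text"
sentinels = ["<mask_A>", "<mask_B>"]

enc, full, spans_only = corrupt(tokens, spans, sentinels)
print("encoder input:", " ".join(enc))
print("candidate 1  :", " ".join(full), f"({len(full)} tokens)")
print("candidate 2  :", " ".join(spans_only), f"({len(spans_only)} tokens)")
```

Comparing the two printed token counts gives a sense of how much each format asks the decoder to generate for the same corrupted input.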

Updated 2025-10-06

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy
