Essay

Choosing Between Span-Denoising Pretraining and Task-Specific Fine-Tuning in a T5-Style Text-to-Text System

You lead an applied NLP team building a single internal “language workbench” model for three enterprise workflows: (1) generate a 1–2 sentence customer-support ticket summary, (2) extract a JSON-like list of product names mentioned in a ticket, and (3) rewrite a ticket in a more polite tone. Leadership wants one model that can do all three by changing only a textual instruction prefix (e.g., “summarize: …”, “extract products: …”, “rewrite politely: …”).
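To make the text-to-text framing concrete, here is a minimal illustrative sketch: every workflow becomes the same “input text in, output text out” problem, and only the instruction prefix changes. The prefixes mirror those in the prompt; the ticket text and target strings are invented placeholders, not outputs of any real model.

```python
# Sketch of the single text-to-text interface described above.
# The ticket and target strings are illustrative placeholders.

ticket = (
    "The AcmeWidget Pro stopped syncing after the last update. "
    "Fix this ASAP, I have a demo tomorrow."
)

# Each workflow is the same (input text -> output text) problem;
# only the instruction prefix changes.
examples = [
    ("summarize: " + ticket,
     "Customer reports AcmeWidget Pro sync failures after an update and needs a fix before a demo."),
    ("extract products: " + ticket,
     '["AcmeWidget Pro"]'),
    ("rewrite politely: " + ticket,
     "Hello, my AcmeWidget Pro stopped syncing after the latest update. "
     "Could you please help me resolve this before my demo tomorrow?"),
]

for source, target in examples:
    print(f"INPUT : {source}\nTARGET: {target}\n")
```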

You have budget for either (A) large-scale self-supervised pretraining of an encoder–decoder model using span-based denoising (mask contiguous spans in the input with sentinel tokens and train the decoder to output the missing spans with those sentinels), followed by light fine-tuning on a small labeled set for each workflow, or (B) no denoising pretraining, but heavier supervised fine-tuning on larger labeled sets for each workflow.
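For readers less familiar with option (A)'s objective, the sketch below is a simplified, word-level illustration of span corruption with sentinel tokens. The `span_corrupt` helper, the whitespace tokenization, and the span-length sampling are assumptions for illustration; the actual T5 preprocessing operates on subword ids and uses a different span-length distribution.

```python
import random

SENTINELS = [f"<extra_id_{i}>" for i in range(100)]  # T5-style sentinel tokens


def span_corrupt(tokens, corruption_rate=0.15, mean_span_len=3, seed=0):
    """Toy span-based denoising: replace contiguous spans with sentinels.

    Returns (encoder_input, decoder_target) as token lists. A simplified
    sketch of the objective, not the reference T5 implementation.
    """
    rng = random.Random(seed)
    n_to_mask = max(1, int(len(tokens) * corruption_rate))

    # Randomly mark contiguous spans until enough tokens are masked.
    masked = [False] * len(tokens)
    while sum(masked) < n_to_mask:
        span_len = max(1, int(rng.expovariate(1 / mean_span_len)))
        start = rng.randrange(len(tokens))
        for i in range(start, min(start + span_len, len(tokens))):
            masked[i] = True

    # Encoder sees the text with each masked span collapsed to one sentinel;
    # the decoder target lists each sentinel followed by the tokens it hides.
    encoder_input, target = [], []
    sentinel_idx = 0
    i = 0
    while i < len(tokens):
        if masked[i]:
            sentinel = SENTINELS[sentinel_idx]
            sentinel_idx += 1
            encoder_input.append(sentinel)
            target.append(sentinel)
            while i < len(tokens) and masked[i]:
                target.append(tokens[i])
                i += 1
        else:
            encoder_input.append(tokens[i])
            i += 1
    target.append(SENTINELS[sentinel_idx])  # final sentinel marks the end
    return encoder_input, target


text = "the customer reports that the widget stopped syncing after the update"
src, tgt = span_corrupt(text.split())
print("encoder input :", " ".join(src))
print("decoder target:", " ".join(tgt))
```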

Write an evaluation memo recommending A or B. Your memo must explicitly connect your recommendation to: (i) how the encoder–decoder architecture supports instruction-conditioned text-to-text behavior across these heterogeneous tasks, and (ii) how span-based denoising changes what the encoder and decoder learn (and why that matters both for generation tasks like summarization/rewriting and for “structured” generation like product extraction). Include at least two concrete risks/tradeoffs of your chosen option (e.g., failure modes, data requirements, output controllability), and propose one mitigation for each risk.
