Case Study

Selecting an Architecture and Pretraining Objective for a Unified Internal NLP Service

You are designing a single internal NLP service for a regulated enterprise that must support multiple text-in/text-out features behind one API: (1) "summarize" long incident reports into three bullet points, (2) "extract" a comma-separated list of product names mentioned in a customer email, and (3) "rewrite" a draft response to be more formal. The platform team wants one model that can switch behavior based on a textual instruction prefix (e.g., "summarize:", "extract:", "rewrite:") and can be pretrained on a large corpus of unlabeled internal documents before any task-specific fine-tuning.
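To make the single-API, text-to-text framing concrete, the sketch below shows how all three features could be cast into one shared (input, target) string format, with the instruction prefix as the only task-specific signal. The prefixes, the helper name to_text_to_text, and the example texts are illustrative assumptions, not the service's actual schema.

```python
# Minimal sketch of the shared text-to-text format behind the one-model API.
# Task names, prefixes, and example texts are illustrative assumptions.

TASK_PREFIXES = {
    "summarize": "summarize: ",  # incident report -> three bullet points
    "extract": "extract: ",      # customer email -> comma-separated product names
    "rewrite": "rewrite: ",      # informal draft -> more formal response
}

def to_text_to_text(task, source, target=None):
    """Cast any of the three features as a text-in/text-out pair.

    Because every feature is plain text in and plain text out, a single
    encoder-decoder model can serve all of them; the prefix is the only
    signal telling the model which behavior to produce.
    """
    model_input = TASK_PREFIXES[task] + source
    return (model_input, target) if target is not None else model_input

# Fine-tuning examples for each feature, all in the same format:
examples = [
    to_text_to_text("summarize", "Incident report: ...", "- cause\n- impact\n- fix"),
    to_text_to_text("extract", "Email mentioning WidgetPro and DataHub ...", "WidgetPro, DataHub"),
    to_text_to_text("rewrite", "hey, we'll look into it",
                    "Thank you for your report; we will investigate promptly."),
]
```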

A prototype team proposes using an encoder-only model with a classification head for extraction and a separate decoder-only model for summarization/rewriting, arguing it will be simpler. Another team proposes a single encoder-decoder model trained in a T5-style text-to-text framework, using span-based denoising pretraining (masking contiguous spans in the input with sentinel tokens and training the decoder to output the missing spans) before fine-tuning on the three tasks.
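For readers unfamiliar with the second proposal's objective, here is a minimal sketch of span-based denoising in the T5 style, assuming whitespace-tokenized input and sentinel tokens in the <extra_id_N> convention; the helper name span_corrupt and the span-sampling details are illustrative, not the exact T5 recipe.

```python
import random

def span_corrupt(tokens, mask_ratio=0.15, mean_span_len=3, seed=0):
    """Build one denoising example: replace contiguous spans of the input
    with sentinel tokens and collect the removed spans as the target."""
    rng = random.Random(seed)
    n_to_mask = max(1, int(len(tokens) * mask_ratio))
    masked = set()
    # Sample random contiguous spans until enough tokens are covered.
    while len(masked) < n_to_mask:
        span_len = max(1, int(rng.expovariate(1 / mean_span_len)))
        start = rng.randrange(len(tokens))
        masked.update(range(start, min(start + span_len, len(tokens))))

    inputs, targets, sentinel = [], [], 0
    i = 0
    while i < len(tokens):
        if i in masked:
            # One sentinel replaces the whole masked span in the input;
            # the target echoes the sentinel followed by the removed tokens.
            inputs.append(f"<extra_id_{sentinel}>")
            targets.append(f"<extra_id_{sentinel}>")
            while i < len(tokens) and i in masked:
                targets.append(tokens[i])
                i += 1
            sentinel += 1
        else:
            inputs.append(tokens[i])
            i += 1
    targets.append(f"<extra_id_{sentinel}>")  # final sentinel closes the target
    return " ".join(inputs), " ".join(targets)

src = "the incident was caused by a misconfigured load balancer in region two".split()
print(span_corrupt(src))  # (corrupted input text, span-recovery target text)
```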

As the technical reviewer, which proposal would you approve and why? In your answer, explicitly connect (a) how the text-to-text instruction prefix interacts with the chosen architecture, and (b) how span-based denoising pretraining prepares (or fails to prepare) the model for all three downstream behaviors within one system.
