Case Study

Selecting a Pre-training Objective for a Regulated Enterprise Assistant

You lead model strategy for an internal enterprise assistant used by legal and compliance teams. The assistant must (1) generate long, coherent policy drafts and email responses, (2) answer questions that require understanding relationships across two adjacent paragraphs (e.g., “Does the exception in paragraph 2 apply to the rule in paragraph 1?”), and (3) be robust to noisy inputs from OCR and copy/paste (missing words, duplicated phrases, and occasional sentence reordering). You have the budget to pre-train ONE base model from scratch and can choose ONE primary pre-training objective from the following families: causal language modeling (left-to-right next-token prediction), masked language modeling (predict masked tokens using both left and right context), denoising autoencoder reconstruction (reconstruct clean text from a corrupted version), and permuted language modeling (predict tokens in a random permutation order). You may optionally add Next Sentence Prediction (NSP) as an auxiliary loss on top of your chosen objective.
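To make the objective families concrete, here is a minimal, illustrative sketch of how each one turns raw text into (context, target) training pairs. The toy whitespace tokenizer, masking rate, and corruption rates are assumptions for illustration only, not the recipe of any particular published model.

```python
# Minimal sketch of how each objective family constructs (context, target) pairs.
# Toy whitespace "tokens"; rates and corruption choices are illustrative assumptions.
import random

tokens = "the exception in paragraph two does not apply".split()

# Causal LM: predict each token from the tokens to its left.
def causal_lm_pairs(toks):
    return [(toks[:i], toks[i]) for i in range(1, len(toks))]

# Masked LM: replace a subset of tokens with [MASK]; predict the originals
# using both left and right context.
def masked_lm_pair(toks, mask_rate=0.15):
    masked, targets = list(toks), {}
    for i in range(len(toks)):
        if random.random() < mask_rate:
            targets[i] = toks[i]
            masked[i] = "[MASK]"
    return masked, targets

# Denoising autoencoder: corrupt the sequence (here: drop and duplicate tokens),
# then train the model to reconstruct the clean original.
def denoising_pair(toks, drop_rate=0.1, dup_rate=0.1):
    corrupted = []
    for t in toks:
        if random.random() < drop_rate:
            continue                 # simulate a missing word
        corrupted.append(t)
        if random.random() < dup_rate:
            corrupted.append(t)      # simulate a duplicated token
    return corrupted, list(toks)     # target is the uncorrupted sequence

# Permuted LM: predict tokens in a random factorization order, each conditioned
# on the positions already revealed in that order.
def permuted_lm_pairs(toks):
    order = list(range(len(toks)))
    random.shuffle(order)
    return [(sorted(order[:k]), toks[order[k]]) for k in range(len(order))]
```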

Case study task: Choose the single primary objective you would use and decide whether you would include NSP as an auxiliary loss. Justify your choices by explicitly explaining the tradeoffs among (a) generation quality for long outputs, (b) bidirectional understanding within a span, (c) modeling cross-sentence/paragraph relationships, and (d) robustness to the specified noise patterns. Your answer should make clear why at least two of the non-chosen objectives are less suitable given these constraints.
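For point (d), the noise patterns named above (missing words, duplicated phrases, occasional sentence reordering) can be simulated directly, for example to corrupt held-out documents when probing a candidate model's robustness. The function below is a hypothetical sketch with arbitrary corruption rates, not part of the required answer.

```python
# Sketch of the OCR/copy-paste noise described in the scenario; rates are
# arbitrary illustrative assumptions.
import random

def add_ocr_style_noise(text, drop_rate=0.05, dup_rate=0.05, reorder_rate=0.1):
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    noisy = []
    for sent in sentences:
        out = []
        for w in sent.split():
            if random.random() < drop_rate:
                continue                 # missing word
            out.append(w)
            if random.random() < dup_rate:
                out.append(w)            # duplicated word
        noisy.append(" ".join(out))
    if len(noisy) > 1 and random.random() < reorder_rate:
        i = random.randrange(len(noisy) - 1)
        noisy[i], noisy[i + 1] = noisy[i + 1], noisy[i]   # sentence reordering
    return ". ".join(noisy) + "."
```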
