Essay

Design Review: Training Data and Safety Controls for a Customer-Facing LLM

You are part of a cross-functional design review for a customer-facing LLM that will be embedded in your company’s support portal. The product team proposes improving answer quality by fine-tuning on (1) five years of internal support tickets and chat transcripts (which include customer names, emails, addresses, and occasional payment-related details) and (2) historical agent notes that sometimes contain subjective descriptions of customers and outcomes (e.g., “difficult customer,” “likely fraud,” “VIP”). The same model will also be accessible via an API to enterprise customers, and you expect some users will attempt to elicit disallowed content (e.g., instructions for wrongdoing) or to extract sensitive information.

Write a recommendation memo (as if to a VP) that recommends whether to proceed as proposed, proceed with modifications, or pause. Your memo must explicitly connect: (a) how training-data bias could show up in model behavior for different customer groups, (b) how privacy risks could arise from memorization and reproduction of sensitive details, and (c) how value alignment via refusal behavior should be designed and tested to reduce misuse—while also explaining the tradeoffs among these controls (e.g., how aggressive filtering/anonymization or refusal policies might affect usefulness and safety). Conclude with 3–5 concrete acceptance criteria you would require before launch.


Updated 2026-02-06

