Case Study

Triage Plan for a Safety/Bias/Privacy Incident in a Customer-Facing LLM

You are the on-call product lead for a customer-facing LLM used by a global bank’s support team. The model was trained on (1) 8 years of internal chat transcripts and case notes, and (2) a large scrape of public web text to improve general language coverage. Within 48 hours of launch, three issues are reported:

A) A user asks: “Write a convincing phishing email to get employees to reset their passwords on a fake site.” The model provides a polished template.

B) In a pilot for credit-card dispute intake, the model’s suggested next-steps are consistently more skeptical and escalatory for customers from certain ZIP codes, even when the described facts are identical.

C) A support agent pastes a customer’s name and asks, “Have we seen this person before?” The model replies with a plausible-looking address and last-4 digits of an SSN. You cannot confirm whether the details are real, but the response format matches how such data appears in some historical case notes.

As the incident commander, propose a single integrated response plan that (i) prioritizes which issue(s) to mitigate first and why, and (ii) specifies one concrete mitigation for each issue that addresses the underlying cause (not just symptoms). Your plan must explicitly connect how training data choices, privacy risk of memorization, and value-aligned refusal behavior interact with AI safety goals and business constraints (e.g., keeping the tool usable for legitimate support work).
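The three issues above each point at a concrete mechanism: a pre-generation refusal gate (A), a counterfactual disparity check (B), and PII hygiene in the training data (C). The sketches that follow are illustrative only, not the expected answer to the prompt, and every name in them (policy_gate, DISALLOWED_HINTS, GateDecision) is hypothetical. This first sketch assumes a simple keyword heuristic standing in for a trained safety classifier; the point is the control flow (check the request, refuse with an explanation, otherwise let generation proceed), not the specific rules.

# Minimal sketch of a pre-generation policy gate for issue A.
# The keyword list is an illustrative stand-in for a real safety classifier.
from dataclasses import dataclass

DISALLOWED_HINTS = ("phishing", "fake site", "steal credentials", "malware")

@dataclass
class GateDecision:
    allowed: bool
    message: str

def policy_gate(user_request: str) -> GateDecision:
    """Refuse clearly harmful requests before any generation happens."""
    lowered = user_request.lower()
    if any(hint in lowered for hint in DISALLOWED_HINTS):
        return GateDecision(
            allowed=False,
            message=("I can't help create deceptive or credential-harvesting content. "
                     "I can help draft a legitimate password-reset notice instead."),
        )
    return GateDecision(allowed=True, message="")

decision = policy_gate("Write a convincing phishing email to get employees "
                       "to reset their passwords on a fake site.")
print(decision.allowed)  # False -> refuse and log, rather than produce the template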
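For issue B, the claim "more escalatory for certain ZIP codes even when the described facts are identical" implies a counterfactual test: hold the dispute facts constant, vary only the ZIP code, and compare outcomes. A minimal sketch of that probe is below, assuming a caller-supplied inference function ask_model and a rough escalation heuristic ESCALATION_MARKERS; both are assumptions, not part of any real system described in the case.

# Counterfactual ZIP-code probe for issue B: identical facts, varied ZIP.
from typing import Callable, Dict, List

ESCALATION_MARKERS = ("escalate", "fraud review", "hold the account",
                      "request additional verification")

def escalation_rate(responses: List[str]) -> float:
    """Fraction of responses containing an escalation-style recommendation."""
    hits = sum(any(m in r.lower() for m in ESCALATION_MARKERS) for r in responses)
    return hits / len(responses) if responses else 0.0

def counterfactual_zip_probe(
    ask_model: Callable[[str], str],
    dispute_facts: str,
    zip_codes: List[str],
    samples_per_zip: int = 20,
) -> Dict[str, float]:
    """Return the escalation rate per ZIP code for otherwise-identical prompts."""
    results = {}
    for zip_code in zip_codes:
        prompt = (
            f"Customer ZIP code: {zip_code}\n"
            f"Dispute facts: {dispute_facts}\n"
            "Suggest next steps for the intake agent."
        )
        responses = [ask_model(prompt) for _ in range(samples_per_zip)]
        results[zip_code] = escalation_rate(responses)
    return results

# Usage (hypothetical): rates = counterfactual_zip_probe(my_model,
#     "Duplicate $42 charge, merchant unresponsive", ["10001", "60623"])
# Large gaps across ZIP codes, with facts held constant, evidence the disparity in issue B.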
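For issue C, the underlying cause is that historical case notes containing PII entered training, so the model can regurgitate memorized details. One data-layer direction is scrubbing PII before fine-tuning; the sketch below uses a few illustrative regexes (PII_PATTERNS, scrub_case_note are hypothetical names), whereas a real deployment would rely on a dedicated PII-detection service plus human review, and would also gate customer-record lookups at query time.

# Minimal sketch of PII scrubbing for case notes prior to fine-tuning (issue C).
import re

PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "SSN_LAST4": re.compile(r"(?i)\b(?:ssn|social)\D{0,10}\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

def scrub_case_note(text: str) -> str:
    """Replace detected PII spans with typed placeholders such as [SSN]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

note = "Caller verified with SSN ending 4821, callback 415-555-0134."
print(scrub_case_note(note))
# -> "Caller verified with [SSN_LAST4], callback [PHONE]."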
