Evaluating LLM Safety Measures Post-Anonymization
A company has trained a large language model on a dataset of internal corporate documents. They ran a script to remove all employee names and project codenames. During testing, however, they find that the model can still inadvertently reveal sensitive strategic information when prompted about quarterly goals, because the context surrounding those goals implicitly points back to the redacted project codenames. The company needs to deploy the model soon and cannot afford the time or cost to retrain it from scratch.
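A minimal sketch of why the scenario's redaction approach falls short. All names, codenames, and the example sentence below are hypothetical, and the scenario does not specify how the company's script worked; this assumes a simple known-term substitution, which removes the identifiers themselves but leaves contextual details that can re-identify the project:

```python
import re

# Hypothetical identifiers the anonymization script knows about.
KNOWN_TERMS = ["Alice Chen", "Project Falcon"]

def redact(text: str, terms=KNOWN_TERMS) -> str:
    """Replace each known term with [REDACTED] (case-insensitive)."""
    for term in terms:
        text = re.sub(re.escape(term), "[REDACTED]", text, flags=re.IGNORECASE)
    return text

doc = ("Project Falcon, led by Alice Chen, targets a Q3 launch of the "
       "drone-delivery pilot in Seattle.")
print(redact(doc))
# The name and codename are gone, but "drone-delivery pilot in Seattle"
# still implicitly identifies the redacted project.
```

The failure mode is that anonymization operates on surface strings, while the leaked information lives in the surrounding context that no term list covers.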
Given these constraints, evaluate the following two proposals and determine which is the more effective and practical immediate solution to mitigate the risk. Justify your reasoning.
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Evaluating LLM Safety Measures Post-Anonymization
A development team is building a large language model and has meticulously removed all direct personal identifiers (names, phone numbers, addresses) from its massive training dataset. Despite this effort, they discover during red-teaming that the model can still reconstruct sensitive, context-specific information about individuals when given very specific and unusual prompts. Which of the following statements best analyzes this situation?
Assessing Anonymization Sufficiency