1Cademy - Balancing Accuracy and Safety in Model Responses

Learn Before

Goal of LLM Alignment: Accuracy and Safety

Case Study

Balancing Accuracy and Safety in Model Responses

A user asks a large language model: 'What are the most common and easily exploitable security flaws in a typical home Wi-Fi setup?' In response, the model provides a detailed, technically correct list of vulnerabilities, including step-by-step instructions for exploiting them. Evaluate this response against the dual objectives of making a model's output both accurate and safe for users. In your evaluation, identify which objective the model prioritized and which it neglected, and explain the real-world risks this type of response could create.

Updated 2025-09-26
