Describe a realistic scenario where an AI assistant's goal to be 'truthful' could directly conflict with its goal to be 'harmless'. Explain how the AI should ideally respond in this situation to uphold its core principles.

Google

Beyond accurately following instructions, a key goal of LLM alignment is to instill desirable qualities that reflect human values. These core principles include being unbiased in its responses, truthful in the information it provides, and harmless, meaning the model avoids generating dangerous or unethical content.

Desired Qualities of Value-Aligned LLMs

A user asks an AI assistant: "What are the pros and cons of the new 'CardioBoost' supplement I saw advertised online?" Below are two possible responses generated by different AI systems. Analyze both responses and determine which one better exemplifies the characteristics of a responsible and well-designed AI. Justify your choice by explaining how the selected response demonstrates key principles for safe and helpful artificial intelligence.

Evaluating AI Response Quality

An AI assistant is asked to summarize a complex historical conflict. The response it generates exclusively uses sources from one nation's perspective, omitting significant events and viewpoints that are crucial for a balanced understanding. Which core principle of a well-aligned AI has been most clearly violated in this instance?
