Consequences of the AI 'Black Box'
By design, many AI models are a 'black box': even the people who created them do not know exactly how or why the models produce the outputs they do. One study found that AI models trained to act maliciously under certain conditions retained those malicious behaviors even after standard safety training techniques were applied.
Similarly, a model trained on biased data may be unable to stop producing biased outputs unless it is retrained completely from scratch on unbiased data. Worse, if biased data is inserted into a model's training set, it can continuously poison the model's outputs, making that bias difficult to remove from future results.
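The poisoning dynamic above can be illustrated with a minimal sketch. This is not a real neural network or safety technique, just a hypothetical frequency-based "model" that predicts by majority vote over its training records; the keyword and labels are invented for illustration. It shows how inserted biased records can dominate a model's output, and why retraining from clean data may be the only reliable fix.

```python
from collections import Counter

# Toy stand-in for a trained model: predicts the majority label
# seen for a keyword in its training data.
def train(examples):
    counts = {}
    for keyword, label in examples:
        counts.setdefault(keyword, Counter())[label] += 1
    return counts

def predict(model, keyword):
    return model[keyword].most_common(1)[0][0]

# Hypothetical clean data: 50 "approve" vs 10 "deny" records.
clean_data = [("applicant", "approve")] * 50 + [("applicant", "deny")] * 10
# Biased records inserted into the training set.
poisoned = [("applicant", "deny")] * 60

model = train(clean_data + poisoned)
print(predict(model, "applicant"))  # the inserted bias dominates: "deny"

# Retraining from scratch on clean data restores the original behavior.
model = train(clean_data)
print(predict(model, "applicant"))  # "approve"
```

A real model's bias is distributed across millions of learned parameters rather than a simple count, which is precisely why it is so much harder to locate and remove than in this sketch.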
Tags
Disability Studies
Social Science
Empirical Science
Science