Reference

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Hubinger, E., Denison, C., Mu, J., Lambert, M., Tong, M., MacDiarmid, M., … Perez, E. (2024). Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training. arXiv [cs.CR]. Retrieved from http://arxiv.org/abs/2401.05566

Updated 2024-01-27

Tags

Disability Studies

Social Science

Empirical Science

Science
