1Cademy - A development team aims to align a large language model with the complex value of being helpful. Their strategy is to create a high-quality dataset of 50,000 question-and-answer pairs where the models response is rated as very helpful by human annotators. They then fine-tune the model with the sole objective of maximizing its ability to reproduce these exact very helpful answers. Which statement best evaluates the fundamental limitation of this data-fitting approach for achieving the teams goal?

Learn Before

Insufficiency of Data Fitting for Complex Value Alignment

Multiple Choice

A development team aims to align a large language model with the complex value of 'being helpful'. Their strategy is to create a high-quality dataset of 50,000 question-and-answer pairs where the model's response is rated as 'very helpful' by human annotators. They then fine-tune the model with the sole objective of maximizing its ability to reproduce these exact 'very helpful' answers. Which statement best evaluates the fundamental limitation of this data-fitting approach for achieving the team's goal?

Updated 2025-10-05

Contributors are:

Who are from:

Learn Before

Related