Learn Before
A team is training a model to predict user preferences between two generated text responses. The training objective is to minimize the average loss calculated over a collected dataset of preferences. However, the data collection was flawed, resulting in a dataset that primarily contains preferences from a very specific, non-representative group of users. What is the most significant risk of using the average loss on this particular dataset as the primary metric for training the model?
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
From Theory to Practice: Expected vs. Empirical Loss
If a dataset used for training a preference model is extremely large, the average loss calculated over this dataset is guaranteed to be a highly accurate approximation of the theoretical loss over the entire data distribution, even if the data was collected from a narrow, specific user group.
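The claim above can be checked with a small simulation. The sketch below uses hypothetical numbers (group proportions, per-group mean losses, the `loss_sample` helper are all invented for illustration): even with a very large dataset, the empirical average loss converges to the expected loss over the *sampled* group, not over the full user population, when the data comes from a narrow subgroup.

```python
import random

random.seed(0)

# Hypothetical setup: per-example loss differs between two user groups.
# Group A (90% of real users) has mean loss 0.2 under the model;
# group B (10% of real users, but the only group sampled) has mean loss 0.8.
def loss_sample(group):
    mean = 0.2 if group == "A" else 0.8
    return random.gauss(mean, 0.05)

# True expected loss over the full user population.
true_expected = 0.9 * 0.2 + 0.1 * 0.8  # 0.26

# Empirical loss over a very large but biased dataset (group B only).
n = 100_000
biased_empirical = sum(loss_sample("B") for _ in range(n)) / n

# Even as n grows, the biased empirical average converges to group B's
# expected loss (~0.8), not to the population expectation (0.26).
print(round(true_expected, 2), round(biased_empirical, 2))
```

By the law of large numbers the empirical loss does converge as the dataset grows, but only to the expectation under the distribution the data was actually drawn from; a large sample cannot repair a biased sampling process.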