Learn Before
A research team is fine-tuning a very large, computationally expensive language model on a massive, noisy dataset. To optimize their limited budget, they first perform a single pass with the large model over the dataset to calculate the training loss for each data sample. They then train a much smaller, faster model to predict the loss values that the large model assigned. Finally, they use this trained small model to filter the dataset, keeping only the samples predicted to have high loss. Which statement best evaluates the effectiveness of this data selection strategy?
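The three-step workflow in the question can be sketched in a few lines. This is a minimal illustration with hypothetical stand-ins: random features replace real data samples, a synthetic target replaces the large model's measured losses, and ordinary least squares plays the role of the "much smaller, faster model".

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: per-sample features and the large model's
# per-sample losses (in practice, one expensive pass over the dataset).
features = rng.normal(size=(1000, 16))
true_weights = rng.normal(size=16)
large_model_loss = features @ true_weights + rng.normal(scale=0.1, size=1000)

# Step 2: train a cheap proxy (here, least squares) to predict
# the loss values the large model assigned.
w, *_ = np.linalg.lstsq(features, large_model_loss, rcond=None)
predicted_loss = features @ w

# Step 3: filter the dataset, keeping only samples the proxy
# predicts to have high loss -- e.g. the top 10%.
k = int(0.1 * len(features))
selected = np.argsort(predicted_loss)[-k:]
print(len(selected))
```

Only the small model is queried at selection time, so the large model's cost is a single scoring pass rather than repeated evaluation of every candidate sample.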
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Ensemble of Small Models for Data Selection
Visual Diagram of Data Selection with a Small Model
You are tasked with curating a high-quality dataset for fine-tuning a large, computationally expensive model from a massive, unfiltered data source. You decide to use a smaller, auxiliary model to help with the selection process. Arrange the following steps into the correct logical sequence for this data selection workflow.
Optimizing Training with a Limited Budget