Learn Before
A team is preparing a large, diverse text dataset to train a powerful new language model. To improve the final model's quality, they first use a smaller, pre-existing language model to score each document in the dataset. Documents that receive a very low score from this smaller model are removed. Which of the following documents is most likely to be removed from the dataset during this filtering process?
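The filtering process described above is commonly implemented by scoring each document with the small model's average log-likelihood (equivalently, perplexity) and dropping the lowest-scoring documents. The sketch below is illustrative only: it stands in for the "smaller, pre-existing language model" with a toy unigram model (`train_unigram`, `score`, and `filter_docs` are hypothetical names, not from any particular library), but the pipeline shape — train/obtain a scorer, score every document, drop those below a threshold — is the same one the question describes.

```python
import math
from collections import Counter

def train_unigram(corpus):
    """Stand-in for the 'smaller model': a unigram word model
    with add-one smoothing fit on a reference corpus."""
    counts = Counter(w for doc in corpus for w in doc.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves mass for unseen words
    return lambda w: (counts[w] + 1) / (total + vocab)

def score(doc, prob):
    """Average log-likelihood per token; higher means the doc
    looks more 'natural' to the scoring model."""
    words = doc.split()
    return sum(math.log(prob(w)) for w in words) / max(len(words), 1)

def filter_docs(docs, prob, threshold):
    """Keep only documents scoring at or above the threshold."""
    return [d for d in docs if score(d, prob) >= threshold]
```

Under this scheme, gibberish or heavily corrupted documents receive very low likelihood from the scoring model and are the ones most likely to be removed, while fluent text is retained.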
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Likelihood and Cross-Entropy as Data Filtering Criteria
Weak-to-Strong Generalization via Fine-Tuning on Weak Model Data
Optimizing Training Data for a Medical Language Model
You are tasked with curating a high-quality dataset for training a large language model. You decide to use a smaller, less powerful model to help filter an initial, large collection of text documents. Arrange the following steps of this data filtering process in the correct logical order.