A language model is being trained on a dataset containing a mix of very short sequences and a few extremely long sequences. A developer observes that the overall training objective, which is the sum of the log-probabilities of all sequences in the dataset, seems to be disproportionately influenced by the model's performance on the few long sequences. Which of the following best explains this observation?
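The key to the question is that a sequence's log-probability decomposes into a sum of per-token log-probabilities, so a longer sequence contributes more (typically negative) terms to the summed objective. A minimal sketch below illustrates this, assuming a hypothetical average per-token log-probability of -2.0 nats (an illustrative figure, not from the source):

```python
# Hypothetical per-token log-probability (assumed for illustration).
avg_logprob_per_token = -2.0

# A mix of many short sequences and a few very long ones.
short_lengths = [10] * 100    # 100 sequences of 10 tokens each
long_lengths = [10_000] * 3   # 3 sequences of 10,000 tokens each

def seq_logprob(length):
    # log P(x) = sum over tokens t of log P(x_t | x_<t),
    # so total log-probability scales with sequence length.
    return length * avg_logprob_per_token

short_total = sum(seq_logprob(n) for n in short_lengths)  # -2000.0
long_total = sum(seq_logprob(n) for n in long_lengths)    # -60000.0

# Just 3 long sequences contribute ~30x more to the objective than
# 100 short ones, because log-probability is additive over tokens.
print(short_total, long_total)
```

This is why, without per-token normalization or length-based weighting, a sum-of-log-probabilities objective is dominated by the few longest sequences.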
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.2 Generative Models - Foundations of Large Language Models
Ch.4 Alignment - Foundations of Large Language Models
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Maximum Likelihood Estimation for Sequential Data
Fine-Tuning as Maximum Likelihood Estimation
Log-Probability Decomposition for Efficient Multi-Turn Dialogue Training
Model Parameter Selection via Likelihood
A language model is being trained on a large dataset of text sequences. After a single parameter update, the model's calculated log-probability for one specific sequence in the dataset increases by 2.5, while the log-probabilities for all other sequences in the dataset remain exactly the same. How does this change affect the overall maximum likelihood training objective for the entire dataset?
Standard Optimization Objective for Transformer Language Models