Multiple Choice

A machine learning engineer is training a language model for a question-answering task. The training data consists of concatenated [question, answer] sequences. Due to a configuration error, the training loss is computed over all tokens in each sequence (question and answer alike) rather than over the answer tokens only. What is the most likely and significant negative consequence of this misconfiguration on the model's behavior?
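The misconfiguration the question describes can be made concrete with a small sketch. This is a minimal illustration in pure Python with made-up per-token loss values, not any framework's actual API: the intended setup masks the loss to answer positions, while the buggy setup averages over every token, so question tokens dominate the gradient signal.

```python
def masked_mean_loss(token_losses, loss_mask):
    """Average per-token losses over the positions where loss_mask is 1."""
    kept = [loss for loss, m in zip(token_losses, loss_mask) if m]
    return sum(kept) / len(kept)

# Hypothetical [question, answer] sequence: the first four token losses
# belong to the question, the last two to the answer. Values are
# illustrative only.
token_losses = [2.0, 1.5, 1.8, 2.2, 0.4, 0.6]
is_answer    = [0,   0,   0,   0,   1,   1]

# Misconfigured: loss averaged over ALL tokens, so question tokens
# dominate the optimization signal.
buggy = masked_mean_loss(token_losses, [1] * len(token_losses))

# Intended: loss averaged over answer tokens only.
correct = masked_mean_loss(token_losses, is_answer)

print(round(buggy, 3), round(correct, 3))  # → 1.417 0.5
```

Because the buggy objective spends most of its capacity modeling question text, the trained model tends to generate question-like continuations instead of focusing on producing good answers.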

Updated 2025-10-08

Tags

Ch.4 Alignment - Foundations of Large Language Models

Analysis in Bloom's Taxonomy