1Cademy

Learn Before

Selective Loss Computation in Joint Probability Language Modeling

Multiple Choice

A language model is being trained on instruction-following data. For one specific training instance, the model processes the full tokenized sequence: ['User:', 'What', 'is', '2+2?', 'Assistant:', '4']. The goal is to train the model to provide the correct response ('4') when given the user's prompt. During the backpropagation step for this single instance, on which token(s) is the predictive loss calculated to update the model's weights?
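Below is a minimal sketch of the selective loss computation named above, in standard-library Python. It assumes one common instruction-tuning convention: every token up to and including the `Assistant:` marker is treated as prompt and masked out, so only response tokens contribute to the negative log-likelihood. The per-position probabilities and the helper names `build_loss_mask` and `masked_nll` are hypothetical and exist only for illustration. They are not taken from any particular library.

```python
import math

# Tokenized training instance from the question.
tokens = ["User:", "What", "is", "2+2?", "Assistant:", "4"]

# Hypothetical model probabilities: probs[i] = P(tokens[i] | tokens[:i]).
# Position 0 has nothing to condition on, so it has no prediction (None).
probs = [None, 0.2, 0.5, 0.1, 0.9, 0.8]

def build_loss_mask(tokens, response_marker="Assistant:"):
    """Return 1 for response tokens (after the marker) and 0 for prompt tokens.

    Assumption: the marker itself belongs to the prompt and is not trained on.
    """
    start = tokens.index(response_marker) + 1
    return [1 if i >= start else 0 for i in range(len(tokens))]

def masked_nll(probs, mask):
    """Mean negative log-likelihood over target tokens whose mask is 1."""
    terms = [-math.log(p) for p, m in zip(probs, mask) if m and p is not None]
    return sum(terms) / len(terms)

mask = build_loss_mask(tokens)
loss_targets = [t for t, m in zip(tokens, mask) if m]
loss = masked_nll(probs, mask)

print(mask)            # [0, 0, 0, 0, 0, 1]
print(loss_targets)    # ['4']
print(round(loss, 4))  # 0.2231
```

The whole sequence is still fed through the model, so the prompt conditions the prediction of `4`, but the prompt tokens themselves are never scored as targets. In PyTorch-based training code, the same effect is commonly achieved by setting the labels at prompt positions to `-100`, which is the default `ignore_index` of `torch.nn.functional.cross_entropy`.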

Updated 2025-09-26
