A dialogue model is trained by processing entire multi-turn conversations as single, concatenated sequences of text. During training, the loss is computed only on the model's ability to predict certain parts of the sequence, while the log-probabilities of the remaining parts are masked out and ignored. Given the following two-turn conversation, which parts of the sequence would be used to calculate the training loss?
- Turn 1 (User): 'What is the weather like'
- Turn 1 (Model): 'In which city?'
- Turn 2 (User): 'In London'
- Turn 2 (Model): 'It is currently raining.'
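In standard practice, only the model's own turns ('In which city?' and 'It is currently raining.') contribute to the loss; the user turns are masked out. Below is a minimal sketch of how that masking is typically built, assuming the common PyTorch convention that label positions set to -100 (the default ignore_index of cross-entropy) are excluded from the loss. The whitespace tokenizer and the `build_labels` helper are illustrative stand-ins, not part of the course material.

```python
# Minimal sketch of multi-turn loss masking (illustrative, not reference code).
# Convention assumed: label positions set to -100 are skipped by the loss,
# matching PyTorch cross-entropy's default ignore_index. Tokenisation is
# replaced by a whitespace split purely for readability.

IGNORE_INDEX = -100  # a label of -100 contributes nothing to the loss

conversation = [
    ("user",  "What is the weather like"),
    ("model", "In which city?"),
    ("user",  "In London"),
    ("model", "It is currently raining."),
]

def build_labels(turns):
    """Concatenate all turns into one sequence; keep labels only for the
    model's own tokens and mask every user token with IGNORE_INDEX."""
    inputs, labels = [], []
    for role, text in turns:
        tokens = text.split()          # stand-in for a real tokenizer
        inputs.extend(tokens)
        if role == "model":
            labels.extend(tokens)      # these positions enter the loss
        else:
            labels.extend([IGNORE_INDEX] * len(tokens))  # ignored positions
    return inputs, labels

inputs, labels = build_labels(conversation)
for tok, lab in zip(inputs, labels):
    status = "in loss" if lab != IGNORE_INDEX else "masked"
    print(f"{tok:<12} {status}")
```

Running the sketch prints every concatenated token with a flag showing that only the tokens from the model's two replies are scored, which is exactly the distinction the question is probing.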
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Debugging a Dialogue Model Training Loop
Evaluating Dialogue Model Training Strategies
Dataset-Level Objective for Multi-Round Conversational Models