Dataset-Level Objective for Multi-Round Conversational Models
The training of a multi-round conversational model can be achieved by maximizing the log-likelihood over an entire training dataset $D$. Because the loss for generating user inputs is masked (set to $0$), the sum of the conditional log-probabilities of the model's responses is mathematically equal to the log-probability of the entire concatenated conversational sequence $\mathrm{seq} = x^1 y^1 x^2 y^2 \cdots x^K y^K$, computed with the masked user-input terms contributing $0$. Therefore, the training objective across the dataset is formulated as:

$$\hat{\theta} = \arg\max_{\theta} \sum_{\mathrm{seq} \in D} \sum_{k=1}^{K} \log \Pr_{\theta}\left(y^k \mid x^1, y^1, \ldots, x^{k-1}, y^{k-1}, x^k\right)$$

This formulation simplifies the implementation of multi-round Supervised Fine-Tuning (SFT): each conversation is treated as a single concatenated sequence, making multi-round SFT fundamentally the same as standard language model training on independent sequences.
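To make the masking concrete, here is a minimal PyTorch sketch of this loss computation. It is not taken from the course material: the token IDs are made up, the `build_labels` helper is hypothetical, and random logits stand in for a real model's output. The point is that user-input positions receive an ignore label, so only response tokens contribute to the cross-entropy loss, i.e. to $\sum_{k} \log \Pr_{\theta}(y^k \mid \ldots)$.

```python
# Minimal sketch (assumed names, toy data) of multi-round SFT loss masking.
import torch
import torch.nn.functional as F

IGNORE_INDEX = -100  # positions with this label are skipped by cross_entropy


def build_labels(turns):
    """Concatenate a K-round dialogue into one token sequence.

    `turns` is a list of (token_ids, is_response) pairs. Labels copy the
    input IDs on response tokens and hold IGNORE_INDEX on user-input tokens,
    so the summed loss is exactly the sum of the responses' conditional
    log-probabilities.
    """
    input_ids, labels = [], []
    for token_ids, is_response in turns:
        input_ids.extend(token_ids)
        labels.extend(token_ids if is_response else [IGNORE_INDEX] * len(token_ids))
    return torch.tensor(input_ids), torch.tensor(labels)


def sft_loss(logits, labels):
    # Shift so that position t predicts token t+1, as in standard LM training;
    # ignore_index drops all masked (user-input) positions from the loss.
    return F.cross_entropy(logits[:-1], labels[1:], ignore_index=IGNORE_INDEX)


# Toy two-round dialogue with made-up token IDs:
# (x^1, y^1, x^2, y^2) -> only y^1 and y^2 carry loss.
turns = [([5, 6, 7], False), ([8, 9], True), ([10, 11], False), ([12, 13, 14], True)]
input_ids, labels = build_labels(turns)
vocab_size = 32
logits = torch.randn(len(input_ids), vocab_size, requires_grad=True)  # stand-in for model output
print(sft_loss(logits, labels))
```

Because `ignore_index` removes the user-input positions from the loss, summing per-token losses over the concatenated sequence recovers exactly the per-response objective above, which is why the training loop needs no special multi-round machinery.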
Tags
Foundations of Large Language Models
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Comparison of Training Implementations for Multi-Round Dialogue Models
A team is developing a chatbot for multi-turn conversations. They have a dataset of K-round dialogues, each consisting of a sequence of user inputs (x^k) and desired model responses (y^k). To train the model, they must define an objective function to maximize with respect to the model's parameters. Which of the following objective functions correctly represents the goal of making the model generate the entire sequence of desired responses accurately within the conversational context?
Diagnosing Context-Loss in a Dialogue Model
Rationale for Cumulative Objective in Dialogue Models
Dataset-Level Objective for Multi-Round Conversational Models
A dialogue model is trained by processing entire multi-turn conversations as single, concatenated sequences of text. To make this process efficient, the training loss is calculated based only on the model's ability to predict certain parts of the sequence, while the log-probabilities of other parts are ignored. Given the following two-turn conversation, which parts of the sequence would be used to calculate the training loss?
- Turn 1 (User): 'What is the weather like'
- Turn 1 (Model): 'In which city?'
- Turn 2 (User): 'In London'
- Turn 2 (Model): 'It is currently raining.'
Debugging a Dialogue Model Training Loop
Evaluating Dialogue Model Training Strategies