Training Objective for Multi-Round Dialogue Models
The training objective for a multi-round dialogue model is to find the optimal parameters, \hat{\theta}, that maximize the cumulative log-probability of all model responses across the entire conversation. For a K-round dialogue {x_1, y_1, x_2, y_2, ..., x_K, y_K}, this is achieved by summing the conditional log-probabilities for each response, where each response is conditioned on the full conversational history up to that point. The formal optimization problem is: \hat{\theta} = \arg\max_{\theta} \sum_{k=1}^{K} \log \Pr_{\theta}(y_k \mid x_1, y_1, ..., x_{k-1}, y_{k-1}, x_k). A straightforward implementation of this objective calculates the conditional probability for each of the K turns separately, but it requires running the language model K times.
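The straightforward (one-model-run-per-turn) implementation can be sketched in Python. Here `log_prob` is a hypothetical stand-in for the model's conditional log-probability, in practice this would be a forward pass of the language model; it is stubbed out only so the sketch is self-contained:

```python
# Tracks how many "model runs" the naive implementation performs.
calls = {"n": 0}

def log_prob(response, history):
    # Hypothetical scoring function standing in for log Pr_theta(y_k | history).
    # A real implementation would run the LLM; this toy version just returns
    # a negative score so the sketch is runnable.
    calls["n"] += 1
    return -len(response) / (1 + 0.01 * len(history))

def dialogue_objective(dialogue):
    """Sum log Pr(y_k | x_1, y_1, ..., x_{k-1}, y_{k-1}, x_k) over all K rounds.

    `dialogue` is a list of (user_input, model_response) pairs.
    This is the naive implementation: one model call per round,
    so a K-round dialogue costs K forward passes.
    """
    history = ""
    total = 0.0
    for x_k, y_k in dialogue:
        history += x_k                   # condition on the new user input
        total += log_prob(y_k, history)  # one model run for this round
        history += y_k                   # response joins the context for round k+1
    return total

dialogue = [("Hi", "Hello!"), ("Book a flight", "Sure, where to?")]
score = dialogue_objective(dialogue)     # K = 2, so log_prob runs twice
```

The per-round loop makes the K-fold cost explicit: each round re-scores its response against an ever-growing history, which is exactly the redundancy a single-pass implementation avoids.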

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Comparison of Single-Round vs. Multi-Round Prediction Problems
Healthcare Assistant Chatbot as a Multi-Round Prediction Problem
Training Objective for Multi-Round Dialogue Models
Conditional Log-Probability of a Response in Multi-Round Dialogue
A user is interacting with a language model to plan a vacation. Analyze the following conversation:
Turn 1:
- User: "I want to book a flight to a warm destination for December."
- Model: "That sounds lovely! To help you, could you tell me which continent you're interested in?"
Turn 2:
- User: "Let's focus on South America."
- Model: "Excellent choice for December. Based on that, I recommend Brazil or Colombia. Do you have a preference?"
To generate its response in Turn 2, which of the following sets of information must the model have processed to ensure its suggestion is both relevant and coherent?
Analysis of a Conversational Failure
A user is interacting with a customer support model for an e-commerce site. Consider the following two-turn conversation:
Turn 1:
- User: "Hi, I ordered a blue t-shirt last week, order #12345. The tracking says it was delivered, but I haven't received it."
- Model: "I'm sorry to hear that. Let me check the details for order #12345. I see it was marked as delivered two days ago. Could you please confirm your shipping address is 123 Main St, Anytown?"
Turn 2:
- User: "Yes, that's the correct address. What should I do next?"
Given this history, which of the following responses from the model would be the most effective and contextually appropriate for the next turn?
Training Objective for Multi-Round Dialogue Models
Consider a dialogue model engaged in a three-turn conversation, represented by the sequence
{x_1, y_1, x_2, y_2, x_3, y_3}, where x_k is the user's input and y_k is the model's response at turn k. When the model calculates the probability of generating the third response (y_3), how does the set of information it conditions on relate to the set of information used to calculate the probability of the second response (y_2)?
Formulating Conditional Probability in Dialogue
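The nesting of conditioning contexts can be made concrete with a short Python sketch. The `context_for` helper is hypothetical, introduced only to enumerate the history each response is conditioned on:

```python
def context_for(dialogue, k):
    """Return the items the model conditions on when generating y_k:
    the full history x_1, y_1, ..., x_{k-1}, y_{k-1} plus the new input x_k."""
    ctx = []
    for i, (x, y) in enumerate(dialogue, start=1):
        ctx.append(f"x_{i}")
        if i < k:
            ctx.append(f"y_{i}")  # earlier responses stay in the context
        if i == k:
            break                 # y_k itself is what we are generating
    return ctx

dialogue = [("x1", "y1"), ("x2", "y2"), ("x3", "y3")]
ctx2 = context_for(dialogue, 2)  # ['x_1', 'y_1', 'x_2']
ctx3 = context_for(dialogue, 3)  # ['x_1', 'y_1', 'x_2', 'y_2', 'x_3']
```

Comparing `ctx2` and `ctx3` shows the contexts are strictly nested: the context for y_3 is the context for y_2 extended by y_2 itself and the new input x_3.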
Diagnosing Conversational Memory Failure
Learn After
Comparison of Training Implementations for Multi-Round Dialogue Models
A team is developing a chatbot for multi-turn conversations. They have a dataset of K-round dialogues, each consisting of a sequence of user inputs (x^k) and desired model responses (y^k). To train the model, they must define an objective function that the model's parameters will be optimized to maximize. Which of the following objective functions correctly represents the goal of making the model generate the entire sequence of desired responses accurately within the conversational context?
Diagnosing Context-Loss in a Dialogue Model
Rationale for Cumulative Objective in Dialogue Models
Dataset-Level Objective for Multi-Round Conversational Models