
Training Objective for Multi-Round Dialogue Models

The training objective for a multi-round dialogue model is to find the optimal parameters $\tilde{\theta}$ that maximize the cumulative log-probability of all model responses across the entire conversation. For a $K$-round dialogue, this is achieved by summing the conditional log-probabilities of the responses, where each response is conditioned on the full conversational history up to that point. The formal optimization problem is:

$$\tilde{\theta} = \arg\max_{\theta} \sum_{k=1}^{K} \log \mathrm{Pr}_{\theta}(\mathbf{y}^k \mid \mathbf{x}^1, \mathbf{y}^1, \dots, \mathbf{x}^k)$$

A straightforward implementation of this objective calculates the conditional probability for each of the $K$ turns separately, but it requires running the language model $K$ times.
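The objective above can be sketched in code. The following is a minimal illustration, not the book's implementation: `token_logprob` is a hypothetical stand-in for a trained language model (here just a uniform distribution over a toy vocabulary), and the function mirrors the naive formulation in which each response $\mathbf{y}^k$ is scored against the accumulated history $\mathbf{x}^1, \mathbf{y}^1, \dots, \mathbf{x}^k$.

```python
import math

# Hypothetical stand-in for a language model's per-token conditional
# log-probability log Pr(token | context). A real model would run a
# forward pass; here we use a uniform distribution over a toy vocabulary.
VOCAB_SIZE = 10

def token_logprob(token, context):
    return math.log(1.0 / VOCAB_SIZE)

def dialogue_logprob(turns):
    """Compute sum_k log Pr(y^k | x^1, y^1, ..., x^k).

    `turns` is a list of (x_k, y_k) pairs of token ids. This follows the
    naive K-pass formulation: round k re-scores its response y^k given
    the full conversational history up to and including prompt x^k.
    """
    total = 0.0
    history = []
    for x, y in turns:
        history.extend(x)                 # prompt x^k joins the history
        for i, tok in enumerate(y):
            context = history + y[:i]     # history plus earlier y^k tokens
            total += token_logprob(tok, context)
        history.extend(y)                 # response y^k joins the history
    return total

# Two-round toy dialogue; integers stand in for tokenized text.
turns = [([1, 2], [3, 4]), ([5], [6, 7, 8])]
print(dialogue_logprob(turns))  # sums log-probs of the 5 response tokens
```

Note that only response tokens contribute terms to the sum; prompt tokens enter solely as conditioning context, which is why in practice the same quantity is usually computed in a single forward pass over the concatenated dialogue with the prompt positions masked out of the loss.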


Updated 2026-05-01



Ch.4 Alignment - Foundations of Large Language Models
