
Training Objective for Multi-Round Dialogue Models

The training objective for a multi-round dialogue model is to find the optimal parameters $\tilde{\theta}$ that maximize the cumulative log-probability of all model responses across the entire conversation. For a $K$-round dialogue, this is achieved by summing the conditional log-probabilities of the responses, where each response is conditioned on the full conversational history up to that point. The formal optimization problem is:

$$\tilde{\theta} = \arg\max_{\theta} \sum_{k=1}^{K} \log \mathrm{Pr}_{\theta}(\mathbf{y}^k \mid \mathbf{x}^1, \mathbf{y}^1, \dots, \mathbf{x}^k)$$

A straightforward implementation of this objective calculates the conditional probability for each of the $K$ turns separately, but it requires running the language model $K$ times.
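The objective above can be sketched in code. The following is a minimal illustration, not the book's implementation: `token_logprob` is a hypothetical stand-in for a trained language model (here just a uniform distribution over a toy vocabulary), and the function mirrors the naive formulation in which each response $\mathbf{y}^k$ is scored against the accumulated history $\mathbf{x}^1, \mathbf{y}^1, \dots, \mathbf{x}^k$.

```python
import math

# Hypothetical stand-in for a language model's per-token conditional
# log-probability log Pr(token | context). A real model would run a
# forward pass; here we use a uniform distribution over a toy vocabulary.
VOCAB_SIZE = 10

def token_logprob(token, context):
    return math.log(1.0 / VOCAB_SIZE)

def dialogue_logprob(turns):
    """Compute sum_k log Pr(y^k | x^1, y^1, ..., x^k).

    `turns` is a list of (x_k, y_k) pairs of token ids. This follows the
    naive K-pass formulation: round k re-scores its response y^k given
    the full conversational history up to and including prompt x^k.
    """
    total = 0.0
    history = []
    for x, y in turns:
        history.extend(x)                 # prompt x^k joins the history
        for i, tok in enumerate(y):
            context = history + y[:i]     # history plus earlier y^k tokens
            total += token_logprob(tok, context)
        history.extend(y)                 # response y^k joins the history
    return total

# Two-round toy dialogue; integers stand in for tokenized text.
turns = [([1, 2], [3, 4]), ([5], [6, 7, 8])]
print(dialogue_logprob(turns))  # sums log-probs of the 5 response tokens
```

Note that only response tokens contribute terms to the sum; prompt tokens enter solely as conditioning context, which is why in practice the same quantity is usually computed in a single forward pass over the concatenated dialogue with the prompt positions masked out of the loss.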


Updated 2026-05-01



Ch.4 Alignment - Foundations of Large Language Models
