Formula

Dataset-Level Objective for Multi-Round Conversational Models

The training of a multi-round conversational model can be achieved by maximizing the log-likelihood over an entire training dataset $\mathcal{D}$. Because the loss for generating user inputs is masked (set to $0$), the sum of the conditional log-probabilities of the model's responses is mathematically equal to the log-probability of the entire concatenated conversational sequence, $\log \mathrm{Pr}_{\theta}(\mathrm{seq})$. Therefore, the training objective across the dataset is formulated as $\tilde{\theta} = \argmax_{\theta} \sum_{\mathrm{seq} \in \mathcal{D}} \log \mathrm{Pr}_{\theta}(\mathrm{seq})$. This formulation simplifies the implementation of multi-round Supervised Fine-Tuning (SFT) by treating it fundamentally the same as standard language model training on independent sequences.
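
As a concrete illustration, below is a minimal PyTorch sketch of this masked objective. The function name, tensor layout, and the `response_mask` convention are illustrative assumptions, not taken from the book; it assumes a causal LM that returns next-token logits of shape (batch, seq_len, vocab) over the concatenated conversation.

```python
import torch
import torch.nn.functional as F

def multi_round_sft_loss(logits: torch.Tensor,
                         input_ids: torch.Tensor,
                         response_mask: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood over model-response tokens only.

    logits:        (B, T, V) next-token logits from a causal LM
    input_ids:     (B, T)    the concatenated conversational sequence
    response_mask: (B, T)    1 for tokens in the model's responses,
                             0 for user-input tokens
    """
    # Shift so position t predicts token t+1 (standard causal LM setup).
    logits = logits[:, :-1, :]
    targets = input_ids[:, 1:]
    mask = response_mask[:, 1:].float()

    # Per-token conditional log-probabilities log Pr_theta(x_t | x_<t).
    log_probs = F.log_softmax(logits, dim=-1)
    token_ll = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)

    # Zeroing out user-input tokens means the remaining sum is exactly
    # the masked log Pr_theta(seq), so multi-round SFT reduces to
    # ordinary language model training on independent sequences.
    return -(token_ll * mask).sum() / mask.sum().clamp(min=1.0)
```

In practice, libraries often express the same masking by setting user-input positions in the label tensor to an ignore index (e.g., -100 in PyTorch's `F.cross_entropy`), which yields the same objective.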
