Formula

Dataset-Level Objective for Multi-Round Conversational Models

The training of a multi-round conversational model can be achieved by maximizing the log-likelihood over an entire training dataset $\mathcal{D}$. Because the loss for generating user inputs is masked (set to $0$), the sum of the conditional log-probabilities of the model's responses is mathematically equal to the log-probability of the entire concatenated conversational sequence, $\log \mathrm{Pr}_{\theta}(\mathrm{seq})$. Therefore, the training objective across the dataset is formulated as $\tilde{\theta} = \argmax_{\theta} \sum_{\mathrm{seq} \in \mathcal{D}} \log \mathrm{Pr}_{\theta}(\mathrm{seq})$. This formulation simplifies the implementation of multi-round Supervised Fine-Tuning (SFT) by treating it fundamentally the same as standard language model training on independent sequences.
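
As a concrete illustration, below is a minimal PyTorch sketch of this masked objective. The function name, tensor layout, and the `response_mask` convention are illustrative assumptions, not taken from the book; it assumes a causal LM that returns next-token logits of shape (batch, seq_len, vocab) over the concatenated conversation.

```python
import torch
import torch.nn.functional as F

def multi_round_sft_loss(logits: torch.Tensor,
                         input_ids: torch.Tensor,
                         response_mask: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood over model-response tokens only.

    logits:        (B, T, V) next-token logits from a causal LM
    input_ids:     (B, T)    the concatenated conversational sequence
    response_mask: (B, T)    1 for tokens in the model's responses,
                             0 for user-input tokens
    """
    # Shift so position t predicts token t+1 (standard causal LM setup).
    logits = logits[:, :-1, :]
    targets = input_ids[:, 1:]
    mask = response_mask[:, 1:].float()

    # Per-token conditional log-probabilities log Pr_theta(x_t | x_<t).
    log_probs = F.log_softmax(logits, dim=-1)
    token_ll = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)

    # Zeroing out user-input tokens means the remaining sum is exactly
    # the masked log Pr_theta(seq), so multi-round SFT reduces to
    # ordinary language model training on independent sequences.
    return -(token_ll * mask).sum() / mask.sum().clamp(min=1.0)
```

In practice, libraries often express the same masking by setting user-input positions in the label tensor to an ignore index (e.g., -100 in PyTorch's `F.cross_entropy`), which yields the same objective.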
