Comparison of Training Implementations for Multi-Round Dialogue Models

When training multi-round dialogue models, there are two main implementation strategies. A straightforward but inefficient approach performs K separate forward passes for a K-turn conversation: in each pass, the model predicts a response given an incrementally longer conversational history. A more computationally efficient method concatenates the entire dialogue into a single sequence, so the loss for all responses can be computed in a single run of the Large Language Model.
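The contrast can be sketched in data-preparation terms. Below is a minimal pure-Python illustration (the token ids, helper names, and the `-100` ignore-label convention are assumptions for the sketch, not from the text): the inefficient strategy builds one training example per assistant turn, repeating the history each time, while the efficient strategy builds one concatenated sequence whose label mask keeps loss only on the response spans.

```python
IGNORE = -100  # hypothetical "no loss" label id, as used by common loss masks

def per_turn_examples(turns):
    """Inefficient strategy: one example per assistant turn.
    Each example repeats the full history so far, so a K-turn
    dialogue requires K forward passes over growing prefixes."""
    examples, history = [], []
    for user, assistant in turns:
        history = history + user
        inputs = history + assistant
        # Loss is computed only on the current response tokens.
        labels = [IGNORE] * len(history) + assistant
        examples.append((inputs, labels))
        history = history + assistant
    return examples

def concatenated_example(turns):
    """Efficient strategy: one concatenated sequence.
    The label mask ignores user tokens and keeps every assistant
    span, so one forward pass covers all K responses."""
    inputs, labels = [], []
    for user, assistant in turns:
        inputs += user + assistant
        labels += [IGNORE] * len(user) + assistant
    return inputs, labels

# Toy 2-turn dialogue with made-up token ids.
turns = [([1, 2], [3]), ([4], [5, 6])]
print(per_turn_examples(turns))   # 2 examples, histories repeated
print(concatenated_example(turns))  # 1 example, responses masked in
```

Note that the per-turn strategy processes each prefix again for every later turn, so total token count grows quadratically with the number of turns, whereas the concatenated sequence is processed once.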

Updated 2026-05-01

Tags

Ch.4 Alignment - Foundations of Large Language Models