Comparison of Training Implementations for Multi-Round Dialogue Models
When training multi-round dialogue models, there are two main implementation strategies. A straightforward but inefficient approach performs K separate forward passes for a K-turn conversation; in each pass, the model predicts one response from an incrementally longer conversational history. A more computationally efficient method concatenates the entire dialogue into a single sequence, which allows the loss for all K responses to be computed in a single run of the Large Language Model.
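A minimal sketch of the concatenation strategy, using made-up token ids rather than a real tokenizer: each turn's user tokens and response tokens are appended to one sequence, and a loss mask marks only the response positions, so a single forward pass can score all K responses at once. The function name and token ids below are illustrative assumptions, not from any particular library.

```python
def build_concatenated_example(turns):
    """Concatenate (user, response) turn pairs into one token sequence.

    Returns the flat token list plus a mask that is 1 only on response
    tokens, so the training loss covers all K responses in one pass
    while user-input positions are excluded from the objective.
    """
    tokens, loss_mask = [], []
    for user_tokens, response_tokens in turns:
        tokens += user_tokens
        loss_mask += [0] * len(user_tokens)      # no loss on user inputs x^k
        tokens += response_tokens
        loss_mask += [1] * len(response_tokens)  # loss on every response y^k

    return tokens, loss_mask

# Toy 2-turn dialogue: (x^1, y^1), (x^2, y^2) with hypothetical token ids.
turns = [([1, 2], [10, 11, 12]),
         ([3],    [13, 14])]
tokens, mask = build_concatenated_example(turns)
# tokens interleaves all turns; mask selects the 5 response positions.
```

The per-turn alternative would instead run the model K times, once on each growing prefix, recomputing attention over the shared history on every pass; the mask-based layout avoids that redundant computation.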

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Comparison of Training Implementations for Multi-Round Dialogue Models
A team is developing a chatbot for multi-turn conversations. They have a dataset of K-round dialogues, each consisting of a sequence of user inputs (x^k) and desired model responses (y^k). To train the model, they must define an objective function that the model's parameters will be optimized to maximize. Which of the following objective functions correctly represents the goal of making the model generate the entire sequence of desired responses accurately within the conversational context?
Diagnosing Context-Loss in a Dialogue Model
Rationale for Cumulative Objective in Dialogue Models
Dataset-Level Objective for Multi-Round Conversational Models
Learn After
Concatenated Sequence Representation for Multi-Turn Dialogue
Optimizing Dialogue Model Training Efficiency
When training a multi-round dialogue model, an engineer chooses to concatenate an entire K-turn conversation into a single sequence and calculate the loss for all K responses in one forward pass. What is the primary advantage of this approach compared to performing K separate forward passes, where each pass processes an incrementally longer conversational history?
When training a multi-round dialogue model by concatenating the entire conversation into a single sequence for one forward pass, the calculation of the loss for the first model response is influenced by the content of the final user input in the dialogue.