1Cademy - When training a multi-round dialogue model, an engineer chooses to concatenate an entire K-turn conversation into a single sequence and calculate the loss for all K responses in one forward pass. What is the primary advantage of this approach compared to performing K separate forward passes, where each pass processes an incrementally longer conversational history?

Learn Before

Comparison of Training Implementations for Multi-Round Dialogue Models

Multiple Choice

When training a multi-round dialogue model, an engineer chooses to concatenate an entire K-turn conversation into a single sequence and calculate the loss for all K responses in one forward pass. What is the primary advantage of this approach compared to performing K separate forward passes, where each pass processes an incrementally longer conversational history?

Updated 2025-10-05

Contributors are:

Who are from:

Learn Before

Related