Multiple Choice

When training a multi-round dialogue model, an engineer chooses to concatenate an entire K-turn conversation into a single sequence and calculate the loss for all K responses in one forward pass. What is the primary advantage of this approach compared to performing K separate forward passes, where each pass processes an incrementally longer conversational history?

0

1

Updated 2025-10-05

Contributors are:

Who are from:

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science