Multiple Choice

A team is training a conversational agent to be more helpful. Their strategy involves having a human user interact with the agent. After each response from the agent, the human provides a numerical score indicating its quality. This score is immediately used as a signal to update the agent's internal strategy for generating the next response. This direct-feedback loop is repeated thousands of times. The team observes that this training process is prohibitively slow and costly. Based on the typical two-stage process for this kind of training, what is the most significant flaw in the team's approach?

0

1

Updated 2025-09-26

Contributors are:

Who are from:

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Evaluation in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science