Multiple Choice

A team is improving a text-generation model. The process involves providing the model with an input prompt, to which the model generates a textual response. A human evaluator then assigns a numerical score to this response based on its quality. This score is used to adjust the model's behavior for future responses. If this entire process is described using the framework of a system learning from sequential decisions, what component of the text-generation process corresponds to the 'policy'?
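The loop described in the question can be pictured with a toy sketch. This is only an illustration under stated assumptions, not how a real text-generation model is trained: the two canned responses, the fixed evaluator scores, the baseline, and the step size are all hypothetical, and the function names (`generate_response`, `evaluator_score`, `adjust`) are deliberately neutral so that mapping them onto the sequential-decision framework is left to the reader.

```python
import math
import random

# Hypothetical toy setup: the "model" holds one preference weight per canned response.
RESPONSES = ["short vague reply", "clear detailed reply"]


def response_probabilities(weights):
    """Softmax over the preference weights."""
    peak = max(weights)
    exps = [math.exp(w - peak) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def generate_response(weights, rng):
    """Sample the index of a response according to the current probabilities."""
    probs = response_probabilities(weights)
    return rng.choices(range(len(RESPONSES)), weights=probs, k=1)[0]


def evaluator_score(index):
    """Stand-in for the human evaluator: a fixed score per response (assumed values)."""
    return [0.0, 1.0][index]


def adjust(weights, index, score, baseline=0.5, step=0.5):
    """Shift weights toward a response scored above the baseline, away from it otherwise."""
    probs = response_probabilities(weights)
    advantage = score - baseline
    return [
        w + step * advantage * ((1.0 if i == index else 0.0) - p)
        for i, (w, p) in enumerate(zip(weights, probs))
    ]


def run(rounds=200, seed=0):
    """Repeat the generate, score, adjust cycle and return the final probabilities."""
    rng = random.Random(seed)
    weights = [0.0, 0.0]
    for _ in range(rounds):
        index = generate_response(weights, rng)
        weights = adjust(weights, index, evaluator_score(index))
    return response_probabilities(weights)
```

Calling `run()` returns the probabilities the toy model assigns to each canned response after repeated feedback; with the assumed scores, the higher-scored response ends up the more likely one.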

Updated 2025-10-02

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Computing Sciences

Foundations of Large Language Models Course

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science