Multiple Choice

In a system designed to align a language model with human preferences, one component functions as a 'critic'. It takes the current state (e.g., a conversation history) as input and outputs a single scalar value predicting the total expected future rewards from that state. This component's architecture is often a large language model with a final linear layer for the scalar output. Which statement best distinguishes this specific component from others in the system?

0

1

Updated 2025-10-01

Contributors are:

Who are from:

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science