1Cademy

Learn Before

Architectural Components of an RLHF System

True/False

In a typical system for aligning a language model with human feedback, it is common practice to use a Transformer-based architecture for the text-generating models, while employing simpler, non-Transformer architectures for the reward and value models to reduce computational overhead.

Updated 2025-10-10
