Multiple Choice

A development team is building a system to align a large language model using reinforcement learning from human feedback. Their setup includes a target model for text generation, a reference model, a reward model to score outputs based on human preferences, and a value model to predict future rewards. For computational efficiency, they decide to build the reward model using a Convolutional Neural Network (CNN) and the value model using a Recurrent Neural Network (RNN), while keeping the target and reference models as Transformer decoders. What is the most significant architectural inconsistency in this design compared to a standard implementation?
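For context on why the standard design matters: in a typical RLHF (PPO) implementation, the reward and value models are not separate CNN/RNN architectures; they reuse the same Transformer-decoder backbone as the language model, with a small scalar head on top. Below is a minimal PyTorch sketch of that standard pattern, with hypothetical class names and deliberately tiny dimensions for illustration only.

```python
import torch
import torch.nn as nn

class TinyDecoderBackbone(nn.Module):
    """Stands in for the pretrained Transformer-decoder LM backbone."""
    def __init__(self, vocab_size=100, d_model=32, n_layers=2, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=64, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)

    def forward(self, tokens):
        x = self.embed(tokens)
        # A causal mask makes this stack behave like a decoder-only LM.
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        return self.blocks(x, mask=mask)

class ScalarHeadModel(nn.Module):
    """Reward or value model: the same Transformer backbone + a scalar head,
    rather than a separate CNN or RNN architecture."""
    def __init__(self, backbone, d_model=32):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Linear(d_model, 1)

    def forward(self, tokens):
        h = self.backbone(tokens)        # (batch, seq_len, d_model)
        return self.head(h).squeeze(-1)  # one scalar per token position

tokens = torch.randint(0, 100, (2, 5))
reward_model = ScalarHeadModel(TinyDecoderBackbone())
value_model = ScalarHeadModel(TinyDecoderBackbone())
print(reward_model(tokens).shape)  # torch.Size([2, 5])
```

In practice the reward model scores a whole response (often the scalar at the final token), while the value model predicts a value at each token position; both are typically initialized from the same pretrained Transformer weights as the policy, which is the architectural consistency the question probes.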

Updated 2025-09-26

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science