Multiple Choice

A development team is building a system to align a large language model using reinforcement learning from human feedback. Their setup includes a target model for text generation, a reference model, a reward model to score outputs based on human preferences, and a value model to predict future rewards. For computational efficiency, they decide to build the reward model using a Convolutional Neural Network (CNN) and the value model using a Recurrent Neural Network (RNN), while keeping the target and reference models as Transformer decoders. What is the most significant architectural inconsistency in this design compared to a standard implementation?
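For context on why the standard design matters: in a typical RLHF (PPO) implementation, the reward and value models are not separate CNN/RNN architectures; they reuse the same Transformer-decoder backbone as the language model, with a small scalar head on top. Below is a minimal PyTorch sketch of that standard pattern, with hypothetical class names and deliberately tiny dimensions for illustration only.

```python
import torch
import torch.nn as nn

class TinyDecoderBackbone(nn.Module):
    """Stands in for the pretrained Transformer-decoder LM backbone."""
    def __init__(self, vocab_size=100, d_model=32, n_layers=2, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=64, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)

    def forward(self, tokens):
        x = self.embed(tokens)
        # A causal mask makes this stack behave like a decoder-only LM.
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        return self.blocks(x, mask=mask)

class ScalarHeadModel(nn.Module):
    """Reward or value model: the same Transformer backbone + a scalar head,
    rather than a separate CNN or RNN architecture."""
    def __init__(self, backbone, d_model=32):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Linear(d_model, 1)

    def forward(self, tokens):
        h = self.backbone(tokens)        # (batch, seq_len, d_model)
        return self.head(h).squeeze(-1)  # one scalar per token position

tokens = torch.randint(0, 100, (2, 5))
reward_model = ScalarHeadModel(TinyDecoderBackbone())
value_model = ScalarHeadModel(TinyDecoderBackbone())
print(reward_model(tokens).shape)  # torch.Size([2, 5])
```

In practice the reward model scores a whole response (often the scalar at the final token), while the value model predicts a value at each token position; both are typically initialized from the same pretrained Transformer weights as the policy, which is the architectural consistency the question probes.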

Updated 2025-09-26

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science