Multiple Choice

A machine learning team is implementing a training process that uses human feedback to align a language model. They have access to two models: a general-purpose pre-trained language model (Model A) and a version of that model that has been further fine-tuned on a set of instructions (Model B). For the first stage of their process, which of the following initialization plans is correct for the policy, reference, reward, and value models?

Updated 2025-09-26

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.4 Alignment - Foundations of Large Language Models

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science