Learn Before
An engineering team is refining a large language model. During one step of the process, the model is given the start of a sentence, 'The best way to learn a new skill is'. The model then assigns each word in its vocabulary a probability of being the next word and samples from this distribution, one word at a time, to generate a complete sentence. The model's internal parameters are then updated based on a separate quality assessment of the generated sentence. Which part of this process best describes the primary role of the model being actively trained (the 'policy model')?
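The loop described in the scenario can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not the method any particular team or library uses: the four-word `VOCAB`, the `toy_reward` function standing in for the "separate quality assessment", and the REINFORCE-style update in `reinforce_step` are all hypothetical choices made for this example. For brevity the toy policy also ignores the prompt and uses one fixed set of logits, whereas a real language model conditions its distribution on every preceding token.

```python
import math
import random

# Hypothetical toy vocabulary; a real model has tens of thousands of tokens.
VOCAB = ["practice", "daily", "never", "<eos>"]


def softmax(logits):
    """Turn raw scores into a probability distribution over the vocabulary."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate(logits, rng, max_len=5):
    """Policy role: sample tokens from the distribution until <eos> or max_len."""
    probs = softmax(logits)
    tokens = []
    for _ in range(max_len):
        idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        tokens.append(idx)
        if VOCAB[idx] == "<eos>":
            break
    return tokens


def toy_reward(tokens):
    """Assumed stand-in for the separate quality assessment of the output."""
    words = [VOCAB[t] for t in tokens]
    return words.count("practice") - words.count("never")


def reinforce_step(logits, tokens, reward, lr=0.1):
    """REINFORCE-style update: raise the log-probability of sampled tokens
    in proportion to the reward (lower it if the reward is negative)."""
    probs = softmax(logits)
    new_logits = list(logits)
    for t in tokens:
        for j in range(len(logits)):
            # Gradient of log softmax w.r.t. logit j for sampled token t.
            grad = (1.0 if j == t else 0.0) - probs[j]
            new_logits[j] += lr * reward * grad
    return new_logits


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [0.0] * len(VOCAB)
    for _ in range(200):
        sampled = generate(logits, rng)
        logits = reinforce_step(logits, sampled, toy_reward(sampled))
    print({w: round(p, 3) for w, p in zip(VOCAB, softmax(logits))})
```

In this sketch, `generate` is the only part that produces text: it maps a state to a distribution over next tokens and samples from it. The scoring (`toy_reward`) and the parameter change (`reinforce_step`) happen outside that act of generation.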
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Mechanism of Policy Model Refinement
Analyzing Policy Model Behavior