Learn Before
Nature of an LLM's Policy
Explain why the strategy a language model uses to select the next token is described as a 'policy' that is a probability distribution over its entire vocabulary, rather than a function that simply selects a single, predetermined 'best' token.
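As a starting point for the explanation, the contrast can be sketched in a few lines of Python. This is a minimal illustration rather than how any particular model is implemented: the five-word vocabulary, the logit values, and the helper name `softmax` are all hypothetical, chosen for the context 'The cat sat on the'.

```python
import math
import random

# Hypothetical toy vocabulary and logits for the context "The cat sat on the".
# The numbers are made up for illustration; a real model scores every token
# in a vocabulary that is typically tens of thousands of entries long.
vocab = ["mat", "sofa", "floor", "roof", "moon"]
logits = [3.0, 2.2, 2.0, 0.5, -1.0]


def softmax(scores, temperature=1.0):
    """Turn raw scores into a probability distribution over the vocabulary."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# The policy at this step: one probability for every token in the vocabulary.
policy = softmax(logits)

# Greedy decoding collapses the policy to its single most likely token.
greedy_token = vocab[policy.index(max(policy))]

# Sampling draws from the policy, so repeated draws can yield different tokens.
rng = random.Random(0)  # seeded only to make the sketch repeatable
samples = rng.choices(vocab, weights=policy, k=10)

for token, prob in zip(vocab, policy):
    print(f"{token:>6}: {prob:.3f}")
print("greedy:", greedy_token)
print("sampled:", samples)
```

Every token receives a nonzero probability, so picking the top-scoring token is only one of several ways to act on the policy. Sampling, temperature scaling, and reinforcement-learning updates all operate on the full distribution.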
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Policy Formula for LLMs in Reinforcement Learning
An autoregressive language model has processed the input 'The cat sat on the' and is now deciding the next word to generate. At this specific step, which of the following best describes the model's 'policy'?
Analyzing Language Model Generation Strategies
Nature of an LLM's Policy