Learn Before
Policy in the Context of LLMs
For a Large Language Model (LLM), the policy, denoted as π, represents the probability distribution over the vocabulary of possible next tokens, conditioned on the preceding sequence of tokens that constitutes the context. In essence, it is the strategy the LLM uses to decide which token to generate next.
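As a minimal sketch of this idea, the snippet below turns a model's raw output scores (logits) into a probability distribution over a toy vocabulary via softmax. The vocabulary, the logits, and the context string are all hypothetical illustrations, not outputs of any real model.

```python
import math

def policy(logits):
    """Map raw logits to a probability distribution over the vocabulary:
    pi(token | context) = softmax(logits)."""
    m = max(logits)                          # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the context "The cat sat on the"
vocab = ["mat", "dog", "moon", "table"]
logits = [3.2, 0.5, -1.0, 1.8]
probs = policy(logits)
for tok, p in zip(vocab, probs):
    print(f"{tok}: {p:.3f}")
```

The resulting probabilities sum to 1, and sampling (or taking the argmax) from this distribution is exactly the "decide which token to generate next" step described above.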
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Related
Policy in the Context of LLMs
Objective Function as Expected Cumulative Reward (Performance Function)
LLM Policy as a Probability Distribution
Identifying the Agent and Action in a Training Scenario
When a language model is fine-tuned using a system that incorporates human preferences, this process is often conceptualized within a reinforcement learning framework. Which of the following statements correctly analyzes the components of this interaction?
When training a language model using a framework that incorporates human feedback, standard reinforcement learning terminology is used. Match each reinforcement learning term on the left with its corresponding component or concept in this specific language model training context on the right.
Learn After
Policy Formula for LLMs in Reinforcement Learning
An autoregressive language model has processed the input 'The cat sat on the' and is now deciding the next word to generate. At this specific step, which of the following best describes the model's 'policy'?
Analyzing Language Model Generation Strategies
Nature of an LLM's Policy