Learn Before
A language model is tasked with generating a sentence. After producing the partial sequence 'The cat sat on the', it computes the following probability distribution for the next word: {'mat': 0.7, 'chair': 0.2, 'roof': 0.1}. If we frame this generation process using reinforcement learning, how is this probability distribution correctly interpreted?
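In the RL framing, the partial sequence is the state, the candidate next words are the actions, and the distribution is the policy π(a | s). A minimal sketch of this interpretation, using hypothetical variable names and the distribution from the question:

```python
import random

# The language model's next-word distribution, read as an RL policy pi(a | s).
state = "The cat sat on the"                        # state s_t: the partial sequence
policy = {"mat": 0.7, "chair": 0.2, "roof": 0.1}    # pi(a | s_t) over actions (next words)

# A policy must be a valid probability distribution over the action space.
assert abs(sum(policy.values()) - 1.0) < 1e-9

# One generation step = sampling an action from the policy,
# then transitioning to the next state by appending the chosen word.
action = random.choices(list(policy), weights=list(policy.values()), k=1)[0]
next_state = state + " " + action
```

Under this reading, greedy decoding corresponds to always taking the argmax action, while sampling-based decoding corresponds to acting stochastically under the policy.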
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Policy Gradient Utility for Sequence Generation
Equivalence of Language Model and Policy
Conceptual Error in RL Fine-Tuning