Short Answer

Dynamic State in LLM Policy

An autoregressive language model is generating a response. Its policy for choosing the next token is defined by the formula: π(a|s) = Pr(y_t | x, y_<t), where x is the input prompt and y_<t is the sequence of tokens generated before step t. Explain how the 'state' (s) changes from one token generation step to the next, and describe why this change is fundamental to the model's ability to produce coherent text.
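To make the notation concrete, here is a minimal Python sketch of the generation loop that the formula describes. It is an illustration under stated assumptions, not a real model: the `BIGRAMS` table, the `policy` and `generate` functions, and the `<eos>` marker are all hypothetical names invented for this example, and the toy policy looks only at the last token, whereas a real LLM conditions on the entire state (x, y_<t).

```python
# Toy stand-in for the policy pi(a|s) = Pr(y_t | x, y_<t).
# The probabilities below are made up for illustration only.
BIGRAMS = {
    "sat": {"on": 0.9, "<eos>": 0.1},
    "on": {"the": 1.0},
    "the": {"mat": 0.7, "cat": 0.3},
    "mat": {"<eos>": 1.0},
}


def policy(state):
    """Return a distribution over next tokens (actions) given state s = (x, y_<t)."""
    x, y_prev = state
    context = x + y_prev  # the whole state is available to the policy
    # Simplification: this toy policy uses only the last token of the context.
    return BIGRAMS[context[-1]]


def generate(x, max_steps=10):
    """Greedy decoding that records the state seen at every step."""
    y = ()  # y_<1 is empty: nothing has been generated yet
    states = []
    for _ in range(max_steps):
        state = (x, y)  # s_t = (x, y_<t)
        states.append(state)
        probs = policy(state)
        action = max(probs, key=probs.get)  # greedy choice of y_t
        if action == "<eos>":
            break
        y = y + (action,)  # transition: the chosen token joins the next state
    return y, states
```

Calling `generate(("the", "cat", "sat"))` returns the tokens `("on", "the", "mat")` and visits four states, whose generated prefixes are `()`, `("on",)`, `("on", "the")`, and `("on", "the", "mat")`. The prompt x stays fixed while y_<t grows by exactly one token per step, so every action becomes part of the state that conditions the next action.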

Updated 2025-10-04

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science