Short Answer

Deconstructing Next-Token Prediction

An autoregressive language model is in the process of generating a response. So far, it has produced the sequence of tokens ['The', 'cat', 'sat', 'on']. The model now needs to calculate the probability of the next token. Using the formula $Pr_{\hat{\theta}}(x_{i+1} \mid x_0, \ldots, x_i)$, identify what the terms $x_{i+1}$ and $(x_0, \ldots, x_i)$ represent in this specific scenario.
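The setup in the question can be sketched numerically. Below is a minimal toy example, assuming made-up logits over a three-word vocabulary (a real model would produce logits over its full vocabulary): the context $(x_0, \ldots, x_i)$ is the prefix generated so far, and $x_{i+1}$ is the candidate next token whose probability the softmax assigns.

```python
import math

# Hypothetical example: context and logits are invented for illustration.
context = ["The", "cat", "sat", "on"]  # (x_0, ..., x_i): the tokens produced so far

# Toy unnormalized scores for a few candidate next tokens x_{i+1}
vocab_logits = {"the": 2.0, "mat": 3.5, "roof": 1.0}

# Softmax turns logits into Pr(x_{i+1} | x_0, ..., x_i)
z = sum(math.exp(v) for v in vocab_logits.values())
probs = {tok: math.exp(v) / z for tok, v in vocab_logits.items()}

# Under greedy decoding, x_{i+1} is the highest-probability candidate
x_next = max(probs, key=probs.get)
```

Here each key of `probs` is a possible value of $x_{i+1}$, while `context` plays the role of the conditioning sequence $(x_0, \ldots, x_i)$.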

Updated 2025-10-04

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science