Case Study

Design a Correct Sequence-Scoring Function for Autoregressive LLM Outputs

You are building an internal evaluation service that ranks multiple candidate completions produced by an autoregressive LLM for the same prompt. The model returns, for each generation step i, a vector of logits u_i over the vocabulary V (one logit per token) computed from the context (the prompt x plus the previously generated tokens y_&lt;i). Your service must (1) compute the conditional probability of each chosen token y_i from u_i, (2) compute the total conditional log-probability log Pr(y|x) for the full candidate sequence y = (y_1,...,y_n) using the autoregressive decomposition, and (3) return the best candidate y_hat = argmax_{y in Y} log Pr(y|x) over the candidate set Y.
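Steps (1) and (2) can be sketched concretely as follows. This is a minimal illustration, not the required deliverable: the function names are illustrative, NumPy is used only for a numerically stable log-softmax, and each row of `logits_per_step` is assumed to be the vector u_i for step i.

```python
import numpy as np

def log_softmax(logits):
    # Stable log-softmax: subtract the row max before exponentiating so that
    # exp() never overflows; log Pr(token) = u - log(sum_j exp(u_j)).
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    return shifted - np.log(np.sum(np.exp(shifted), axis=-1, keepdims=True))

def sequence_log_prob(logits_per_step, token_ids):
    # Autoregressive decomposition:
    #   log Pr(y|x) = sum_i log Pr(y_i | x, y_<i),
    # where each factor is read off the softmax of the step-i logits at the
    # index of the token that was actually generated.
    log_probs = log_softmax(np.asarray(logits_per_step, dtype=np.float64))
    return float(sum(log_probs[i, t] for i, t in enumerate(token_ids)))
```

Summing per-step log-probabilities (rather than multiplying probabilities) avoids numerical underflow for long sequences while selecting the same argmax.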

Create a precise, implementation-ready specification (math + clear pseudocode) for a function score_and_select(prompt_tokens x, candidates Y, logits_by_candidate U) that returns (best_candidate, scores). Your spec must explicitly show: how Softmax converts logits to next-token probabilities; how you extract Pr(y_i|x,y_&lt;i) for the token actually generated at each step; how you aggregate the per-step terms into a single sequence score consistent with the inference objective; and how this relates to the log-likelihood objective used in training (i.e., what quantity training maximizes that your scorer computes at inference). Assume a fixed start token with probability 1; candidates may have different lengths, so state how length is handled in the score (e.g., stop accumulating at the end-of-sequence token if present).
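One possible shape for the requested function is sketched below. It is a reference sketch under stated assumptions, not the definitive answer: token ids are Python ints, `logits_by_candidate[k]` is an (n_k × |V|) array of step logits for candidate k, and `eos_id=2` is a placeholder for whatever end-of-sequence id the tokenizer uses.

```python
import numpy as np

def log_softmax(logits):
    # Stable log-softmax over the last axis (the vocabulary dimension).
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    return shifted - np.log(np.sum(np.exp(shifted), axis=-1, keepdims=True))

def score_and_select(prompt_tokens, candidates, logits_by_candidate, eos_id=2):
    """Return (best_candidate, scores).

    prompt_tokens is unused in the arithmetic because each step's logits u_i
    were already computed from the context (x, y_<i); it is kept only to
    match the signature in the task statement.
    """
    scores = []
    for y, U in zip(candidates, logits_by_candidate):
        lp = log_softmax(np.asarray(U, dtype=np.float64))
        total = 0.0
        for i, tok in enumerate(y):
            total += lp[i, tok]   # log Pr(y_i | x, y_<i)
            if tok == eos_id:     # stop accumulating once EOS is generated,
                break             # so trailing padding never affects the score
        scores.append(total)
    # y_hat = argmax over candidates of log Pr(y|x)
    best = candidates[int(np.argmax(scores))]
    return best, scores
```

Note that this scorer computes exactly the quantity whose expectation training maximizes: the sum of per-token log-likelihoods under the model's softmax distribution.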

Updated 2026-02-06

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.1 Pre-training - Foundations of Large Language Models

Ch.5 Inference - Foundations of Large Language Models

Data Science
