Short Answer

Evaluating Component Independence in a Language Model

A fellow student examines the formula for computing token probabilities, [p_1, ..., p_m] = Softmax_W(Encoder_theta(x)), and claims that the probability distribution for the token at position i, p_i, is calculated based only on the representation of the input token at that same position, x_i. Critically evaluate this claim. Is it correct? Justify your reasoning based on the function of the components in the formula.
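
One way to probe the claim empirically is to build a toy version of the formula and change a single input position. The sketch below is only an illustration under stated assumptions: it assumes Encoder_theta contains a self-attention layer (as in a Transformer encoder), and the names toy_encoder, token_probs, Wq, Wk, Wv, the random weights, and the sizes are all hypothetical and not part of the original formula.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def toy_encoder(x, Wq, Wk, Wv):
    # Hypothetical stand-in for Encoder_theta: one self-attention layer.
    # Row i of the output is a weighted mix of the value vectors of ALL rows.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores, axis=-1) @ v

def token_probs(x, Wq, Wk, Wv, W):
    # [p_1, ..., p_m] = Softmax_W(Encoder_theta(x)), applied row by row.
    h = toy_encoder(x, Wq, Wk, Wv)
    return softmax(h @ W, axis=-1)

rng = np.random.default_rng(0)
m, d, vocab = 4, 8, 10                      # assumed toy sizes
x = rng.normal(size=(m, d))                 # m input token representations
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
W = rng.normal(size=(d, vocab)) / np.sqrt(d)

p = token_probs(x, Wq, Wk, Wv, W)

# Change only the input at position 2, then recompute every distribution.
x_changed = x.copy()
x_changed[2] = rng.normal(size=d)
p_changed = token_probs(x_changed, Wq, Wk, Wv, W)

# If p_0 depended only on x_0, this difference would be exactly zero.
max_shift_at_0 = np.abs(p[0] - p_changed[0]).max()
print("largest change in p at position 0:", max_shift_at_0)
```

Comparing the printed value against zero shows whether, in this toy setup, the distribution at one position is affected by the input at another. Note that the final softmax layer on its own acts on each position's encoder output separately, so any cross-position dependence has to come from the encoder.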

Updated 2025-10-04

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Evaluation in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science