A language model computes a probability distribution over the vocabulary at each position of a token sequence x using a two-stage process: an encoder with parameters θ generates representations, which are then passed to a Softmax layer with a weight matrix W. The model consistently outputs a nearly uniform distribution at every token position, meaning every word in the vocabulary is considered almost equally likely, regardless of the input. Which of the following is the most direct and plausible explanation for this behavior?
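Below is a minimal NumPy sketch of the failure mode the question describes, under illustrative assumptions (the hidden size, vocabulary size, and random encoder outputs are made up and not part of the question). It shows that once the logits h·W are constant across the vocabulary, for instance because W is effectively zero, the Softmax returns a uniform distribution no matter what the encoder produced:

```python
import numpy as np

def softmax(z):
    # Numerically stable Softmax over the last (vocabulary) axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
d, vocab = 16, 1000                  # assumed hidden size and vocabulary size

h = rng.normal(size=(5, d))          # encoder representations for 5 positions

# Healthy case: a non-degenerate W yields peaked, input-dependent distributions.
W = rng.normal(size=(d, vocab))
p = softmax(h @ W)
print(p.max(axis=-1))                # far above the uniform value 1/vocab

# Failure case: W effectively zero -> logits constant across the vocabulary ->
# exactly uniform output, regardless of what the encoder produced.
p_bad = softmax(h @ np.zeros((d, vocab)))
print(np.allclose(p_bad, 1.0 / vocab))   # True: every token equally likely
```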
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Simplified Notation for Parameterized Models
Comparison of Output Probability Meaning: Language Modeling vs. Encoder Pre-training
Evaluating Component Independence in a Language Model
A language model calculates the probability distribution for each token in an input sequence, x, by first generating a sequence of numerical representations and then applying a final transformation. Arrange the following steps in the correct computational order to produce the probability vector, p_i, for the token at a specific position i.
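As a companion to that related question, here is a minimal NumPy sketch of the computational order it asks about: encode the sequence, take the representation at position i, multiply by the Softmax-layer weight matrix W, and normalize to obtain p_i. The embedding-lookup stand-in for the encoder and all sizes are illustrative assumptions, not part of the question:

```python
import numpy as np

rng = np.random.default_rng(1)
seq_len, d, vocab = 8, 16, 1000     # assumed sequence length, hidden size, vocab size

# Stand-in encoder: a plain embedding lookup playing the role of encoder(x; theta).
theta = rng.normal(size=(vocab, d))
W = rng.normal(size=(d, vocab))     # Softmax-layer weight matrix

x = rng.integers(0, vocab, size=seq_len)    # input token sequence

# Step 1: the encoder maps x to one representation per position.
h = theta[x]                                # shape (seq_len, d)

# Step 2: the representation at position i is multiplied by W to get logits.
i = 3
logits_i = h[i] @ W                         # shape (vocab,)

# Step 3: Softmax normalizes the logits into the probability vector p_i.
p_i = np.exp(logits_i - logits_i.max())
p_i /= p_i.sum()
assert np.isclose(p_i.sum(), 1.0)           # p_i is a valid distribution
```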