Multiple Choice

A language model computes probability distributions over a sequence of tokens x in two stages: an encoder with parameters θ produces representations, which are passed to a Softmax layer with weight matrix W. The model consistently outputs a nearly uniform probability distribution at every token position; that is, every word in the vocabulary is assigned almost equal probability regardless of the input. Which of the following is the most direct and plausible explanation for this behavior?
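
The question turns on how the Softmax layer maps logits to probabilities. Below is a minimal sketch in Python with NumPy (the names, sizes, and the logits = h @ W formulation are assumptions for illustration, not the question's own notation). It shows that when the logits are all nearly equal (for instance, because W is effectively zero, as in an untrained or zero-initialized output layer), the Softmax output is uniform over the vocabulary, matching the behavior described above:

    import numpy as np

    def softmax(z):
        z = z - z.max()              # subtract the max logit for numerical stability
        e = np.exp(z)
        return e / e.sum()

    rng = np.random.default_rng(0)
    d, vocab = 8, 5                  # hypothetical hidden size and vocabulary size
    h = rng.normal(size=d)           # an encoder representation for one position

    W_degenerate = np.zeros((d, vocab))      # effectively-zero weights: every logit is 0
    print(softmax(h @ W_degenerate))         # [0.2 0.2 0.2 0.2 0.2], a uniform distribution

    W_healthy = rng.normal(size=(d, vocab))  # weights with some learned-like structure
    print(softmax(h @ W_healthy))            # peaked, input-dependent distribution

Since Softmax depends only on differences between logits, any set of identical logits (not just zeros) yields the same uniform output, so the symptom points to logits that carry no information about the input.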

Tags

Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science