Multiple Choice

A language model is tasked with predicting the next word for the sequence 'The cat sat on the'. After processing this input, the model's final linear layer produces a vector with 50,257 raw numerical scores, one for each word in its vocabulary. Which statement best characterizes this vector of raw scores, just before any final normalization function (like Softmax) is applied?
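To make the setup concrete, here is a minimal sketch in plain Python of such a score vector before and after Softmax. The scores are randomly generated stand-ins (an assumption; no real model is involved), and the names `logits`, `probs`, and `softmax` are illustrative. Only the vocabulary size of 50,257 comes from the question.

```python
import math
import random

VOCAB_SIZE = 50_257  # vocabulary size stated in the question


def softmax(scores):
    """Turn a list of raw scores into a probability distribution."""
    m = max(scores)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


random.seed(0)
# Hypothetical raw scores: random stand-ins for the final linear layer's output.
logits = [random.gauss(0.0, 3.0) for _ in range(VOCAB_SIZE)]
probs = softmax(logits)

# Raw scores are unbounded real numbers: some are negative, and their sum is arbitrary.
print("min raw score:", min(logits))
print("sum of raw scores:", sum(logits))

# After Softmax every entry lies in [0, 1] and the entries sum to 1.
print("sum of probabilities:", sum(probs))

# Softmax preserves the ranking, so the highest-scoring word is unchanged.
print("same top word:", logits.index(max(logits)) == probs.index(max(probs)))
```

The comparison suggests what to look for in the answer choices: whether a statement describes properties that hold only after normalization, or properties of the raw scores themselves.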

Updated 2025-09-28

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science