Short Answer

Analyzing Transformer Model Output

A data scientist is debugging a Transformer-based language model. The model has a vocabulary of 50,000 unique words. After feeding the model an input sentence of 20 words, the scientist claims that the final probability-generating layer should output a single probability distribution over the 20 words that were in the input sentence. Identify the two fundamental errors in the scientist's claim and briefly explain the correct nature of the output.
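To make the expected output concrete, here is a minimal sketch (random logits stand in for a real model's final-layer scores) of what the probability-generating layer actually produces: one distribution per input position, each over the full 50,000-word vocabulary.

```python
import numpy as np

SEQ_LEN, VOCAB_SIZE = 20, 50_000  # 20 input words, 50,000-word vocabulary

# Hypothetical final-layer logits: one row of scores per input position,
# one column per vocabulary word (random stand-in for real model output).
rng = np.random.default_rng(0)
logits = rng.normal(size=(SEQ_LEN, VOCAB_SIZE))

# Softmax over the vocabulary axis turns each row into a probability
# distribution over ALL 50,000 words -- not just the 20 input words.
exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs = exp / exp.sum(axis=-1, keepdims=True)

print(probs.shape)  # (20, 50000): 20 distributions, not one
print(bool(np.allclose(probs.sum(axis=-1), 1.0)))  # each row sums to 1
```

The shape `(20, 50000)` captures both corrections at once: the layer emits 20 distributions (one per position), and each distribution ranges over the whole vocabulary rather than the words that happened to appear in the input.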


Updated 2025-10-03


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science