Learn Before
Analyzing Transformer Model Output
A data scientist is debugging a Transformer-based language model. The model has a vocabulary of 50,000 unique words. After feeding the model an input sentence of 20 words, the scientist claims that the final probability-generating layer should output a single probability distribution over the 20 words that were in the input sentence. Identify the two fundamental errors in the scientist's claim and briefly explain the correct nature of the output.
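To make the correct output concrete, here is a minimal NumPy sketch (random numbers and an illustrative hidden size of 16 stand in for a real model's computation) showing what the final softmax layer actually produces: one probability distribution over the full 50,000-word vocabulary for each of the 20 input positions, not a single distribution over the 20 input words.

```python
import numpy as np

seq_len, vocab_size, hidden_dim = 20, 50_000, 16  # sizes from the question; hidden_dim is illustrative

rng = np.random.default_rng(0)
hidden = rng.normal(size=(seq_len, hidden_dim))   # one hidden vector per input position (stand-in values)
w_out = rng.normal(size=(hidden_dim, vocab_size)) # output projection to the vocabulary (stand-in values)

logits = hidden @ w_out                           # shape: (20, 50000) -- one score per word per position

# Softmax over the vocabulary axis: one distribution per position
probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs /= probs.sum(axis=-1, keepdims=True)

print(probs.shape)                                # (20, 50000): 20 distributions, each over 50,000 words
print(probs.sum(axis=-1))                         # each row sums to 1 -- a valid distribution per position
```

The shape makes both errors visible: the distribution's support is the 50,000-word vocabulary (not the 20 input words), and there are 20 such distributions, one per position, rather than a single one.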
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Output Probability Calculation in Transformer Language Models
A language model based on a standard multi-layer architecture is given an input sequence of 15 words. The model's vocabulary consists of 30,000 unique words. After processing the input through all its layers, what is the nature of the final output generated by the model's terminal probability-calculating layer for this sequence?
Analyzing a Language Model's Output Layer