Learn Before
Computational Cost of Normalization
A machine learning model is designed to predict the next word in a sentence and has a vocabulary of 200,000 possible words. The model produces an unnormalized score for each of these words. Explain why calculating the exact normalization constant (the partition function) to turn these scores into a valid probability distribution is a significant computational challenge.
0
1
Tags
Data Science
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Pseudolikelihood
Log-Likelihood Gradient
Log-Likelihood Gradient
Normalizing Model Outputs
A model produces unnormalized scores for three possible outcomes: {Outcome A: 8, Outcome B: 10, Outcome C: 2}. To convert these scores into a valid probability distribution, a normalization constant must be calculated by summing all the unnormalized scores. What is the final, normalized probability for Outcome B?
Computational Cost of Normalization