Interpreting Model Output Probabilities
Based on the descriptions provided, contrast the meaning and purpose of the output probability distribution calculated for a word at a specific position in Model A versus Model B. Explain how the information each model uses to make its calculation leads to this difference.
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Interpreting Model Output Probabilities
An engineer is working with two different text-processing systems. System A generates a story one word at a time. To choose the word at position i, it calculates a probability distribution over the vocabulary based only on the words from position 1 to i-1. System B is used for a fill-in-the-blank task. Given a sentence with a missing word at position i, it calculates a probability distribution for that position using all other words in the sentence (both before and after position i) as context. Which statement best analyzes the meaning of the probability distributions in these two systems?
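The difference in conditioning described for the two systems can be sketched with a toy count-based estimator. This is only an illustration under stated assumptions: the corpus, the function names, and the counting approach are all hypothetical and stand in for whatever neural models the systems actually use. System A's distribution is conditioned on the prefix alone, while System B's is conditioned on the words on both sides of the blank.

```python
from collections import Counter

# Hypothetical toy corpus (an assumption for illustration only).
CORPUS = [
    "the cat sat on the mat".split(),
    "the cat sat on the sofa".split(),
    "the dog sat on the mat".split(),
    "the cat slept on the mat".split(),
]


def normalise(counts):
    """Turn raw counts into a probability distribution."""
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()} if total else {}


def left_to_right_dist(prefix):
    """System A style: estimate P(x_i | x_1 .. x_{i-1}) from the prefix only."""
    i = len(prefix)
    counts = Counter(s[i] for s in CORPUS if len(s) > i and s[:i] == prefix)
    return normalise(counts)


def fill_in_blank_dist(sentence, i):
    """System B style: estimate P(x_i | all other words), left and right."""
    counts = Counter(
        s[i]
        for s in CORPUS
        if len(s) == len(sentence)
        and s[:i] == sentence[:i]
        and s[i + 1:] == sentence[i + 1:]
    )
    return normalise(counts)


# Same position (index 2), different conditioning information.
dist_a = left_to_right_dist(["the", "cat"])
dist_b = fill_in_blank_dist(["the", "cat", None, "on", "the", "sofa"], 2)

print("System A:", dist_a)
print("System B:", dist_b)
```

In this sketch, System A still assigns probability to more than one continuation because it has not seen what follows, whereas System B's estimate is narrowed by the right-hand context. Multiplying System A's distributions across positions gives the probability of a whole sequence by the chain rule; System B's per-position distributions do not factorize a sequence probability in that way.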
Consider a model tasked with predicting a masked word within a complete sentence by looking at all surrounding words. The probability distribution calculated for this masked position has the same fundamental interpretation as the distribution from a model that generates a sentence one word at a time, where each new word is predicted based only on the words that came before it.