Learn Before
Applying the Start of Sequence Token
A language model calculates the probability of the sentence 'The cat sat' as the product of conditional probabilities: P('The') * P('cat' | 'The') * P('sat' | 'The cat'). This formulation is incomplete because it lacks a starting context for the first word. Rewrite this probability calculation to correctly incorporate a special token, denoted as <SOS>, that signals the beginning of the sequence.
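The factorization above can be sketched in code. This is a minimal illustration with a hand-written toy probability table (the numbers are made up, not from any trained model): every word, including the first, is conditioned on a context that starts with the `<SOS>` token.

```python
# Toy conditional probability table: context tuple -> {next word: probability}.
# Values are illustrative placeholders, not learned parameters.
probs = {
    ("<SOS>",): {"The": 0.4},
    ("<SOS>", "The"): {"cat": 0.3},
    ("<SOS>", "The", "cat"): {"sat": 0.5},
}

def sentence_probability(words):
    """P(w_1..w_n) = product over i of P(w_i | <SOS>, w_1, ..., w_{i-1})."""
    context = ("<SOS>",)
    p = 1.0
    for w in words:
        p *= probs[context][w]   # even the first word has a context: <SOS>
        context = context + (w,)
    return p

print(sentence_probability(["The", "cat", "sat"]))  # 0.4 * 0.3 * 0.5 = 0.06
```

Note that without `<SOS>` the first factor would be an unconditional P('The'); prepending the token gives every word in the product the same conditional form.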
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A language model is designed to calculate the probability of a sentence by multiplying the conditional probabilities of each word given the words that came before it. For the sentence 'The cat sat', this would be calculated as P('The') * P('cat' | 'The') * P('sat' | 'The cat'). What is the fundamental problem with calculating the probability of the very first word, 'The', in this specific manner?
Applying the Start of Sequence Token
A language model is tasked with calculating the probability of the sentence 'The quick brown fox'. Using the chain rule of probability and a special start-of-sentence token denoted as <s>, how would the model correctly formulate this calculation?