Learn Before
Start of Sentence (SOS) Token
The Start of Sentence (SOS) token is a special symbol used in language modeling to indicate the beginning of a text sequence. It is commonly denoted as <s> or <SOS>.
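As a minimal sketch of how the token is used, the snippet below prepends a start symbol to a tokenized sentence. The helper name `add_sos`, the whitespace tokenization, and the choice of `<s>` as the symbol string are assumptions made for illustration; real tokenizers handle this in their own ways.

```python
SOS = "<s>"  # assumed start symbol; some systems write <SOS> instead

def add_sos(tokens, sos=SOS):
    """Return a new token list with the start symbol in front (hypothetical helper)."""
    return [sos] + list(tokens)

tokens = "The cat sat".split()  # naive whitespace tokenization, for illustration only
print(add_sos(tokens))
```

The helper returns a new list instead of modifying its input, so the original tokens stay available without the marker.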

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Types of Language Models
Evaluating language models
Shannon's Foundational Work on Language Modeling
Generalization of the Language Modeling Concept
Chain Rule for Sequence Probability
Deep Learning Approach to Language Modeling
Output Token Sequence in LLMs
Start of Sentence (SOS) Token
[CLS] Token as a Start Symbol
A system is designed to predict the probability of a sequence of words. For the sequence 'The dog ran', the system provides the following conditional probabilities:
- The probability of 'The' occurring at the start of a sequence is 0.2.
- The probability of 'dog' occurring after 'The' is 0.3.
- The probability of 'ran' occurring after 'The dog' is 0.7.
Based on the fundamental principle used by such systems to determine the likelihood of a full sequence, what is the overall probability of the sequence 'The dog ran'?
Analyzing Language Model Probability Assignments
A system's primary goal is to predict the probability of a sequence of tokens. To calculate the total probability for the sequence 'The quick brown fox', it breaks the problem down into a series of conditional probability calculations. Arrange the following calculations in the correct order that the system would use to find the total probability of the sequence.
Evaluating a Language Model's Probabilistic Output
Learn After
A language model is designed to calculate the probability of a sentence by multiplying the conditional probabilities of each word given the words that came before it. For the sentence 'The cat sat', this would be calculated as P('The') * P('cat' | 'The') * P('sat' | 'The cat'). What is the fundamental problem with calculating the probability of the very first word, 'The', in this specific manner?
Applying the Start of Sequence Token
A language model is tasked with calculating the probability of the sentence 'The quick brown fox'. Using the chain rule of probability and a special start-of-sentence token denoted as <s>, how would the model correctly formulate this calculation?
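One way to picture this formulation is to list the chain-rule factors once `<s>` has been prepended, so that every word, including the first, is conditioned on a non-empty context. The sketch below only builds the factors as strings and computes no probabilities. The helper name `chain_rule_factors` is hypothetical, and the convention that `<s>` itself has probability 1 (and so contributes no factor) is a common assumption, not something stated above.

```python
def chain_rule_factors(words, sos="<s>"):
    """List the conditional-probability terms for a sentence, in order (hypothetical helper)."""
    tokens = [sos] + list(words)
    factors = []
    for i in range(1, len(tokens)):
        context = " ".join(tokens[:i])  # every token before position i
        factors.append(f"P({tokens[i]} | {context})")
    return factors

for term in chain_rule_factors("The quick brown fox".split()):
    print(term)
```

Multiplying the listed terms together gives the probability of the full sequence under the chain rule.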