Learn Before
Analyzing a Language Model's Sequence Probability
Based on the information provided in the case study, calculate the probability of the first token, Pr(<s>). Explain what this specific probability value implies about the nature and function of the <s> token within this model's architecture.
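The chain-rule calculation behind this question can be sketched in a few lines of Python. This is a minimal illustration, not the case study's own method: it assumes, as is conventional, that every sequence begins with <s>, so Pr(<s>) = 1 and the token adds nothing to the log-probability. The function name `sequence_log_prob` and the conditional probabilities are hypothetical and are not taken from the case study.

```python
import math

def sequence_log_prob(cond_probs):
    """Log joint probability of (x_0, x_1, ..., x_m) with x_0 = <s>.

    cond_probs holds Pr(x_i | x_0, ..., x_{i-1}) for i = 1..m.
    Assumption: <s> always starts a sequence, so Pr(x_0) = 1.
    """
    log_p_start = math.log(1.0)  # log Pr(<s>) = 0
    return log_p_start + sum(math.log(p) for p in cond_probs)

# Made-up conditional probabilities for three tokens following <s>.
cond_probs = [0.5, 0.25, 0.8]
log_p = sequence_log_prob(cond_probs)
joint = math.exp(log_p)  # product of the conditionals only
print(log_p, joint)
```

Because the start token's term is log 1 = 0, the joint probability reduces to the product of the conditional probabilities of the remaining tokens.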
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Log-Likelihood Objective for Language Model Training
A language model calculates the joint probability of a sequence of tokens (x_0, x_1, ..., x_m). The first token, x_0, is a special, deterministic start-of-sequence symbol. How does the nature of this specific first token typically affect the overall calculation of the sequence's joint probability?
Calculating Sequence Probability with a Start Token
Analyzing a Language Model's Sequence Probability