Learn Before
Example of a Token Sequence
A token sequence is a fundamental data structure in language modeling where a sentence is broken down into individual units, or tokens. Positional information is often included to indicate the order of the tokens. For example, the sentence 'The kitten is chasing the ball .' can be represented as a sequence of tokens with their positions marked by subscripts: 'The₁ kitten₂ is₃ chasing₄ the₅ ball₆ .'.
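As a minimal sketch, the position-annotated sequence above can be represented as a list of (token, position) pairs. The whitespace split below is an illustrative simplification, not a production tokenizer:

```python
# Represent a sentence as an ordered token sequence with 1-based positions.
# Splitting on whitespace is a simplification for illustration; real
# tokenizers handle punctuation and subwords differently, and here the
# final '.' also receives a position.
sentence = "The kitten is chasing the ball ."
tokens = sentence.split()
indexed = [(tok, pos) for pos, tok in enumerate(tokens, start=1)]
print(indexed)
# [('The', 1), ('kitten', 2), ('is', 3), ('chasing', 4), ('the', 5), ('ball', 6), ('.', 7)]
```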
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Pre-training Objective for Language Models
Example of a Token Sequence
Example of an Indexed Token Sequence
A language model is evaluated on a sequence of four tokens,
(x_0, x_1, x_2, x_3). The model's performance is measured by calculating a loss value at each step of the sequence generation. The individual losses are as follows: the loss for predicting token x_1 is 1.2, the loss for predicting x_2 is 0.5, and the loss for predicting x_3 is 2.3. Based on this information, what is the total loss for the entire token sequence?

Comparative Model Performance Analysis
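The total-loss calculation in the four-token question above reduces to summing the per-step losses; a minimal sketch using the given values:

```python
# Total sequence loss as the sum of per-step prediction losses.
# The values follow the four-token example: losses for predicting
# x_1, x_2, and x_3 respectively.
step_losses = [1.2, 0.5, 2.3]
total_loss = sum(step_losses)
print(f"{total_loss:.1f}")  # 4.0
```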
A language model's performance is being evaluated on the token sequence
('The', 'cat', 'sat', 'on'). The total loss for this sequence is calculated by summing the individual losses from each predictive step. Which of the following sets of predictions contributes to this total loss calculation?

Ground-Truth Distribution as a One-Hot Representation
Learn After
Special Tokens in Language Models
A language model processes text by breaking it into an ordered sequence of tokens, where each token is a unit of text (like a word or punctuation mark) with an associated position. Consider the following two sentences:
Sentence A: 'The fast car races.' Sentence B: 'The fast cars race.'
Which of the following options most accurately represents the distinct token sequences for these two sentences as a typical tokenizer would produce them?
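One way to see the distinction between the two sentences is a toy tokenizer that peels trailing punctuation into its own token. This is an illustrative simplification, not a production tokenizer such as a subword (BPE) tokenizer:

```python
import re

# A toy word-level tokenizer: a run of word characters is one token,
# and any single non-word, non-space character (e.g. '.') is its own
# token. Illustrative only; real tokenizers vary.
def tokenize(text):
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("The fast car races."))  # ['The', 'fast', 'car', 'races', '.']
print(tokenize("The fast cars race."))  # ['The', 'fast', 'cars', 'race', '.']
```

The two outputs differ only at positions 3 and 4 ('car'/'cars' and 'races'/'race'), which is what makes the sequences distinct.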
A language model processes text by breaking it down into an ordered sequence of tokens. Arrange the following tokens to reconstruct the original sentence: 'The model predicts the next word .'
Representing Text as a Token Sequence