Learn Before
Special Tokens in Language Models
Language models use special tokens that do not correspond to words in the text but instead carry structural or control information. For instance, the ⟨SOS⟩ (Start of Sequence) token marks the beginning of an input, the ⟨EOS⟩ (End of Sequence) token signals where a sequence ends, and the ⟨pad⟩ token equalizes sequence lengths within a batch for efficient processing. These tokens are essential for managing the model's input and output streams.
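As a concrete illustration, here is a minimal sketch in which the special tokens are ordinary vocabulary entries with reserved IDs. The token names and integer IDs are invented for this example and do not come from any particular model's tokenizer.

```python
# Toy vocabulary sketch: special tokens are regular vocabulary entries
# with reserved IDs. Names and IDs here are illustrative only.
SOS, EOS, PAD = "<sos>", "<eos>", "<pad>"
vocab = {PAD: 0, SOS: 1, EOS: 2, "the": 3, "fast": 4, "car": 5, "races": 6, ".": 7}

def encode(words):
    # Wrap the word sequence in structural tokens, then map tokens to IDs.
    return [vocab[SOS]] + [vocab[w] for w in words] + [vocab[EOS]]

print(encode(["the", "fast", "car", "races", "."]))
# -> [1, 3, 4, 5, 6, 7, 2]
```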
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Special Tokens in Language Models
A language model processes text by breaking it into an ordered sequence of tokens, where each token is a unit of text (like a word or punctuation mark) with an associated position. Consider the following two sentences:
Sentence A: 'The fast car races.'
Sentence B: 'The fast cars race.'
Which of the following options most accurately represents the distinct token sequences for these two sentences as a typical tokenizer would produce them?
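To make the idea concrete, here is a minimal sketch of a word-level tokenizer applied to the two sentences. The regex-based splitting rule is an assumption of this example, a rough stand-in for a real tokenizer.

```python
import re

def tokenize(text):
    # Keep words and punctuation marks as separate tokens, preserving order.
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("The fast car races."))  # ['The', 'fast', 'car', 'races', '.']
print(tokenize("The fast cars race."))  # ['The', 'fast', 'cars', 'race', '.']
```

The singular/plural forms ('car'/'cars', 'races'/'race') yield distinct tokens, so the two sequences differ at their third and fourth positions.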
A language model processes text by breaking it down into an ordered sequence of tokens. Arrange the following tokens to reconstruct the original sentence: 'The model predicts the next word .'
Representing Text as a Token Sequence
Learn After
A language model needs to process a group of sentences simultaneously. For computational efficiency, all input sequences in the group must be the same length. This is achieved by adding a special, non-word token to the end of any shorter sequences. Given the two tokenized sentences below, which option correctly demonstrates this preparation process?
Sentence A: ['The', 'quick', 'fox'] (length 3)
Sentence B: ['A', 'lazy', 'dog', 'sleeps'] (length 4)

A language model is being trained for text generation. During training, it learns from examples where each target sentence is represented as a sequence of tokens. When tested, the model successfully begins generating text but then fails to stop, producing an endless stream of words. Based on this specific failure, which essential structural token was most likely omitted from the end of each target sentence in the training data?
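A minimal sketch of this preparation step, assuming right-padding and an appended end-of-sequence token; the token names and the pad_batch helper are illustrative, not from any particular library:

```python
PAD, EOS = "<pad>", "<eos>"

def pad_batch(sequences):
    # Append <eos> so the model learns where to stop generating,
    # then right-pad shorter sequences to the batch's maximum length.
    longest = max(len(seq) for seq in sequences) + 1  # +1 for <eos>
    return [seq + [EOS] + [PAD] * (longest - len(seq) - 1) for seq in sequences]

for seq in pad_batch([["The", "quick", "fox"],
                      ["A", "lazy", "dog", "sleeps"]]):
    print(seq)
# ['The', 'quick', 'fox', '<eos>', '<pad>']
# ['A', 'lazy', 'dog', 'sleeps', '<eos>']
```

Omitting ⟨EOS⟩ from the training targets produces exactly the failure described above: the model never learns a stopping signal, so generation runs on indefinitely.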
Analyzing a Processed Data Batch