Analyzing a Processed Data Batch
A language model is given the following two sentences to process as a single batch:
- Sentence 1: ['The', 'cat', 'sat']
- Sentence 2: ['A', 'dog', 'slept', 'soundly']
After preparation for the model, the batch looks like this:
- Processed 1: ['⟨SOS⟩', 'The', 'cat', 'sat', '⟨pad⟩']
- Processed 2: ['⟨SOS⟩', 'A', 'dog', 'slept', 'soundly']
Explain the purpose of the two new tokens, ⟨SOS⟩ and ⟨pad⟩, that were added during this preparation step.
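To make the preparation step concrete, here is a minimal Python sketch under the assumptions implied above (prepend a start token, then right-pad shorter sequences to the batch's maximum length). The constant names and the `prepare_batch` helper are illustrative, not from any particular library.

```python
SOS_TOKEN = "⟨SOS⟩"  # marks the start of every sequence
PAD_TOKEN = "⟨pad⟩"  # non-word filler so all sequences share one length

def prepare_batch(sentences):
    """Prepend ⟨SOS⟩ and right-pad every sequence to a common length."""
    with_sos = [[SOS_TOKEN] + tokens for tokens in sentences]
    max_len = max(len(seq) for seq in with_sos)
    return [seq + [PAD_TOKEN] * (max_len - len(seq)) for seq in with_sos]

batch = prepare_batch([["The", "cat", "sat"],
                       ["A", "dog", "slept", "soundly"]])
# batch[0] -> ['⟨SOS⟩', 'The', 'cat', 'sat', '⟨pad⟩']
# batch[1] -> ['⟨SOS⟩', 'A', 'dog', 'slept', 'soundly']
```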
Tags
- Ch.5 Inference - Foundations of Large Language Models
- Foundations of Large Language Models
- Foundations of Large Language Models Course
- Computing Sciences
- Analysis in Bloom's Taxonomy
- Cognitive Psychology
- Psychology
- Social Science
- Empirical Science
- Science
Related
A language model needs to process a group of sentences simultaneously. For computational efficiency, all input sequences in the group must be the same length. This is achieved by adding a special, non-word token to the end of any shorter sequences. Given the two tokenized sentences below, which option correctly demonstrates this preparation process?
- Sentence A: ['The', 'quick', 'fox'] (length 3)
- Sentence B: ['A', 'lazy', 'dog', 'sleeps'] (length 4)

A language model is being trained for text generation. During training, it learns from examples where each target sentence is represented as a sequence of tokens. When tested, the model successfully begins generating text but then fails to stop, producing an endless stream of words. Based on this specific failure, which essential structural token was most likely omitted from the end of each target sentence in the training data?
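The runaway-generation failure can be made concrete with a short, hypothetical decoding loop. Assuming the model learns to emit an end-of-sequence token ⟨EOS⟩ only if one appeared at the end of every training target, omitting that token means the `break` below is never reached; `model_step` and `toy_model_step` are illustrative stand-ins, not a real API.

```python
EOS_TOKEN = "⟨EOS⟩"
MAX_NEW_TOKENS = 20  # safety cap only; the intended stop signal is ⟨EOS⟩

def generate(model_step, prompt_tokens):
    """Append tokens one at a time until the model emits ⟨EOS⟩."""
    tokens = list(prompt_tokens)
    for _ in range(MAX_NEW_TOKENS):
        next_token = model_step(tokens)
        if next_token == EOS_TOKEN:
            break  # trained stop signal reached
        tokens.append(next_token)
    return tokens

# Toy model: emits 'word' three times, then ⟨EOS⟩ (as if trained with ⟨EOS⟩
# at the end of each target). Without that training, it would never emit
# ⟨EOS⟩ and only the safety cap would end the loop.
def toy_model_step(tokens):
    return "word" if len(tokens) < 4 else EOS_TOKEN

print(generate(toy_model_step, ["⟨SOS⟩"]))  # ['⟨SOS⟩', 'word', 'word', 'word']
```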