Learn Before
Comparison of Prefix and Causal Language Modeling
Prefix Language Modeling (PrefixLM) and Causal Language Modeling (CLM) differ in how they process and generate text sequences. In CLM, the entire sequence is generated autoregressively: each token is predicted from all preceding tokens, starting from the very first one. PrefixLM, by contrast, processes an initial prefix with a bidirectional encoder in a single pass, so every prefix token can attend to every other prefix token, yielding a rich contextual representation of the prefix. A decoder then autoregressively generates the remainder of the sequence, conditioned on this encoded prefix.
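The practical difference is visible in the attention mask each objective implies. Below is a minimal, illustrative NumPy sketch (not tied to any particular framework's API; the sequence and prefix lengths are arbitrary toy values) contrasting a causal mask, where every token sees only earlier positions, with a prefix-LM-style mask, where prefix tokens see each other bidirectionally while tokens after the prefix remain causal.

```python
import numpy as np

def causal_mask(seq_len: int) -> np.ndarray:
    """CLM: token i may attend only to positions <= i."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def prefix_lm_mask(seq_len: int, prefix_len: int) -> np.ndarray:
    """PrefixLM-style mask: prefix tokens attend bidirectionally within
    the prefix; tokens after the prefix attend causally."""
    mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))
    mask[:prefix_len, :prefix_len] = True  # full visibility inside the prefix
    return mask

# Toy example: 6 tokens, of which the first 4 form the prefix.
print(causal_mask(6).astype(int))
print(prefix_lm_mask(6, prefix_len=4).astype(int))
```

A single Transformer stack can realize this behavior simply by swapping in the prefix-style mask; the encoder-decoder formulation described above achieves the same effect by giving the prefix to a separate bidirectional encoder.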
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Comparison of Prefix and Causal Language Modeling
Example of Prefix Language Modeling Input Format
Training Encoder-Decoder Models with Prefix Language Modeling
Consider a model architecture composed of an encoder and a decoder, trained with a self-supervised objective to complete a text sequence given an initial prefix. Which statement best analyzes the distinct processing methods of the encoder and decoder for this task?
Processing a Text Sequence
In a self-supervised text generation task, a model is given an initial sequence of words (a prefix) and trained to produce the words that follow. For an architecture that uses two distinct components to accomplish this, match each component or data piece with its primary role or characteristic.
Example of Prefix Language Modeling
Learn After
A language model is given the prompt 'Despite the initial positive reviews, the film's box office performance was ultimately disappointing because...' and is tasked with generating a continuation. Consider two different ways the model could process this prompt before generating the next token:
- Method 1: When processing the prompt, the token 'disappointing' can directly see and incorporate information from the token 'positive' at the beginning of the sentence.
- Method 2: When processing the prompt, the token 'disappointing' can only see and incorporate information from the preceding tokens, such as 'ultimately' and 'was'.
Which of the following statements best analyzes the fundamental difference in how these two methods build an understanding of the prompt?
Architectural Implications for Prompt Comprehension
Architectural Choice for a Conversational AI