Causal Attention Mechanism
The causal attention mechanism computes an output for a specific position i in a sequence by considering only the elements up to and including that position. The formula is given by:

$$\mathbf{o}_i = \sum_{j=1}^{i} \alpha_{i,j}\,\mathbf{v}_j$$

In this equation, the output $\mathbf{o}_i$ is a weighted sum of the value vectors $\mathbf{v}_j$ from the beginning of the sequence up to the current position $i$. The weights $\alpha_{i,j}$ determine the importance of each value vector to the query vector $\mathbf{q}_i$. This unidirectional constraint is crucial for autoregressive tasks, as it prevents the model from attending to future tokens.
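The weighted sum above can be sketched in a few lines of NumPy. This is a minimal illustration, not the book's reference implementation: it assumes the weights are obtained with the standard scaled dot-product softmax over the query–key scores, and the function and variable names (`causal_attention_output`, `Q`, `K`, `V`) are chosen here for clarity.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array of scores.
    x = x - np.max(x)
    e = np.exp(x)
    return e / e.sum()

def causal_attention_output(Q, K, V, i):
    """Attention output for position i, attending only to positions 0..i.

    Q, K, V: arrays of shape (sequence_length, d), rows are the
    query/key/value vectors for each position.
    """
    d = K.shape[1]
    # Scores of query i against keys up to and including position i
    # (scaled dot-product attention, assumed here).
    scores = Q[i] @ K[: i + 1].T / np.sqrt(d)
    alpha = softmax(scores)          # attention weights alpha_{i,j}
    return alpha @ V[: i + 1]        # weighted sum of value vectors

rng = np.random.default_rng(0)
n, d = 5, 8
Q, K, V = rng.normal(size=(3, n, d))
out = causal_attention_output(Q, K, V, 2)
```

Because the sum stops at position i, changing any value vector at positions after i leaves the output unchanged, which is exactly the unidirectional constraint described above.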

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Architectural Differences Between Sequence Encoding and Generation Models
BERT (Bidirectional Encoder Representations from Transformers)
Fine-tuning for Sequence Encoding Models
Role of Encoders as Components in NLP Systems
Input and Output of a Sequence Encoder
Causal Attention Mechanism
Pre-train and Fine-tune Paradigm for Encoder Models
An engineer is building a system to automatically categorize customer reviews as 'positive' or 'negative'. The first component of their system must read the raw text of a review and convert it into a single, fixed-size numerical vector that captures the overall sentiment and meaning. This vector will then be fed into a separate classification component. Which of the following best describes the function of this first component?
A company develops a sophisticated model that takes a user's question as input and produces a detailed numerical representation that captures the question's full meaning. This model, by itself, is sufficient to function as a complete question-answering system.
The Role of Sequence Encoding in Text-Based Prediction