Comparison of Position Offsets in Causal vs. Bidirectional Attention

The range of the relative position offset i - j depends on the type of attention mechanism. In causal attention, which is standard for language modeling, a query at position i can only attend to its left context (positions j where j ≤ i), so the offset i - j is always non-negative: for a sequence of length m it lies in [0, m - 1]. In contrast, general or bidirectional self-attention allows a token to attend to the entire sequence, including positions where j > i, so the offset can be negative and spans the full range [-(m - 1), m - 1].
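
To make the two ranges concrete, here is a minimal NumPy sketch (not from the original text; the sequence length m = 4 and the helper offset_matrix are illustrative assumptions). It builds the matrix of offsets i - j for every query/key pair and compares the causal and bidirectional cases:

import numpy as np

def offset_matrix(m):
    # offsets[i, j] = i - j for all query positions i and key positions j
    i = np.arange(m)[:, None]  # query positions, shape (m, 1)
    j = np.arange(m)[None, :]  # key positions, shape (1, m)
    return i - j               # broadcast to shape (m, m)

m = 4
offsets = offset_matrix(m)
causal = offsets >= 0  # causal mask: a query attends only where j <= i

print(offsets)
# [[ 0 -1 -2 -3]
#  [ 1  0 -1 -2]
#  [ 2  1  0 -1]
#  [ 3  2  1  0]]
print(offsets[causal].min(), offsets[causal].max())  # 0 3  -> offsets in [0, m-1]
print(offsets.min(), offsets.max())                  # -3 3 -> offsets in [-(m-1), m-1]

The causal mask keeps only the lower triangle of the offset matrix, which is why relative-position schemes for decoder-only models only need to cover non-negative offsets, while bidirectional encoders must handle both signs.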
