Learn Before
Comparing Positional Bias Functions
Consider two different methods for applying a positional penalty to the attention scores in a transformer model. Both penalties are negative and their magnitude increases as the distance between a query and a key grows.
- Method A (Linear): The penalty's magnitude increases at a constant rate with distance (e.g., a penalty of -1 for distance 1, -2 for distance 2, -10 for distance 10).
- Method B (Sub-linear): The penalty's magnitude increases sharply for short distances but then grows much more slowly for longer distances (e.g., using a logarithmic function).
Analyze the potential difference in a model's attention behavior when using Method A versus Method B, particularly regarding how each method handles short-range versus long-range dependencies.
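To make the two schedules concrete, the sketch below computes both penalties at a few distances. This is an illustrative toy, not an official ALiBi or Kerple implementation; the slope and scale parameters, and the use of log(1 + d) for Method B, are assumptions chosen for demonstration.

```python
import math

def linear_penalty(distance, slope=1.0):
    # Method A: penalty magnitude grows at a constant rate with distance.
    return -slope * distance

def log_penalty(distance, scale=1.0):
    # Method B: penalty magnitude grows sharply for short distances,
    # then sub-linearly for longer ones (log(1 + d) is an assumed form).
    return -scale * math.log(1.0 + distance)

for d in (1, 2, 10, 100):
    print(f"distance={d:4d}  linear={linear_penalty(d):8.2f}  "
          f"log={log_penalty(d):6.2f}")
```

Note how at distance 100 the linear penalty reaches -100 while the logarithmic penalty stays near -4.6: Method B leaves distant tokens far more visible to attention, which is the behavior the question asks you to analyze.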
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Foundations of Large Language Models Course
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Kerple Positional Bias Formula
Kerple Logarithmic Bias Formula
Sandwich Method (Chi et al., 2023)
Formula for Relative Position Scaled by Sinusoidal Wavelength
A transformer model incorporates a positional bias mechanism where a penalty is applied to the attention score between a query and a key. This penalty grows larger as the distance between the query's position and the key's position in the sequence increases. Given the sentence 'The quick brown fox jumps over the lazy dog', which of the following query-key pairs would receive the smallest penalty from this mechanism?
A self-attention mechanism is modified to include a bias term that systematically penalizes attention scores between pairs of tokens. The magnitude of this penalty increases as the distance between the tokens' positions in the sequence grows. For which of the following tasks would this modification be most likely to hinder the model's performance?