Learn Before
Short Answer

Comparing Positional Bias Functions

Consider two different methods for applying a positional penalty (a bias added to the pre-softmax attention scores) in a transformer model. Both penalties are negative, and their magnitude increases as the distance between a query and a key grows.

  • Method A (Linear): The penalty's magnitude increases at a constant rate with distance (e.g., a penalty of -1 for distance 1, -2 for distance 2, -10 for distance 10).
  • Method B (Sub-linear): The penalty's magnitude increases sharply for short distances but then grows much more slowly for longer distances (e.g., using a logarithmic function).

Analyze the potential difference in a model's attention behavior when using Method A versus Method B, particularly regarding how it handles short-range versus long-range dependencies.
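To make the contrast concrete, here is a minimal sketch (all names and the specific bias functions are illustrative assumptions, not from the question) that applies a linear penalty and a logarithmic penalty to identical raw attention scores and compares the resulting softmax weights:

```python
import math

def linear_bias(d, slope=1.0):
    # Method A: penalty magnitude grows at a constant rate with distance.
    return -slope * d

def log_bias(d):
    # Method B: penalty grows sharply at short range, then flattens (sub-linear).
    return -math.log(1 + d)

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# A query at position 10 attends to keys at positions 0..10.
# Raw (content) scores are all equal, so any difference in the
# attention distribution comes purely from the positional bias.
distances = list(range(10, -1, -1))  # distance from the query to each key
attn_linear = softmax([linear_bias(d) for d in distances])
attn_log = softmax([log_bias(d) for d in distances])

print(f"nearest key  (linear): {attn_linear[-1]:.3f}")  # mass piles up locally
print(f"nearest key  (log):    {attn_log[-1]:.3f}")
print(f"farthest key (linear): {attn_linear[0]:.6f}")   # nearly zero
print(f"farthest key (log):    {attn_log[0]:.6f}")      # still non-negligible
```

Under these assumptions, the linear penalty concentrates almost all attention mass on the nearest keys and suppresses distant ones exponentially, while the logarithmic penalty still favors nearby keys but leaves a much heavier tail on long-range positions.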


Updated 2025-10-04


Tags

Ch.2 Generative Models - Foundations of Large Language Models


Analysis in Bloom's Taxonomy
