Learn Before
Short Answer

Effect of Distance-Based Attention Penalty

A self-attention mechanism is modified to include a bias term added to the pre-softmax attention scores. The bias is calculated by taking the distance between a query token's position and a key token's position and multiplying it by a fixed negative number. Explain how this modification influences the attention scores for tokens that are close together compared to tokens that are far apart.
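The mechanism the question describes can be sketched in a few lines. This is a minimal single-head illustration, assuming the bias is added to the pre-softmax (logit) scores; the `slope=-0.5` value and the function name `attention_with_distance_bias` are illustrative choices, not part of the question.

```python
import numpy as np

def attention_with_distance_bias(q, k, slope=-0.5):
    """Toy single-head attention where a linear distance penalty
    (slope * |i - j|, with slope < 0) is added to the raw scores."""
    scores = q @ k.T / np.sqrt(q.shape[-1])   # scaled dot-product scores
    n = scores.shape[0]
    # |i - j|: positional distance between each query i and key j
    dist = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
    biased = scores + slope * dist            # far pairs get a larger penalty
    # softmax over keys (numerically stable)
    w = np.exp(biased - biased.max(axis=-1, keepdims=True))
    return w / w.sum(axis=-1, keepdims=True)
```

With identical query and key vectors, the raw scores are equal, so the attention weights are shaped entirely by the bias: weight falls off monotonically with distance, which is the effect the question asks you to explain.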


Updated 2025-10-08


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science