Short Answer

Incorporating Positional Bias into Attention Scores

The standard formula for a scaled dot-product attention score between a query vector q_i and a key vector k_j is (q_i ⋅ k_j) / sqrt(d). How would you modify this formula to include a relative positional bias term, PE(i, j), added directly to the raw score before the softmax normalization step?
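The modified score is simply score(i, j) = (q_i ⋅ k_j) / sqrt(d) + PE(i, j), with the softmax applied afterward. A minimal NumPy sketch of this (the offset-based bias table `b` below is one common choice, assumed here for illustration):

```python
import numpy as np

def attention_with_relative_bias(Q, K, V, bias):
    """Scaled dot-product attention with an additive relative positional bias.

    Q, K, V: (n, d) arrays; bias: (n, n) array where bias[i, j] = PE(i, j).
    The bias is added to the raw scores before the softmax normalization.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d) + bias            # (q_i . k_j)/sqrt(d) + PE(i, j)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys j
    return weights @ V

# Illustrative setup: a bias depending only on the offset j - i,
# looked up from a (here random, normally learned) table b.
n, d = 4, 8
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, n, d))
b = rng.normal(size=2 * n - 1)                      # one entry per possible offset
offsets = np.arange(n)[None, :] - np.arange(n)[:, None]
bias = b[offsets + n - 1]                           # PE(i, j) = b[j - i]
out = attention_with_relative_bias(Q, K, V, bias)
```

Because PE(i, j) enters before the softmax, it reweights which keys each query attends to rather than shifting the output values directly.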

Updated 2025-10-03
