Learn Before
In a model that adds a linear positional bias to its attention calculation, a query at position i=10 attends to two keys at positions j1=5 and j2=2. Assuming the scaled dot-product portion of the score is identical for both keys, how will the addition of the positional bias term PE(i, j) affect their final pre-Softmax attention scores?
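The effect can be sketched numerically. This is a minimal illustration assuming an ALiBi-style bias of the form PE(i, j) = -m * (i - j), where m is a head-specific slope; the names `s` and `m` and the value m = 0.25 are illustrative assumptions, not from the source.

```python
# Hypothetical sketch of a linear positional bias added to attention scores.
# Assumption: PE(i, j) = -m * (i - j), as in ALiBi; m = 0.25 is arbitrary.

def biased_score(s: float, i: int, j: int, m: float = 0.25) -> float:
    """Pre-Softmax score: scaled dot-product s plus linear bias -m * (i - j)."""
    return s - m * (i - j)

s = 3.0  # identical scaled dot-product score for both keys (given)
i = 10   # query position

score_j1 = biased_score(s, i, j=5)  # distance 5 -> bias -1.25 -> score 1.75
score_j2 = biased_score(s, i, j=2)  # distance 8 -> bias -2.00 -> score 1.00

# The more distant key (j2 = 2) receives the larger penalty,
# so the nearer key (j1 = 5) keeps the higher pre-Softmax score.
assert score_j1 > score_j2
```

Because the bias grows linearly with query-key distance, the key at j1 = 5 ends up with a strictly higher pre-Softmax score than the key at j2 = 2, even though their semantic (dot-product) scores are identical.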
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Complete ALiBi Attention Formula
Calculating a Pre-Softmax Attention Score with Linear Bias
Interaction of Semantic and Positional Scores