Multiple Choice

In a language model that uses the complete ALiBi attention formula for causal text generation, a query token at position i must be prevented from attending to any key token at a future position j (where j > i). How does the Mask(i, j) term in the formula α(i, j) = Softmax((q_iᵀk_j + β⋅(j-i))/√d + Mask(i, j)) achieve this?
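The mechanism the question points at can be sketched numerically: a common convention (assumed here, since the question does not define Mask explicitly) sets Mask(i, j) = 0 for j ≤ i and −∞ for j > i, so that exp(−∞) = 0 after the softmax and future positions receive exactly zero attention weight. A minimal NumPy sketch of the quoted formula, with the function name `causal_attention_weights` and the value β = 0.1 chosen purely for illustration:

```python
import numpy as np

def causal_attention_weights(Q, K, beta=0.1):
    """Attention weights per the quoted formula:
    Softmax((q_i^T k_j + beta*(j - i)) / sqrt(d) + Mask(i, j)),
    where Mask(i, j) = 0 for j <= i and -inf for j > i (assumed convention).
    """
    n, d = Q.shape
    scores = Q @ K.T                                   # q_i^T k_j for all (i, j)
    pos = np.arange(n)
    bias = beta * (pos[None, :] - pos[:, None])        # beta * (j - i)
    scores = (scores + bias) / np.sqrt(d)
    # Causal mask: -inf above the diagonal (future positions j > i)
    mask = np.where(pos[None, :] > pos[:, None], -np.inf, 0.0)
    scores = scores + mask
    # Row-wise softmax; exp(-inf) = 0, so masked entries get zero weight
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)
```

Because the −∞ entries become 0 after exponentiation, each row of the result is a probability distribution over positions j ≤ i only, which is exactly the causal constraint the question describes.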


Updated 2025-10-08


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy
