Multiple Choice

A developer is implementing an attention layer for a model that incorporates positional information by adding a learnable scalar bias based on the relative distance between tokens. Given a query vector q_i for a token at position i, a key vector k_j for a token at position j, a key dimension d_k, and the specific learnable bias u_{b(i-j)} for their relative position, which of the following expressions correctly computes the unnormalized attention score (the value passed into the softmax function) for this architectural design?
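For illustration only, here is a minimal Python sketch of this kind of attention score. It assumes the bias is added to the scaled dot product, q_i · k_j / √d_k + u_{b(i-j)}. That is a common convention, but the original T5 model omits the 1/√d_k factor. In place of T5's log-spaced buckets, it uses a hypothetical clipping function `bucket`. The names `bucket`, `attention_score`, `attention_weights`, and `max_dist` are illustrative, not taken from any library.

```python
import math
from typing import List, Sequence


def bucket(rel_dist: int, max_dist: int = 4) -> int:
    """Hypothetical b(i - j): clip the relative distance to [-max_dist, max_dist]
    and shift it into a non-negative index (T5 uses log-spaced buckets instead)."""
    clipped = max(-max_dist, min(max_dist, rel_dist))
    return clipped + max_dist


def attention_score(q_i: Sequence[float], k_j: Sequence[float], i: int, j: int,
                    u: Sequence[float], max_dist: int = 4) -> float:
    """Unnormalized score: q_i . k_j / sqrt(d_k) plus the scalar bias u[b(i - j)]."""
    d_k = len(q_i)
    dot = sum(a * b for a, b in zip(q_i, k_j))
    return dot / math.sqrt(d_k) + u[bucket(i - j, max_dist)]


def attention_weights(q_i: Sequence[float], keys: Sequence[Sequence[float]], i: int,
                      u: Sequence[float], max_dist: int = 4) -> List[float]:
    """Softmax over the biased scores of query position i against every key position j."""
    scores = [attention_score(q_i, k_j, i, j, u, max_dist) for j, k_j in enumerate(keys)]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Usage with made-up numbers: one bias scalar per bucket (2 * max_dist + 1 = 9 buckets).
u = [0.0] * 9
u[bucket(2)] = 0.5  # bias for relative distance i - j = 2
score = attention_score([1.0, 0.0, 1.0, 0.0], [1.0, 1.0, 1.0, 1.0], i=5, j=3, u=u)
print(score)  # 1.5
```

In this sketch the bias sits outside the 1/√d_k factor, so the positional term does not shrink as the key dimension grows. Only the content term q_i · k_j is rescaled.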


Updated 2025-09-26

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science