Multiple Choice

Consider the calculation of an attention weight, which determines the influence of an input at position j on the output at a later position i. The calculation is based on a formula that includes: 1) a similarity score between vectors from positions i and j, 2) a term that depends on the relative distance between i and j, and 3) a masking component that prevents attending to positions k where k > i. If the term that depends on the relative distance were removed from this calculation, what would be the primary consequence?
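
The three components named in the question can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the question does not specify the form of the distance term, so the sketch assumes a linear penalty on the distance i - j (in the style of ALiBi). The function name `attention_weights` and the parameters `use_distance_term` and `slope` are hypothetical.

```python
import numpy as np

def attention_weights(queries, keys, use_distance_term=True, slope=0.5):
    """Causal attention weights with an optional relative-distance bias (sketch)."""
    n, d = queries.shape
    # 1) Similarity score between the vectors at positions i and j.
    scores = queries @ keys.T / np.sqrt(d)
    if use_distance_term:
        i = np.arange(n)[:, None]
        j = np.arange(n)[None, :]
        # 2) Assumed form of the distance term: a linear penalty on i - j.
        scores = scores - slope * (i - j)
    # 3) Masking component: block every position k with k > i.
    future = np.triu(np.ones((n, n), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

# Identical vectors at every position, so the similarity score is constant.
x = np.ones((4, 8))
with_term = attention_weights(x, x, use_distance_term=True)
without_term = attention_weights(x, x, use_distance_term=False)
```

Comparing the last rows of `with_term` and `without_term` shows what the distance term contributes in this sketch: with it, nearer positions receive more weight; without it, the weights over the visible positions depend only on the similarity score.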

Updated 2025-09-26

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science