Multiple Choice

In a text-processing model, a bias term is added to the attention scores between a 'query' word and a 'key' word before the final attention weights are computed. This bias is calculated as -β * d, where d is the distance (number of words) between the query and the key, and β is a fixed positive number. What is the primary effect of this biasing scheme on the model's behavior?

0

1

Updated 2025-10-02

Contributors are:

Who are from:

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science