Learn Before
Multiple Choice

A self-attention mechanism is modified to include a bias term that systematically penalizes attention scores between pairs of tokens. The magnitude of this penalty increases as the distance between the tokens' positions in the sequence grows. For which of the following tasks would this modification be most likely to hinder the model's performance?

0

1

Updated 2025-10-06

Contributors are:

Who are from:

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Computing Sciences

Foundations of Large Language Models Course

Evaluation in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science