1Cademy - A self-attention mechanism is modified to include a bias term that systematically penalizes attention scores between pairs of tokens. The magnitude of this penalty increases as the distance between the tokens positions in the sequence grows. For which of the following tasks would this modification be most likely to *hinder* the models performance?

Learn Before

Kerple

Multiple Choice

A self-attention mechanism is modified to include a bias term that systematically penalizes attention scores between pairs of tokens. The magnitude of this penalty increases as the distance between the tokens' positions in the sequence grows. For which of the following tasks would this modification be most likely to hinder the model's performance?

Updated 2025-10-06

Contributors are:

Who are from:

Learn Before

Related