Learn Before
Multiple Choice

In a self-attention mechanism processing a sequence of 4 tokens, a mask is added to the raw attention scores to prevent any token from attending to subsequent (future) tokens. Which of the following 4x4 matrices correctly represents this mask?

[Answer choices: candidate 4×4 mask matrices; only stray entries (0, 1) survived extraction]
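For intuition, here is a minimal sketch of the mask the question describes, assuming NumPy and the common additive convention in which allowed positions hold 0 and future positions hold -inf (the helper names causal_mask and softmax are illustrative, not part of the original question):

```python
import numpy as np

def causal_mask(seq_len: int) -> np.ndarray:
    """Additive look-ahead mask: 0 where attention is allowed
    (the token itself and earlier positions), -inf strictly above
    the diagonal so future positions get zero weight after softmax."""
    mask = np.zeros((seq_len, seq_len))
    mask[np.triu_indices(seq_len, k=1)] = -np.inf
    return mask

def softmax(x: np.ndarray) -> np.ndarray:
    # Subtract the row max for numerical stability; exp(-inf) == 0.
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

scores = np.random.randn(4, 4)               # raw attention scores for 4 tokens
weights = softmax(scores + causal_mask(4))   # the mask is *added* to the scores

print(causal_mask(4))    # the 4x4 matrix the question asks about
print(weights.round(2))  # upper triangle is exactly 0; each row sums to 1
```

In practice, implementations often substitute a large negative constant such as -1e9 for -inf so the masked logits stay finite, but the effect after softmax is the same: every token attends only to itself and earlier positions.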


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models Course

Application in Bloom's Taxonomy