Multiple Choice

An autoregressive model is processing a sequence of 4 tokens. To ensure that the prediction for any given token is based only on the tokens that came before it and the token itself, a specific structure is imposed on the attention weight matrix. Which of the following 4x4 matrices correctly illustrates this structure, where 'α' represents a calculated, non-zero attention weight and '0' represents a weight that has been forcibly set to zero?
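
The structure described in the question can be checked with a small sketch. It assumes the common convention that row i of the matrix holds the weights used when predicting token i and column j is the token being attended to; the helper names `causal_attention_weights` and `pattern`, and the uniform scores, are hypothetical and chosen only for illustration.

```python
import math

def causal_attention_weights(scores):
    """Row-wise softmax over raw scores, with future positions (j > i) forced to zero."""
    n = len(scores)
    weights = []
    for i in range(n):
        # Token i may see positions 0..i only; later positions are dropped before the softmax.
        visible = scores[i][: i + 1]
        peak = max(visible)  # subtract the row maximum for numerical stability
        exps = [math.exp(s - peak) for s in visible]
        total = sum(exps)
        # Pad the masked (future) positions with exact zeros.
        weights.append([e / total for e in exps] + [0.0] * (n - i - 1))
    return weights

def pattern(weights):
    """Render each weight as 'α' (non-zero) or '0' (masked)."""
    return [" ".join("α" if w > 0 else "0" for w in row) for row in weights]

# Hypothetical raw scores for a 4-token sequence (all equal, for simplicity).
scores = [[0.0] * 4 for _ in range(4)]
weights = causal_attention_weights(scores)
for line in pattern(weights):
    print(line)
```

Under these assumptions, every entry above the main diagonal is zero and each row still sums to 1, because the softmax is taken only over the visible positions.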

Updated 2025-10-04

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science