Short Answer

Function of Self-Attention in Auto-regressive Generation

A language model is built using a stack of modified Transformer decoder blocks. In these blocks, the sub-layer responsible for attending to a separate input sequence (the cross-attention sub-layer) has been removed, leaving only the self-attention and feed-forward network sub-layers. Explain the specific role of the self-attention mechanism in enabling this model to perform its primary function: generating a new token based solely on the sequence of tokens that came before it.
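The mechanism the question is pointing at is causal (masked) self-attention: each position builds its representation as a weighted mixture of the representations of itself and earlier positions only, so the prediction for the next token can never peek at future tokens. The sketch below is a minimal single-head illustration in NumPy, not any particular library's implementation; the projection matrices w_q, w_k, w_v and the toy shapes are assumptions made for the example.

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention with a causal mask.

    x: (seq_len, d_model) token representations
    w_q, w_k, w_v: (d_model, d_head) projection matrices

    Position i may attend only to positions j <= i, which is what lets
    a decoder-only model condition each new token solely on the tokens
    that came before it.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_head = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_head)  # (seq_len, seq_len) similarity scores

    # Causal mask: set scores for future positions (j > i) to -inf so
    # they receive zero weight after the softmax.
    seq_len = x.shape[0]
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)

    # Row-wise softmax over the (masked) key positions.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v  # (seq_len, d_head) context-mixed representations

# Toy usage: 4 tokens, d_model = d_head = 8.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
proj = lambda: rng.normal(size=(8, 8)) / np.sqrt(8)
out = causal_self_attention(x, proj(), proj(), proj())
print(out.shape)  # (4, 8); row i depends only on tokens 0..i
```

Because row i of the output depends only on rows 0..i of the input, the final position's representation summarizes the entire preceding context and can be fed to the output head to score the next token.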

Tags: Foundations of Large Language Models, Ch.1 Pre-training, Ch.2 Generative Models, Analysis in Bloom's Taxonomy