1Cademy

Learn Before

Diagram of the Autoregressive Generation Architectural Flow

Multiple Choice

In the architectural flow for generating a single new token, a decoder-only model processes the input sequence through multiple layers. After the final decoder layer produces its output vector, what is the immediate and primary purpose of applying a final linear mapping and a Softmax function?
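
For orientation, the step the question asks about can be sketched in a few lines of plain Python. This is only a toy illustration, not code from the course: the vocabulary, the hidden vector `h_final`, and the weights `W_out` and `b_out` are all made-up values, and a real model would use learned parameters and a far larger vocabulary. The linear mapping turns the final hidden vector into one score (logit) per vocabulary entry, and Softmax normalizes those scores into a probability distribution from which the next token can be chosen.

```python
import math

def linear(h, W, b):
    """Map a hidden vector h (length d) to one logit per vocabulary entry: W h + b."""
    return [sum(w * x for w, x in zip(row, h)) + bias for row, bias in zip(W, b)]

def softmax(logits):
    """Turn logits into probabilities; subtracting the max keeps exp() stable."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy values (assumptions for illustration only).
vocab = ["cat", "dog", "fish"]      # vocabulary of size 3
h_final = [0.5, -1.0]               # output of the final decoder layer, d = 2
W_out = [[1.0, 0.0],                # one row of weights per vocabulary entry
         [0.0, 1.0],
         [1.0, 1.0]]
b_out = [0.0, 0.0, 0.0]

logits = linear(h_final, W_out, b_out)   # one unnormalized score per token
probs = softmax(logits)                  # probability distribution over vocab

# Greedy decoding: pick the most probable token. Sampling from probs is the
# other common choice.
next_token = vocab[max(range(len(probs)), key=probs.__getitem__)]
print(dict(zip(vocab, probs)), next_token)
```

Greedy selection is used here only to keep the sketch short; the same `probs` list could just as well feed a sampling strategy.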

Updated 2025-10-04

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences