Learn Before
During the initial processing of an input sequence, the parallel computation phase is strictly limited to generating the probability distribution for only the single, first output token.
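To make the prefill step concrete, here is a minimal sketch. It assumes a hypothetical one-layer, single-head toy decoder with random weights. The names `ToyModel` and `prefill` are illustrative only and are not any library's API. The sketch computes keys, values, and hidden states for every prompt position, keeps the keys and values as a cache, and turns only the last position's hidden state into a probability distribution: the one used to pick the first output token. Any tokens after that would come from a separate, step-by-step decoding phase that this sketch does not implement.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(mat, vec):
    # Multiply a (rows x cols) matrix by a length-cols vector.
    return [sum(w * v for w, v in zip(row, vec)) for row in mat]


class ToyModel:
    """Hypothetical single-layer, single-head decoder, for illustration only."""

    def __init__(self, vocab_size, d_model, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

        self.d = d_model
        self.embed = rand_matrix(vocab_size, d_model)
        self.wq = rand_matrix(d_model, d_model)
        self.wk = rand_matrix(d_model, d_model)
        self.wv = rand_matrix(d_model, d_model)
        self.w_out = rand_matrix(vocab_size, d_model)

    def prefill(self, prompt_ids):
        xs = [self.embed[t] for t in prompt_ids]
        # Projections for every prompt position. None depends on another
        # position's output, so a real system computes them as one batched
        # matrix multiply; the list comprehensions here are just for clarity.
        queries = [matvec(self.wq, x) for x in xs]
        keys = [matvec(self.wk, x) for x in xs]
        values = [matvec(self.wv, x) for x in xs]

        hidden = []
        for i, q in enumerate(queries):
            # Causal mask: position i attends only to positions 0..i.
            scores = [
                sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(self.d)
                for j in range(i + 1)
            ]
            weights = softmax(scores)
            hidden.append(
                [sum(weights[j] * values[j][k] for j in range(i + 1))
                 for k in range(self.d)]
            )

        # Every position gets a representation (a deeper model would feed
        # these to later layers), but only the last one predicts the token
        # that follows the prompt.
        first_token_probs = softmax(matvec(self.w_out, hidden[-1]))
        kv_cache = (keys, values)  # reused by the later decoding phase
        return first_token_probs, kv_cache


model = ToyModel(vocab_size=10, d_model=4)
probs, (keys, values) = model.prefill([3, 1, 4, 1, 5])
first_token = max(range(len(probs)), key=probs.__getitem__)
print(len(probs), len(keys))  # 10 5
```

The sketch shows what prefilling leaves behind: a cache with one key/value pair per prompt token, plus a single distribution over the vocabulary for the first output token. Production implementations differ in many details, including multiple layers and heads, normalization, and positional encoding.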
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Foundations of Large Language Models Course
Comprehension in Revised Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
When a large language model is given a long text prompt, the initial processing phase computes representations for all input tokens simultaneously. Considering this highly parallel approach, what is the most direct and immediate outcome related to generating the first piece of the model's response?
Post-Prefilling State Analysis