Learn Before
When a large language model first processes a user's prompt, it can perform the computations for all tokens in the prompt simultaneously rather than one at a time. What is the fundamental condition that makes this highly parallel approach possible during this initial stage?
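As an illustration only, here is a minimal pure-Python sketch of causal self-attention over a prompt. It assumes a single attention head, random toy weights, and no batching. The helper names (`prefill_matrix_form`, `one_token_at_a_time`, `random_matrix`) are hypothetical and are not taken from any particular library. The sketch computes the outputs in two ways. In matrix form, the queries, keys, and values for every position come from one product over the whole prompt, and a causal mask restricts each row of scores. In the second form, the same outputs are computed position by position. Because the entire prompt is known before processing starts, the rows of the matrix form do not depend on one another, and the two results agree.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(xs):
    """Numerically stable softmax; -inf entries receive weight 0."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rows, cols, rng):
    """Toy weights or embeddings (hypothetical, for illustration only)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def prefill_matrix_form(x, wq, wk, wv):
    """Causal self-attention over the whole prompt at once.

    x holds one embedding row per prompt token. Every row is available
    up front, so Q, K and V for all positions each come from a single
    matrix product, and the rows of the masked score matrix are
    independent of one another.
    """
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    n, d = len(x), len(q[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    # Causal mask: position i may attend only to positions 0..i.
    masked = [
        [scores[i][j] / math.sqrt(d) if j <= i else -math.inf for j in range(n)]
        for i in range(n)
    ]
    weights = [softmax(row) for row in masked]
    return matmul(weights, v)

def one_token_at_a_time(x, wq, wk, wv):
    """The same computation done sequentially, one position per step."""
    keys, values, out = [], [], []
    for token in x:
        q = matmul([token], wq)[0]
        keys.append(matmul([token], wk)[0])
        values.append(matmul([token], wv)[0])
        d = len(q)
        scores = [sum(a * b for a, b in zip(q, key)) / math.sqrt(d) for key in keys]
        w = softmax(scores)
        out.append([sum(wi * val[j] for wi, val in zip(w, values))
                    for j in range(len(values[0]))])
    return out

if __name__ == "__main__":
    rng = random.Random(0)
    x = random_matrix(5, 4, rng)  # 5 prompt tokens, embedding size 4
    wq, wk, wv = (random_matrix(4, 4, rng) for _ in range(3))
    parallel = prefill_matrix_form(x, wq, wk, wv)
    sequential = one_token_at_a_time(x, wq, wk, wv)
    diff = max(abs(p - s) for rp, rs in zip(parallel, sequential) for p, s in zip(rp, rs))
    print("max difference:", diff)
```

In a real model, the matrix form maps onto large batched matrix multiplications on accelerator hardware. This is consistent with prefilling being described as compute-bound. The sequential loop mimics the one-token-at-a-time pattern of the decoding phase, where each new token must wait for the previous one to be generated.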
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Self-Attention Formula for the Prefilling Phase
Prefilling as a Compute-Bound Process
Token Prediction within the Prefilling Phase
LLM Inference Performance Analysis
Rationale for Parallelism in Initial Prompt Processing
Diagram of the Prefilling Phase