Learn Before
An engineering team is deploying a large language model for a real-time chatbot application on a device with limited processing power but ample memory. They are considering two approaches for generating responses:
- Approach A: For each new word generated, the model re-processes the entire conversation history from scratch.
- Approach B: The model stores key intermediate calculations from previous words in memory and reuses them to generate the next word.
Which of the following statements best analyzes the trade-offs between these two approaches in the context of the team's hardware constraints?
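Approach B describes what is commonly called KV caching. The toy sketch below (hypothetical accounting code, not a real transformer) shows the compute-for-memory trade the question is probing: without a cache, the work per generated word grows with the conversation length, while a cache keeps per-word work constant at the cost of storing past results.

```python
# Toy cost model contrasting the two decoding approaches.
# All function names are illustrative, not from any real library.

def generate_without_cache(num_tokens):
    """Approach A: re-derive intermediate results for the whole
    history at every step, so total work is quadratic in length."""
    ops = 0
    for t in range(1, num_tokens + 1):
        ops += t  # step t re-processes all t tokens seen so far
    return ops

def generate_with_cache(num_tokens):
    """Approach B: store each token's intermediate results once and
    reuse them, so total work is linear (memory grows instead)."""
    cache = []  # stored intermediate results; memory grows with length
    ops = 0
    for t in range(num_tokens):
        cache.append(t)  # compute and store only the newest token's result
        ops += 1
    return ops

print(generate_without_cache(100))  # 5050 steps: quadratic growth
print(generate_with_cache(100))     # 100 steps: linear growth
```

For a device with weak compute but ample memory, this accounting favors Approach B: the cache's memory footprint is affordable, and it removes the redundant re-computation that Approach A performs.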
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
KV Caching for Reducing Redundant Computation
Memory-Compute-Accuracy Triangle in LLM Optimization
Low-Precision Implementation of Transformers
LLM Deployment Strategy Analysis
Analyzing LLM Optimization Strategies