Multiple Choice

An engineering team is deploying a large language model for a real-time chatbot application on a device with limited processing power but ample memory. They are considering two approaches for generating responses:

  • Approach A: For each new word generated, the model re-processes the entire conversation history from scratch.
  • Approach B: The model stores key intermediate calculations from previous words in memory and reuses them to generate the next word.

Which of the following statements best analyzes the trade-offs between these two approaches in the context of the team's hardware constraints?
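To make the trade-off concrete, here is a minimal, hypothetical sketch (a toy counter, not a real transformer) that tallies how many key/value computations each approach performs per generated token. All function names and numbers below are illustrative assumptions, not part of the original question.

```python
def generate_no_cache(prompt_len, new_tokens):
    """Approach A: re-process the entire history for every new token."""
    work = 0
    for step in range(new_tokens):
        history = prompt_len + step   # tokens already in the conversation
        work += history + 1           # recompute key/value for every one of them
    return work                       # total work grows quadratically

def generate_with_cache(prompt_len, new_tokens):
    """Approach B: compute key/value once per token, store it, reuse it."""
    cache = []                        # memory grows linearly with context length
    work = 0
    for token in range(prompt_len + new_tokens):
        work += 1                     # key/value for the new token only
        cache.append(token)           # cached for reuse at later steps
    return work, len(cache)           # total work grows linearly
```

For a 10-token prompt and 5 generated tokens, Approach A performs 11 + 12 + 13 + 14 + 15 = 65 units of work, while Approach B performs 15 units but holds 15 cached entries in memory, matching the team's constraint of scarce compute and plentiful memory.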


Updated 2025-10-02


Tags: Ch.5 Inference - Foundations of Large Language Models, Foundations of Large Language Models, Foundations of Large Language Models Course, Computing Sciences, Analysis in Bloom's Taxonomy, Cognitive Psychology, Psychology, Social Science, Empirical Science, Science