Case Study

Language Model Design Trade-offs

A company is developing a real-time chatbot for customer service. They are considering two different architectures for generating the next token in a response.

  • Architecture 1: A highly complex, sequential model that achieves maximum possible accuracy for predicting the next token by performing an exhaustive analysis of the conversation history for every single token it generates. This process is very slow.
  • Architecture 2: A less complex model that can be computed in a highly parallel way, making it extremely fast. This speed comes at the cost of a small reduction in next-token prediction accuracy compared to Architecture 1.
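One way to frame the comparison is to estimate how per-token cost adds up over a whole reply. The sketch below is a rough illustration, not a benchmark. It assumes tokens are generated one after another, so the per-token cost is paid once per generated token. The reply length and both latency figures are hypothetical values chosen only for illustration, and the function name `response_latency_s` is likewise invented here.

```python
def response_latency_s(num_tokens: int, per_token_latency_s: float) -> float:
    """Estimate total reply latency for token-by-token generation.

    Each new token is produced only after the previous one, so the
    per-token cost is paid once for every token in the reply.
    """
    return num_tokens * per_token_latency_s


# Hypothetical figures (assumptions, not taken from the case study).
REPLY_TOKENS = 60          # length of a typical customer-service reply
ARCH1_PER_TOKEN_S = 0.5    # slow, exhaustive analysis for every token
ARCH2_PER_TOKEN_S = 0.02   # fast, highly parallel computation

arch1_total = response_latency_s(REPLY_TOKENS, ARCH1_PER_TOKEN_S)
arch2_total = response_latency_s(REPLY_TOKENS, ARCH2_PER_TOKEN_S)

print(f"Architecture 1: about {arch1_total:.1f} s per reply")
print(f"Architecture 2: about {arch2_total:.1f} s per reply")
```

Under these assumed numbers, the per-token gap grows linearly with reply length, which is the quantity to weigh against the small accuracy difference in a real-time setting.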

Analyze the trade-off between these two architectures. Which one represents a more viable approach for this specific application, and why? Your explanation should connect the choice to the two fundamental, interconnected tasks involved in implementing autoregressive models.

Updated 2025-09-26

Tags

Ch.5 Inference - Foundations of Large Language Models
