Case Study

Model Selection for a High-Traffic Application

A company is launching a new AI-powered code completion tool for software developers. They anticipate a very high volume of simultaneous users. They are testing two different language models, Model X and Model Y, which have been judged to have nearly identical accuracy and quality for the task. Performance testing yields the following data:

  • Model X: Can process 400 tokens per second.
  • Model Y: Can process 1,600 tokens per second.

Given that the primary business requirement is to serve a large, active user base without system slowdowns, which model is the more suitable choice? Justify your decision by explaining how the relevant performance metric informs your selection.
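A quick back-of-the-envelope calculation can make the two figures concrete. The sketch below is a simplification. It assumes the measured rates are aggregate throughput shared evenly across all active users. The per-user demand of 20 tokens per second and the 8,000-token burst are hypothetical values chosen only for illustration. The helper names `max_concurrent_users` and `time_to_serve` are likewise hypothetical.

```python
# Hypothetical per-user demand: assume each active developer's completion
# requests consume about 20 tokens per second on average (illustrative only).
ASSUMED_TOKENS_PER_USER_PER_SEC = 20

# Hypothetical burst size: total tokens requested at the same moment.
ASSUMED_BURST_TOKENS = 8000


def max_concurrent_users(throughput_tps: float, demand_tps: float) -> int:
    """Users a model can serve at steady state before requests start to queue,
    assuming throughput is shared evenly and scales linearly (a simplification)."""
    return int(throughput_tps // demand_tps)


def time_to_serve(total_tokens: int, throughput_tps: float) -> float:
    """Seconds needed to generate a burst of tokens at the given throughput."""
    return total_tokens / throughput_tps


# Measured throughput figures from the case study (tokens per second).
models = {"Model X": 400, "Model Y": 1600}

for name, tps in models.items():
    users = max_concurrent_users(tps, ASSUMED_TOKENS_PER_USER_PER_SEC)
    burst = time_to_serve(ASSUMED_BURST_TOKENS, tps)
    print(f"{name}: ~{users} concurrent users, "
          f"{burst:.1f} s to clear an {ASSUMED_BURST_TOKENS:,}-token burst")
```

Changing the assumed demand changes the absolute user counts, but Model Y's capacity stays roughly four times Model X's, mirroring the ratio of the measured rates.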

Updated 2025-09-26

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science