LLM Architecture Selection for a Legal Tech Application
A legal technology firm is developing a tool to summarize and analyze legal contracts, which often exceed 50,000 tokens in length. They are considering two pre-trained language models:
- Model A: A standard model with state-of-the-art performance on general language tasks. Its core internal mechanism has compute and memory costs that grow quadratically with input length.
- Model B: A newer model with a modified internal structure designed to process long inputs more efficiently. Its memory usage scales far more favorably with input length, but this modification causes a minor, measurable drop in performance on standard benchmarks relative to Model A.
Evaluate the trade-offs between these two models for the firm's specific application. Which model would you recommend, and why? Justify your decision by explaining the underlying architectural challenge that Model B is designed to solve.
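To make the scaling problem concrete, here is a small illustrative sketch (not part of the exercise) that estimates the memory needed just to store the attention score matrix in a standard Transformer layer. The head count, batch size, and fp16 element size are hypothetical assumptions for illustration; real figures depend on the specific model.

```python
def attention_matrix_bytes(seq_len: int, num_heads: int = 32,
                           batch_size: int = 1, bytes_per_elem: int = 2) -> int:
    """Bytes to hold one layer's (seq_len x seq_len) attention score
    matrix for every head, assuming fp16 (2-byte) scores."""
    return batch_size * num_heads * seq_len * seq_len * bytes_per_elem

# Quadratic growth: 50x the tokens needs 2,500x the memory.
for n in (1_000, 10_000, 50_000):
    gib = attention_matrix_bytes(n) / 2**30
    print(f"{n:>6} tokens -> {gib:8.2f} GiB for attention scores")
```

Under these assumptions, a 50,000-token contract requires roughly 2,500 times the score-matrix memory of a 1,000-token document, which is the root cause Model B's modified structure is designed to address.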
Tags
Ch.5 Inference - Foundations of Large Language Models