LLaMA2
LLaMA2 is a family of large language models released by Meta in 2023, available in 7-billion-, 13-billion-, and 70-billion-parameter versions, each pre-trained on roughly 2 trillion tokens. (The figures sometimes quoted for this family, a 65-billion-parameter model trained on 1.0 to 1.4 trillion tokens, actually describe the largest model of the original LLaMA release from earlier in 2023.) The training data comes from a diverse mix of public sources, including webpages, software code, Wikipedia, books, academic papers, and question-and-answer content.
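The idea of a "mix" of pre-training sources can be made concrete with a small sketch: each source gets a sampling weight, the weights convert into per-source token budgets, and the next training document is drawn in proportion to those weights. The weights below are hypothetical, chosen only for illustration; LLaMA2's exact per-source proportions are not public.

```python
import random

# Hypothetical source-mix weights (fractions of the total token budget).
# These numbers are illustrative only, not LLaMA2's actual proportions.
SOURCE_MIX = {
    "web": 0.80,
    "code": 0.08,
    "wikipedia": 0.04,
    "books": 0.04,
    "papers": 0.02,
    "qa": 0.02,
}

TOTAL_TOKENS = 2_000_000_000_000  # ~2 trillion tokens, as for LLaMA2


def tokens_per_source(mix, total):
    """Convert mix fractions into per-source token budgets."""
    return {name: int(frac * total) for name, frac in mix.items()}


def sample_source(mix, rng=random):
    """Pick the source of the next training document, proportional to its weight."""
    names = list(mix)
    weights = [mix[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]


budgets = tokens_per_source(SOURCE_MIX, TOTAL_TOKENS)
```

Under this sketch, a heavily web-weighted mix explains behaviors such as fluent conversational text alongside weaker long-form structure: the model simply sees far more of one kind of text than another.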
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Foundations of Large Language Models Course
Related
BERT
BART
T5
BERT (Bidirectional Encoder Representations from Transformers)
RoBERTa
GPT Series
DeepSeek-V3
Falcon
Mistral
PaLM-540B
Gemma-7B
Gemma2
A software development team is tasked with building a feature that can automatically generate a concise, one-paragraph summary from a long news article. The system needs to first comprehend the full context of the source article and then generate a new, coherent summary. Based on the typical strengths of different foundational model designs, which of the following models would be the most suitable choice for this specific task?
Match each pre-trained model with the description that best fits its architectural design and primary use case.
Evaluating Model Architecture Selection for a Classification Task
Data Volume vs. Quality in LLM Pre-training
GPT-3
Falcon
LLaMA2
PaLM-540B
Gemma-7B
Evaluating Data Sources for LLM Pre-training
Data Source Selection for a Specialized LLM
A newly developed large language model demonstrates high fluency and generates grammatically perfect, conversational text. However, it frequently provides outdated information, struggles to generate well-structured, long-form content like reports, and often fabricates details when asked about events from the last year. Based on these specific performance characteristics, which of the following descriptions most likely represents the composition of its pre-training dataset?
GPT-3
Falcon
LLaMA2
PaLM-540B
Gemma-7B
Learn After
A research team is evaluating foundational models for two distinct projects. Project A requires a model to perform complex text classification and sentiment analysis on legal documents. Project B requires a model to generate creative, long-form stories from a short prompt. Based on the typical design of large-scale, generative language models, which statement best analyzes the suitability of a model like the 70-billion-parameter LLaMA2 for these projects?
Analyzing Model Behavior Based on Pre-training Data
Evaluating the Impact of LLaMA2's Pre-training Data