Learn Before
Computational Expense of Training LLMs from Scratch
A major drawback of incorporating instruction-following data during pre-training is the immense computational cost of building and training a large language model from the ground up. Because pre-training optimizes billions of parameters over trillions of tokens, adding instruction data at this stage means training an entire model from scratch rather than adapting an existing one, which puts the approach out of reach for all but the most well-resourced organizations.
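To make the scale concrete, here is a rough back-of-envelope sketch in Python using the widely cited C ≈ 6·N·D approximation for total training FLOPs. The model size, token count, hardware throughput, and utilization figures below are illustrative assumptions, not figures from this course:

```python
# Back-of-envelope estimate of pre-training compute using the
# commonly cited approximation C ≈ 6 * N * D (FLOPs), where
# N = parameter count and D = number of training tokens.
# All concrete numbers below are illustrative assumptions.

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training FLOPs via the 6*N*D rule of thumb."""
    return 6.0 * n_params * n_tokens

def gpu_hours(total_flops: float, gpu_flops_per_sec: float, utilization: float) -> float:
    """Convert total FLOPs into GPU-hours at a given hardware utilization."""
    seconds = total_flops / (gpu_flops_per_sec * utilization)
    return seconds / 3600.0

if __name__ == "__main__":
    N = 7e9              # assumed 7B-parameter model
    D = 1e12             # assumed 1 trillion training tokens
    A100_BF16 = 312e12   # ~312 TFLOP/s peak for an NVIDIA A100 (BF16)
    MFU = 0.4            # assumed 40% model FLOPs utilization

    flops = training_flops(N, D)
    hours = gpu_hours(flops, A100_BF16, MFU)
    print(f"~{flops:.2e} FLOPs, ~{hours:,.0f} A100-hours "
          f"(~{hours / 24 / 365:.1f} GPU-years)")
```

Under these assumptions the estimate comes out to roughly 4.2 × 10²² FLOPs, on the order of 90,000 A100-hours, which is why incorporating instruction data by re-running pre-training is rarely practical compared to fine-tuning an already pre-trained model.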
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Enabling Zero-Shot Learning through Instruction Understanding
Computational Expense of Training LLMs from Scratch
Difficulty in Collecting Labeled Data for Instruction Pre-training
A research lab develops a new large language model by training it on a massive dataset consisting solely of digitized books and encyclopedias. The model becomes exceptionally proficient at generating coherent, factual paragraphs. However, when users give it a direct command, such as "Translate 'hello' into French," the model often responds with a continuation like "is a common English greeting," instead of "Bonjour."
Which of the following best explains this specific failure?
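For intuition about the failure mode in this scenario, here is a minimal sketch using Hugging Face's transformers library with GPT-2 as a stand-in base model (GPT-2 is trained purely on next-token prediction; the exact continuation it produces will vary):

```python
# Minimal illustration of a base (non-instruction-tuned) model treating
# a command as text to continue rather than an instruction to follow.
# Requires: pip install transformers torch
from transformers import pipeline

# GPT-2 is a pure next-token predictor with no instruction tuning.
generator = pipeline("text-generation", model="gpt2")

prompt = "Translate 'hello' into French:"
result = generator(prompt, max_new_tokens=20, do_sample=False)

# A base model typically continues the prompt as ordinary text
# (e.g., commentary about the word 'hello') instead of answering "Bonjour".
print(result[0]["generated_text"])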
Pre-training Data Strategy for a Command-Following Model
Pre-training a Specialized Code Assistant
Learn After
Strategic Decision for a New Language Model
A small, budget-conscious startup aims to create a novel instruction-following language model. Their strategy involves integrating specialized instruction-response pairs directly into the pre-training phase. What is the most significant practical challenge this startup will likely encounter when attempting to train their new model entirely from scratch?
A well-funded academic research lab proposes to create a new, state-of-the-art, instruction-following language model. Their plan is to train the model entirely from scratch on a massive dataset of general text combined with specialized instruction-response pairs. This approach is considered a practical and cost-effective strategy for such an organization.