Learn Before
A research team is preparing a massive, diverse dataset scraped from the web to train a large language model. They are primarily concerned with two potential issues: training instability and the model learning undesirable social biases from the raw data. Which of the following data preparation strategies would most directly and effectively address both of these concerns?
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Data Quality as a Key Issue in LLM Training
Analyzing a Data Preparation Pipeline
A team is preparing a massive text dataset for training a new large language model. Arrange the following key data preparation stages into the most logical and efficient sequence.