Learn Before
Visual Diagram of Data Selection with a Small Model
This diagram illustrates a workflow in which a small model selects the data used to train a larger model. The small model first performs 'Data Selection' on an initial dataset, producing a curated dataset of input-label pairs (x, y). The large model is then trained only on this curated dataset. In each step of the training loop, an input 'x' is fed to the large model, its output is compared with the corresponding label 'y', and the resulting loss is used to update the model's parameters.
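The workflow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method the diagram prescribes. The selection rule (keep the samples that a tiny model fits best), the function names `small_model_select` and `train_large_model`, the `keep_fraction` parameter, and the toy data are all hypothetical. A two-parameter linear model stands in for the large model.

```python
import random

def small_model_select(dataset, keep_fraction=0.8):
    """Hypothetical selection rule: fit a tiny one-parameter model y ~ a*x,
    then keep the samples it fits best (lowest squared error)."""
    sxx = sum(x * x for x, _ in dataset)
    sxy = sum(x * y for x, y in dataset)
    a = sxy / sxx  # closed-form least squares for the small model
    scored = sorted(dataset, key=lambda pair: (pair[1] - a * pair[0]) ** 2)
    keep = max(1, int(len(scored) * keep_fraction))
    return scored[:keep]

def train_large_model(curated, epochs=200, lr=0.01):
    """Stand-in for the large model: y_hat = w*x + b, trained with SGD."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in curated:
            y_hat = w * x + b   # feed input x to the model
            error = y_hat - y   # compare the output with label y
            # Squared-error loss is error**2; its gradients drive the update.
            w -= lr * 2 * error * x
            b -= lr * 2 * error
    return w, b

# Toy initial dataset (assumed): clean pairs with y = 2x plus a few noisy pairs.
random.seed(0)
clean = [(k / 10, 2 * (k / 10)) for k in range(-20, 21)]
noisy = [(random.uniform(-2, 2), random.uniform(-20, 20)) for _ in range(8)]
initial_dataset = clean + noisy

# Stage 1: the small model selects data. Stage 2: the large model trains on it.
curated_dataset = small_model_select(initial_dataset, keep_fraction=0.8)
w, b = train_large_model(curated_dataset)
```

Because the large model only ever sees `curated_dataset`, any mistake in the selection stage carries over directly into training. A different selection criterion, such as keeping high-loss samples, could be substituted in `small_model_select` without changing the training loop.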

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Foundations of Large Language Models Course
Related
Ensemble of Small Models for Data Selection
A research team is fine-tuning a very large, computationally expensive language model on a massive, noisy dataset. To optimize their limited budget, they first perform a single pass with the large model over the dataset to calculate the training loss for each data sample. They then train a much smaller, faster model to predict the loss values that the large model assigned. Finally, they use this trained small model to filter the dataset, keeping only the samples predicted to have high loss. Which statement best evaluates the effectiveness of this data selection strategy?
Visual Diagram of Data Selection with a Small Model
You are tasked with curating a high-quality dataset for fine-tuning a large, computationally expensive model from a massive, unfiltered data source. You decide to use a smaller, auxiliary model to help with the selection process. Arrange the following steps into the correct logical sequence for this data selection workflow.
Optimizing Training with a Limited Budget
Learn After
Consider a training process where a small model first selects a subset of data from a large, initial dataset. A much larger model is then trained exclusively on this selected subset. If the large model trains without errors but ultimately performs poorly on its intended task, which of the following is the most likely reason for the failure, based on the logic of this specific workflow?
A machine learning pipeline uses a small model to select high-quality data for training a larger model. Arrange the following steps of this process into the correct chronological order.
Troubleshooting a Two-Stage Training Pipeline