Learn Before
Evaluating Model Training Progress
A machine learning engineer is pre-training a language model on a dataset. They evaluate the model's performance at two different stages of training (after 100 epochs and after 200 epochs) using a representative sample of three sequences from the dataset. The loss for each sequence is recorded in the table below.
| Sequence ID | Loss at 100 Epochs | Loss at 200 Epochs |
|---|---|---|
| Sequence A | 5.2 | 4.1 |
| Sequence B | 6.8 | 3.5 |
| Sequence C | 4.5 | 4.2 |
Based on the fundamental objective of the pre-training process, which version of the model (at 100 epochs or 200 epochs) is performing better? Justify your choice by referencing the data and the overall goal of training.
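One way to ground the comparison is to aggregate the per-sequence losses, since pre-training aims to reduce loss over the dataset as a whole rather than on any single sequence. A minimal sketch using the values from the table above:

```python
# Per-sequence losses from the table, keyed by sequence ID.
losses_100 = {"A": 5.2, "B": 6.8, "C": 4.5}
losses_200 = {"A": 4.1, "B": 3.5, "C": 4.2}

# Mean loss over the sample at each checkpoint; a lower mean
# indicates better overall fit to the data.
mean_100 = sum(losses_100.values()) / len(losses_100)
mean_200 = sum(losses_200.values()) / len(losses_200)

print(f"mean loss @ 100 epochs: {mean_100:.2f}")  # 5.50
print(f"mean loss @ 200 epochs: {mean_200:.2f}")  # 3.93
```

Note that every sequence improves between the two checkpoints, so the mean drops as well; aggregating simply makes the overall trend explicit.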
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Probability Computation with Pre-trained Language Models
A language model is being trained on a large dataset of text. After an initial training iteration, the model's performance is measured on three distinct sequences from the dataset, yielding the following loss values:
- Sequence 1: Loss = 8.4
- Sequence 2: Loss = 2.1
- Sequence 3: Loss = 5.5
Based on the fundamental objective of this training process, which of the following statements most accurately describes the model's overall goal?
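The per-sequence losses can be aggregated to illustrate what the training objective actually measures: the model minimizes the total (equivalently, the mean) loss over all sequences, not the loss of any one sequence in isolation. A minimal sketch using the values listed above:

```python
# Losses for Sequences 1-3 after the initial training iteration.
losses = [8.4, 2.1, 5.5]

# The training objective targets the aggregate, so individual
# sequences may have high loss while the overall sum is reduced.
total_loss = sum(losses)
mean_loss = total_loss / len(losses)

print(f"total loss: {total_loss:.1f}")  # 16.0
print(f"mean loss:  {mean_loss:.2f}")   # 5.33
```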
Evaluating Model Training Progress
From Single Sequence to Full Dataset
The primary objective of pre-training a language model on a dataset is to find a single, shared set of model parameters that minimizes the loss across all text sequences in the dataset as a whole, rather than a unique, optimal set of parameters for each individual sequence.
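In standard notation (a sketch, assuming a dataset $\mathcal{D}$ of text sequences $x$ and one shared parameter vector $\theta$ for the whole dataset), the pre-training objective can be written as maximizing the total log-likelihood:

$$
\hat{\theta} = \arg\max_{\theta} \sum_{x \in \mathcal{D}} \log \Pr(x; \theta)
$$

which is equivalent to minimizing the summed loss $\sum_{x \in \mathcal{D}} \mathcal{L}(x; \theta)$ when the loss is the negative log-likelihood. The single $\theta$ outside the sum is what distinguishes this from fitting each sequence separately.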
Pre-training Objective Formula