Case Study

Evaluating Model Training Progress

A machine learning engineer is pre-training a language model on a dataset. They evaluate the model's performance at two different stages of training (after 100 epochs and after 200 epochs) using a representative sample of three sequences from the dataset. The loss for each sequence is recorded in the table below.

Sequence ID | Loss at 100 Epochs | Loss at 200 Epochs
Sequence A  | 5.2                | 4.1
Sequence B  | 6.8                | 3.5
Sequence C  | 4.5                | 4.2

Based on the fundamental objective of the pre-training process, which version of the model (at 100 epochs or 200 epochs) is performing better? Justify your choice by referencing the data and the overall goal of training.
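The comparison the question asks for boils down to averaging each checkpoint's losses and picking the lower one, since pre-training minimizes the loss. A minimal sketch of that arithmetic, using the values from the table above (the dictionary names are illustrative, not from the source):

```python
# Per-sequence losses from the case-study table.
losses_100 = {"A": 5.2, "B": 6.8, "C": 4.5}
losses_200 = {"A": 4.1, "B": 3.5, "C": 4.2}

def mean_loss(losses):
    """Average loss over the evaluated sequences."""
    return sum(losses.values()) / len(losses)

avg_100 = mean_loss(losses_100)  # 16.5 / 3 = 5.5
avg_200 = mean_loss(losses_200)  # 11.8 / 3 ≈ 3.93

# Pre-training's objective is to minimize loss, so lower average is better.
better = "200 epochs" if avg_200 < avg_100 else "100 epochs"
```

Note that the 200-epoch model is also better or equal on every individual sequence, so the conclusion does not depend on the choice of average.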

Updated 2025-10-02

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Evaluation in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science