Multiple Choice

A model is being trained for a text labeling task where the goal is to maximize the probability assigned to the correct label for each word. The training loss is calculated as the average of the negative logarithm of these probabilities. Consider the model's performance on one sentence, evaluated by two different sets of parameters (Model A and Model B). The table below shows the probability each model assigned to the correct label for each of the seven words in the sentence.

Word      Model A Probability    Model B Probability
Word 1    0.9                    0.8
Word 2    0.8                    0.6
Word 3    0.7                    0.6
Word 4    0.9                    0.8
Word 5    0.9                    0.8
Word 6    0.1                    0.7
Word 7    0.9                    0.8

Based on this data, which model would have a lower training loss for this specific sentence, and why?
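One way to check the answer is to compute the stated loss directly: the average of the negative logarithm of the probabilities in the table. A minimal sketch in Python, assuming the natural logarithm (the base only rescales both losses by the same constant, so the comparison is unaffected):

```python
import math

# Probabilities each model assigned to the correct label (from the table above)
model_a = [0.9, 0.8, 0.7, 0.9, 0.9, 0.1, 0.9]
model_b = [0.8, 0.6, 0.6, 0.8, 0.8, 0.7, 0.8]

def avg_neg_log_likelihood(probs):
    """Training loss as defined in the question: mean of -log(p)."""
    return sum(-math.log(p) for p in probs) / len(probs)

loss_a = avg_neg_log_likelihood(model_a)
loss_b = avg_neg_log_likelihood(model_b)
print(f"Model A loss: {loss_a:.3f}")  # ≈ 0.472
print(f"Model B loss: {loss_b:.3f}")  # ≈ 0.324
```

Note how Model A's single low probability (0.1 on Word 6) contributes -ln(0.1) ≈ 2.303 on its own, dominating the sum even though Model A scores higher on most other words; the negative log penalizes confident mistakes heavily.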


Updated 2025-10-04


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science