Short Answer

Formulating the MLE Objective for a Small Dataset

A model is being trained on a small dataset consisting of just two sequences of tokens: Sequence A = (x₁, x₂, x₃) and Sequence B = (y₁, y₂). The training process aims to find model parameters that maximize the total log-probability of this dataset. Write out the specific mathematical expression that represents this total log-probability, decomposed into a sum of individual conditional log-probabilities.


Updated 2025-09-26


Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science