1Cademy - A language model is being trained with a supervised objective to maximize the probability of the correct output. Given the input The largest city in the US is, the target output is the two-token sequence New York. Two different models are evaluated on this single instance.<br><br>* **Model A** predicts the first token New with a probability of 0.6, and then predicts the second token York with a probability of 0.8.<br>* **Model B** predicts the first token New with a probability of 0.9, and

Learn Before

Maximum Likelihood Estimation (MLE) Objective in Supervised Language Model Training

Multiple Choice

A language model is being trained with a supervised objective to maximize the probability of the correct output. Given the input 'The largest city in the US is', the target output is the two-token sequence 'New York'. Two different models are evaluated on this single instance.

Model A predicts the first token 'New' with a probability of 0.6, and then predicts the second token 'York' with a probability of 0.8.
Model B predicts the first token 'New' with a probability of 0.9, and

Updated 2025-10-02

Contributors are:

Who are from:

Learn Before

Related