
Learn Before

Loss Function for Predicted vs. Gold Probability Distributions

Multiple Choice

A language model is being trained to predict the next word in a sequence. For a given training example, the function $\mathcal{L}(\mathbf{p}^{\theta}, \mathbf{p}^{\text{gold}})$ is used to measure the difference between the model's predicted probability distribution over the vocabulary ($\mathbf{p}^{\theta}$) and the true distribution ($\mathbf{p}^{\text{gold}}$), where the true next word has a probability of 1. If the calculated value of $\mathcal{L}$ is very high for this example, what does th
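As a rough illustration of the setup described in the question, here is a minimal sketch that assumes $\mathcal{L}$ is the cross-entropy loss, a common choice that the question itself does not name. The toy vocabulary, the probability values, and the function name `cross_entropy` are all hypothetical and chosen only for illustration.

```python
import math

def cross_entropy(p_theta, p_gold):
    """Cross-entropy between a gold distribution and a predicted one.

    Assumption: L is cross-entropy; the source only says L measures
    the difference between the two distributions.
    """
    # Only entries with nonzero gold probability contribute to the sum.
    return -sum(g * math.log(p) for p, g in zip(p_theta, p_gold) if g > 0)

# Hypothetical three-word vocabulary; the true next word is "mat".
vocab = ["cat", "dog", "mat"]
p_gold = [0.0, 0.0, 1.0]  # one-hot: the true next word has probability 1

# Two hypothetical model predictions over the same vocabulary.
confident = [0.05, 0.05, 0.90]  # most probability mass on the true word
mistaken = [0.60, 0.39, 0.01]   # very little mass on the true word

loss_confident = cross_entropy(confident, p_gold)
loss_mistaken = cross_entropy(mistaken, p_gold)

print(f"loss when the true word gets 0.90: {loss_confident:.3f}")
print(f"loss when the true word gets 0.01: {loss_mistaken:.3f}")
```

Under this assumption, with a one-hot gold distribution the loss reduces to $-\log$ of the probability the model assigned to the true next word, so the value grows as that probability shrinks.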

Updated 2025-09-26
