Learn Before
Analysis of a Self-Supervised Training Strategy
A team is training a model to perform a complex generation task. Their training process for each input involves two steps:
- First, the model is used to determine the single most probable output sequence, which we'll call the 'optimal output'.
- Second, the model's parameters are adjusted to maximize the probability of producing that same 'optimal output', but this time, the model is given a slightly modified and less complete version of the original input.
Based on this two-step process, what is the primary capability the model is being trained to develop, and why is this approach potentially more powerful than simply training the model on a fixed set of pre-written input-output pairs?
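The loop described in the question can be made concrete with a short sketch. Everything below is illustrative rather than taken from the source: it assumes a Hugging Face-style causal language model, and the checkpoint name and the corrupt_context helper are hypothetical stand-ins for whatever model and input-modification scheme the team actually uses.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def corrupt_context(c: str) -> str:
    # Hypothetical input modification: drop roughly the last sentence,
    # yielding a "slightly modified and less complete" version of c.
    head, _, _ = c.rpartition(".")
    return head + "." if head else c

def training_step(c: str) -> float:
    # Step 1: the current model determines its single most probable
    # output via greedy decoding -- the 'optimal output' y_hat for c.
    enc = tokenizer(c, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**enc, max_new_tokens=64, do_sample=False)
    y_hat = out[:, enc["input_ids"].shape[1]:]  # generated tokens only

    # Step 2: adjust parameters to maximize Pr(y_hat | c'), i.e.
    # minimize the negative log-likelihood of y_hat given the
    # modified, less complete context c'.
    c_prime = tokenizer(corrupt_context(c), return_tensors="pt")["input_ids"]
    input_ids = torch.cat([c_prime, y_hat], dim=1)
    labels = input_ids.clone()
    labels[:, : c_prime.shape[1]] = -100  # no loss on the context tokens
    loss = model(input_ids=input_ids, labels=labels).loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Note that no pre-written target is ever consulted: the supervision signal (y_hat) is regenerated from the model itself on every pass, which is what distinguishes this loop from training on a fixed set of input-output pairs.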
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Foundations of Large Language Models Course
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Analysis of a Self-Supervised Training Strategy
A model is trained using a two-stage process. In the first stage, given an input context c, the model identifies an optimal output sequence, ŷ. In the second stage, the model's parameters are updated to maximize the probability of generating that same sequence ŷ, but this time conditioned on a slightly modified version of the original context, c'. What is the primary reason for using the modified context c' in the second stage instead of the original context c?

Consider a training process where the objective function is defined as Loss = -log Pr(ŷ | c', z), with ŷ being an optimal prediction generated by the model itself. During training, the model's parameters are updated with the goal of minimizing this specific Loss value.
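Written out token by token, this objective takes the usual autoregressive form. The factorization below is a standard decomposition assumed here for illustration, with z denoting whatever additional conditioning (e.g., a prompt or template) the card's notation implies; it is not spelled out in the source:

```latex
\mathrm{Loss}
  = -\log \Pr(\hat{y} \mid c', z)
  = -\sum_{t=1}^{|\hat{y}|} \log \Pr\bigl(\hat{y}_t \mid c', z, \hat{y}_{<t}\bigr)
```

Minimizing this quantity is exactly equivalent to maximizing the probability of reproducing ŷ from the modified context, which ties this related item back to the second step of the process described above.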