Learn Before
Definition of c_gold
The variable c_gold denotes the correct, or ground-truth, label for a given training sample.
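As an illustration of how a ground-truth label might be used during training, the sketch below pairs a sample with its c_gold value and scores a model's predicted distribution against it. The sample text, the label set, the probabilities, and the helper name gold_label_loss are all hypothetical, and the negative log-probability loss is a common choice assumed here, not something this definition prescribes.

```python
import math

# Hypothetical training sample: an input text paired with its gold label.
sample = {"text": "The movie was wonderful.", "c_gold": "positive"}

# Hypothetical model output: one probability per candidate label.
predicted_probs = {"positive": 0.7, "negative": 0.2, "neutral": 0.1}


def gold_label_loss(probs, c_gold):
    """Return the negative log-probability the model assigns to c_gold."""
    return -math.log(probs[c_gold])


loss = gold_label_loss(predicted_probs, sample["c_gold"])
print(f"loss on gold label: {loss:.4f}")
```

Under this assumed loss, the value shrinks toward zero as the model places more probability on c_gold, which is why an incorrect gold label pushes the model toward the wrong answer.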
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Example of a T5 Machine Translation Training Sample with Special Tokens
Example of a T5 Question-Answering Sample
Example of a T5 Simplification Task Sample
Differentiating Encoder and Decoder Sequences with Start Symbols
Versatility of the T5 Text-to-Text Format
Definition of c_gold
Formula for Input Embedding Composition
A researcher wants to train a model to perform a new task: converting a sentence from passive voice to active voice. Given the passive input sentence 'The cake was eaten by the dog' and the desired active output 'The dog ate the cake', which of the following training samples is correctly structured according to the unified, prefix-based text-to-text format?
Critiquing a Text-to-Text Training Sample
A single text-to-text model is being trained on a dataset containing samples for four different tasks. Each sample's input begins with a prefix that instructs the model on what to do. Match each input sample (Source Text) with the most likely task it is intended for.
Learn After
A text-to-text model is being trained on the following data sample formatted as 'input → output':
summarize: The solar system consists of the Sun and the astronomical objects gravitationally bound to it. Of the eight planets, the four inner terrestrial planets are Mercury, Venus, Earth, and Mars, and the four outer giant planets are Jupiter, Saturn, Uranus, and Neptune. → The solar system has eight planets, divided into inner terrestrial and outer giant groups.
Which part of this sample represents the correct, or ground-truth, label that the model is expected to learn to produce?
Analyzing Training Data Quality
Impact of Incorrect Ground-Truth Labels