Example of a T5 Machine Translation Training Sample with Special Tokens
A training sample for a machine translation task, such as Chinese-to-English, illustrates the T5 text-to-text format. The sample consists of a task-specific prefix, the input text, and the target translation, with special tokens structuring the data for the model, for example: [CLS] Translate from Chinese to English: 你好! → <s> Hello!. Here, [CLS] serves as the start symbol for the source text (encoder input), while <s> serves as the start symbol for the target text (decoder input).
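The format above can be sketched in code. This is a minimal illustration of the convention described here, not a library API: the `make_sample` helper and its argument names are hypothetical, and the `[CLS]`/`<s>` token strings follow this card's notation.

```python
def make_sample(prefix: str, source: str, target: str) -> tuple[str, str]:
    """Build one (encoder_input, decoder_input) training pair.

    [CLS] marks the start of the source text fed to the encoder;
    <s> marks the start of the target text fed to the decoder.
    """
    encoder_input = f"[CLS] {prefix} {source}"
    decoder_input = f"<s> {target}"
    return encoder_input, decoder_input

enc, dec = make_sample("Translate from Chinese to English:", "你好!", "Hello!")
print(enc)  # [CLS] Translate from Chinese to English: 你好!
print(dec)  # <s> Hello!
```

Because the task is carried entirely by the text prefix, the same helper works unchanged for summarization, question answering, or any other task cast in this text-to-text form.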

Tags
Ch.1 Pre-training - Foundations of Large Language Models
Computing Sciences
Related
Example of a T5 Question-Answering Sample
Example of a T5 Simplification Task Sample
Differentiating Encoder and Decoder Sequences with Start Symbols
Versatility of the T5 Text-to-Text Format
Definition of c_gold
Formula for Input Embedding Composition
A researcher wants to train a model to perform a new task: converting a sentence from passive voice to active voice. Given the passive input sentence 'The cake was eaten by the dog' and the desired active output 'The dog ate the cake', which of the following training samples is correctly structured according to the unified, prefix-based text-to-text format?
Critiquing a Text-to-Text Training Sample
A single text-to-text model is being trained on a dataset containing samples for four different tasks. Each sample's input begins with a prefix that instructs the model on what to do. Match each input sample (Source Text) with the most likely task it is intended for.
Debugging Input Representation in a Sequence-to-Sequence Model
In designing a sequence-to-sequence model, an engineer decides to use one specific start symbol for all source sequences fed to the encoder and a different, unique start symbol for all target sequences fed to the decoder. Which statement best analyzes the primary benefit of this design choice?
In a sequence-to-sequence model, using a single, identical start symbol for both the source (encoder) and target (decoder) inputs would make it impossible for the model to distinguish between the two types of sequences and thus prevent it from learning the task.
Formulating NLP Tasks as Sequence-to-Sequence Mappings using Start Symbols
Learn After
A model is being trained on the following data sample for a translation task:
[CLS] Translate from Spanish to French: ¿Cómo estás? → <s> Comment vas-tu?
Based on the structure and special tokens in this sample, what specific sequences are provided as input to the model's two main components?

A data scientist is preparing a training sample for a text-to-text model designed for English-to-German translation. They create the following sample:
Translate from English to German: How are you? → <s> Wie geht es Ihnen? [CLS]
Which of the following best describes the primary error in this sample's structure?

Constructing a Training Sample for a Summarization Task
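The questions above hinge on splitting a formatted sample into its encoder and decoder portions. A minimal sketch, assuming the "→" separator convention used in these samples (the `split_sample` helper is hypothetical):

```python
def split_sample(sample: str) -> tuple[str, str]:
    """Split a formatted sample at the → separator.

    The left part ([CLS]-prefixed source) goes to the encoder;
    the right part (<s>-prefixed target) goes to the decoder.
    """
    encoder_input, decoder_input = (part.strip() for part in sample.split("→"))
    return encoder_input, decoder_input

enc, dec = split_sample(
    "[CLS] Translate from Spanish to French: ¿Cómo estás? → <s> Comment vas-tu?"
)
print(enc)  # [CLS] Translate from Spanish to French: ¿Cómo estás?
print(dec)  # <s> Comment vas-tu?
```

Note how the malformed English-to-German sample above would fail this convention: its source lacks the leading [CLS], which instead appears, incorrectly, at the end of the target.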