Learn Before
Input Formatting with Separator Tokens
For tasks involving multiple text segments, a common input formatting strategy is to concatenate the sequences and use a special separator token, like [SEP], to distinguish between them. For instance, an input sequence x_1 ... x_m and a target sequence y_1 ... y_n can be combined into a single input stream for a model as: x_1 ... x_m [SEP] y_1 ... y_n [SEP].
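The concatenation pattern above can be sketched in a few lines of Python. This is a minimal illustration, not a specific tokenizer's API; the function name format_with_sep and the choice to join segments with spaces are assumptions made for the example.

```python
def format_with_sep(segments, sep="[SEP]"):
    # Append the separator token after each segment, producing the
    # pattern: segment_1 [SEP] segment_2 [SEP] ...
    return " ".join(f"{seg} {sep}" for seg in segments)

# Two sentences combined into one input stream for the model
formatted = format_with_sep([
    "The team celebrated their victory.",
    "The squad rejoiced after their win.",
])
print(formatted)
# → The team celebrated their victory. [SEP] The squad rejoiced after their win. [SEP]
```

In practice, library tokenizers insert such separator tokens automatically when given a pair of text segments, but the resulting token stream follows the same pattern.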
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Output Variation in Sequence Models
Role of the [CLS] Token in Sequence Classification
Masked Language Modeling
Input Formatting with Separator Tokens
Standard Auto-Regressive Probability Factorization using Embeddings
CLS Token as a Start Symbol in Encoder Pre-training
Comparison of Context Usage in Causal vs. Masked Language Modeling
Applying the General Sequence Model Formulation
In the general formulation of a sequence model, o = g(x_0, x_1, ..., x_m; θ), which statement best analyzes the distinct roles of the components?
Match each symbol from the general sequence model formulation, o = g(x_0, x_1, ..., x_m; θ), with its correct description.
Fundamental Issues in Sequence Model Formulation
Neural Network as a Parameterized Function
Learn After
A language model needs to process two distinct sentences as a single input to determine if they are paraphrases of each other. The first sentence is 'The team celebrated their victory.' and the second is 'The squad rejoiced after their win.'. Based on the common method for handling multiple text segments, which of the following represents the correctly formatted input?
Troubleshooting Model Input Formatting
Diagnosing Input Formatting Errors