Learn Before
Using [SEP] Tokens for Sequence Concatenation
A common method for preparing data for sequence models is to concatenate the input and output sequences into a single string, using a special separator token such as [SEP] to mark the boundary between them. For instance, an input sequence x1 x2 ... xm and an output sequence y1 y2 ... yn can be formatted as x1 x2 ... xm [SEP] y1 y2 ... yn [SEP]. This structure lets the model process both sequences as one token stream while still distinguishing their roles.
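As a minimal sketch of this formatting (the function name and example token lists are illustrative, not from the source), the pattern x1 ... xm [SEP] y1 ... yn [SEP] can be produced like this:

```python
def concat_with_sep(x, y, sep="[SEP]"):
    """Join input tokens x and output tokens y into one sequence,
    inserting the separator token after each sequence."""
    return x + [sep] + y + [sep]

# Example: combine an input and an output sequence into one token stream.
tokens = concat_with_sep(["The", "cat", "sat"], ["on", "the", "mat"])
print(tokens)
# ['The', 'cat', 'sat', '[SEP]', 'on', 'the', 'mat', '[SEP]']
```

In practice, tokenizer libraries typically handle this insertion of separator tokens automatically, but the resulting sequence follows the same layout.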
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Probability of a Concatenated Token Sequence
An input sequence of tokens is defined as x = (The, cat, sat) and a subsequent output sequence is defined as y = (on, the, mat). Which of the following correctly represents the single, combined sequence denoted by [x, y]?
Deconstructing a Concatenated Token Sequence
Given two token sequences, x = (start, process) and y = (end, result), the concatenated sequence denoted by [x, y] is identical to the sequence denoted by [y, x].
Learn After
A language model is being prepared for a question-answering task. The model must process both the question and its corresponding answer as a single, combined sequence. If the question is 'What is the capital of France?' and the answer is 'Paris', how should these two sequences be formatted for the model using a special separator token to distinguish between them?
Diagnosing Model Training Issues from Data Formatting
Debugging Data Preprocessing for a Summarization Model
Example of Sequence Packing for Translation