1Cademy - Example of Target Output Format for Span-Based Denoising

In a span-based denoising task, the model is trained to generate a target sequence that pairs each sentinel token in the input with the original text it replaced. For the input "[CLS] The puppies are [X] outside [Y] .", the corresponding target output is "⟨s⟩ [X] frolicking [Y] the house [Z]". Each sentinel in the target is followed by the span it stands for: [X] maps to 'frolicking' and [Y] maps to 'the house'. The final sentinel [Z] does not correspond to any masked span in the input; it typically marks the end of the target sequence.
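
The pairing described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name span_corrupt, the whitespace tokenization, and the hand-picked span indices are assumptions made for the example, and the original sentence "The puppies are frolicking outside the house ." is inferred from the input and target shown above.

```python
def span_corrupt(tokens, spans, sentinels=("[X]", "[Y]", "[Z]")):
    """Build (input, target) strings for span-based denoising.

    tokens: list of string tokens for the original sentence.
    spans: sorted, non-overlapping (start, end) index pairs, end exclusive.
    sentinels: one per span, plus one extra to close the target.
    """
    if len(spans) + 1 > len(sentinels):
        raise ValueError("need one sentinel per span plus a closing sentinel")

    corrupted = ["[CLS]"]  # start token of the input, as in the example
    target = ["⟨s⟩"]       # start token of the target, as in the example
    cursor = 0
    for sentinel, (start, end) in zip(sentinels, spans):
        # Copy the unmasked tokens, then stand in a sentinel for the span.
        corrupted.extend(tokens[cursor:start])
        corrupted.append(sentinel)
        # In the target, the same sentinel is followed by the masked text.
        target.append(sentinel)
        target.extend(tokens[start:end])
        cursor = end
    corrupted.extend(tokens[cursor:])
    # The next unused sentinel closes the target sequence.
    target.append(sentinels[len(spans)])
    return " ".join(corrupted), " ".join(target)


tokens = "The puppies are frolicking outside the house .".split()
# Mask "frolicking" (index 3) and "the house" (indices 5-6).
source, target = span_corrupt(tokens, [(3, 4), (5, 7)])
print(source)  # [CLS] The puppies are [X] outside [Y] .
print(target)  # ⟨s⟩ [X] frolicking [Y] the house [Z]
```

In practice the spans are sampled at random rather than chosen by hand, and real tokenizers use their own reserved sentinel vocabulary; the sketch only shows how input and target line up.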