Based on the scenario provided, which dataset (V1 or V2) is the more suitable choice for training a model that is highly sensitive to the precise position of words in a sentence? Justify your reasoning.

The distinction between token masking and token deletion can be demonstrated using an original sequence, denoted as $$\mathbf{x}$$: `The puppies are frolicking outside the house .`. When applying **Token Masking** to create a noisy sequence $$\mathbf{x}_{\mathrm{noise}}$$, selected tokens such as 'frolicking' and 'the' are replaced with a special symbol, resulting in: `The puppies are [MASK] outside [MASK] house .`. Conversely, applying **Token Deletion** to form $$\mathbf{x}_{\mathrm{noise}}$$ completely removes the selected tokens from the sequence (e.g., ~~frolicking~~ and ~~the~~), yielding: `The puppies are outside house .`.
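The contrast can be reproduced with a minimal Python sketch. The helper names `token_mask` and `token_delete`, the whitespace tokenization, and the hand-picked positions are illustrative assumptions; in practice the positions would usually be sampled at random and the tokens would come from a real tokenizer.

```python
MASK = "[MASK]"  # assumed mask symbol, matching the example above


def token_mask(tokens, positions, mask_token=MASK):
    """Replace the tokens at the given positions with mask_token."""
    chosen = set(positions)
    return [mask_token if i in chosen else tok for i, tok in enumerate(tokens)]


def token_delete(tokens, positions):
    """Drop the tokens at the given positions entirely."""
    chosen = set(positions)
    return [tok for i, tok in enumerate(tokens) if i not in chosen]


# Original sequence x, split on whitespace (a simplifying assumption).
x = "The puppies are frolicking outside the house .".split()
positions = [3, 5]  # indices of 'frolicking' and 'the', chosen by hand

x_masked = token_mask(x, positions)
x_deleted = token_delete(x, positions)

print(" ".join(x_masked))   # The puppies are [MASK] outside [MASK] house .
print(" ".join(x_deleted))  # The puppies are outside house .
print(len(x), len(x_masked), len(x_deleted))  # 8 8 6
```

Masking preserves the sequence length and marks where each corrupted token was, whereas deletion shortens the sequence and leaves no marker at the removed positions.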

Example Comparison of Token Masking and Token Deletion

An engineer is preparing text data for a model. The process involves taking an original sentence and creating a 'damaged' version by altering some of its words. The engineer observes that the damaged sentences are consistently shorter in length (fewer total words) than their original counterparts. Which of the following data alteration methods is the engineer most likely using?

Choosing a Data Corruption Strategy

An original sentence was altered using two different methods to create the two corrupted versions shown below.

- **Original:** 'The quick brown fox jumps over the lazy dog.'
- **Version A:** 'The quick [MASK] fox [MASK] over the lazy dog.'
- **Version B:** 'The quick fox over the lazy dog.'

For each version (A and B), identify the specific data corruption method used and briefly explain the primary evidence in the text that supports your identification.
