Learn Before
A machine learning engineer is training a model to reconstruct a document from a corrupted version. They are considering two different strategies for creating the corrupted input:
- Strategy A: Replace 15% of the words in the document, chosen at random, each with a single [MASK] token.
- Strategy B: Replace three separate, contiguous spans of words (which together make up 15% of the document's total words) with a single [SPAN] token for each span.
Assuming all other factors are equal, which strategy is likely to result in a more computationally efficient training process, and why?
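One way to see the difference is to compare the length of the corrupted input each strategy produces. The sketch below is illustrative only (the function names and the 100-word document are invented for this example, not from the original question): token-level masking keeps the sequence length unchanged, while span masking collapses each multi-word span into one sentinel token, shortening the input the model must process.

```python
import random

def corrupt_token_level(tokens, rate=0.15, mask="[MASK]"):
    # Strategy A: each selected word becomes one [MASK] token,
    # so the corrupted sequence has the same length as the original.
    n_mask = int(len(tokens) * rate)
    idx = set(random.sample(range(len(tokens)), n_mask))
    return [mask if i in idx else t for i, t in enumerate(tokens)]

def corrupt_span_level(tokens, spans, sentinel="[SPAN]"):
    # Strategy B: each contiguous span collapses into a single
    # sentinel token, so the corrupted sequence is shorter.
    out, i = [], 0
    for start, end in sorted(spans):
        out.extend(tokens[i:start])
        out.append(sentinel)
        i = end
    out.extend(tokens[i:])
    return out

tokens = ["w%d" % i for i in range(100)]          # a 100-word document
a = corrupt_token_level(tokens)                    # 15 words -> 15 [MASK] tokens
b = corrupt_span_level(tokens, [(10, 15), (40, 45), (70, 75)])  # 15 words -> 3 [SPAN] tokens
print(len(a), len(b))  # 100 88
```

With both strategies corrupting the same 15% of words, Strategy A's input stays at 100 tokens while Strategy B's shrinks to 88, which is the kind of length difference the question asks you to reason about.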
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Foundations of Large Language Models Course
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Optimizing Training Efficiency
Efficiency vs. Learning Trade-off in Denoising