Multiple Choice

A machine learning engineer is training a model to reconstruct a document from a corrupted version. They are considering two different strategies for creating the corrupted input:

  • Strategy A: Replace 15% of the words in the document, chosen at random, each with a single [MASK] token.
  • Strategy B: Replace three separate contiguous spans of words (together making up 15% of the document's total words), each span with a single [SPAN] token.

Assuming all other factors are equal, which strategy is likely to result in a more computationally efficient training process, and why?
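The contrast between the two corruption schemes can be sketched in Python (a hypothetical illustration; the function names, the 100-word toy document, and the span-placement scheme are assumptions for demonstration, not taken from the question):

```python
import random

def mask_tokens(tokens, rate=0.15, seed=0):
    """Strategy A: replace `rate` of the tokens, chosen at random,
    each with its own [MASK] token. Sequence length is unchanged."""
    rng = random.Random(seed)
    n_mask = int(len(tokens) * rate)
    masked = set(rng.sample(range(len(tokens)), n_mask))
    return [("[MASK]" if i in masked else t) for i, t in enumerate(tokens)]

def span_corrupt(tokens, n_spans=3, rate=0.15, seed=0):
    """Strategy B: replace `n_spans` non-overlapping contiguous spans
    (totalling `rate` of the tokens) with one [SPAN] token per span.
    Each multi-word span collapses to a single token, so the output
    is shorter than the input."""
    rng = random.Random(seed)
    span_len = max(1, round(len(tokens) * rate / n_spans))
    chunk = len(tokens) // n_spans  # one span per chunk keeps spans disjoint
    out, prev = [], 0
    for k in range(n_spans):
        start = rng.randrange(k * chunk, (k + 1) * chunk - span_len + 1)
        out.extend(tokens[prev:start])
        out.append("[SPAN]")
        prev = start + span_len
    out.extend(tokens[prev:])
    return out

doc = [f"w{i}" for i in range(100)]   # toy 100-word document
a = mask_tokens(doc)                  # 100 tokens: 15 become [MASK]
b = span_corrupt(doc)                 # 88 tokens: 15 words collapse to 3 [SPAN]s
print(len(a), len(b))                 # → 100 88
```

Under these assumptions, Strategy B produces a shorter corrupted input (88 vs. 100 tokens here), and since Transformer self-attention cost grows with sequence length, shorter inputs translate into cheaper training steps.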


Updated 2025-09-26


Tags

  • Ch.1 Pre-training - Foundations of Large Language Models
  • Foundations of Large Language Models
  • Computing Sciences
  • Foundations of Large Language Models Course
  • Analysis in Bloom's Taxonomy
  • Cognitive Psychology
  • Psychology
  • Social Science
  • Empirical Science
  • Science