Model Suitability for a Generation Task
A research team is developing a system that generates short, creative story paragraphs from scratch, given nothing but a start-of-sequence signal. They have access to a powerful pre-trained model that was trained exclusively with an objective in which 100% of the input tokens were replaced by a special [MASK] token, and the model's goal was to reconstruct the original text. Based on this training method, evaluate the model's suitability for the team's creative story generation task. Justify your reasoning by explaining the core capability the model likely developed during its training.
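To make the training setup in the question concrete, here is a minimal sketch (not any particular library's implementation) contrasting standard ~15% masked-token infilling with the 100%-masking variant described above. The `mask_tokens` helper and the example sentence are illustrative assumptions; the key observation is that at a masking probability of 1.0 the model receives no unmasked context at all.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob):
    """Replace each token with [MASK] with probability mask_prob.
    Returns (masked_input, targets), where targets holds the original
    token at each masked position and None elsewhere."""
    masked, targets = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            masked.append(MASK)
            targets.append(tok)
        else:
            masked.append(tok)
            targets.append(None)
    return masked, targets

sentence = "the cat sat on the mat".split()

# Standard infilling: ~15% of tokens hidden, the rest provide context
# the model can condition on when predicting the masked words.
random.seed(0)
partial_input, partial_targets = mask_tokens(sentence, 0.15)

# 100% masking: every token is hidden, so the "input" carries no
# information about the sentence beyond its length.
full_input, full_targets = mask_tokens(sentence, 1.0)
assert full_input == [MASK] * len(sentence)
assert full_targets == sentence
```

With 100% masking the conditioning signal vanishes, which is the crux of judging whether such a model suits open-ended generation.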
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Consider a text-infilling model trained by masking about 15% of the words in a sentence and having the model predict them from the surrounding unmasked words. If this training process is modified to mask 100% of the words in every input sentence, what is the most significant change in the fundamental skill the model is being trained to perform?
Model Suitability for a Generation Task
Shift in Training Objective with 100% Masking