Learn Before
Pre-training Objective Selection
A research team is developing a language model for a complex task that requires both generating coherent, long-form text and accurately filling in missing information within existing drafts. They are considering three pre-training objectives:
- An objective that predicts the next word in a sequence based only on the words that came before it.
- An objective that predicts randomly masked words in a sentence by looking at all the other visible words, both before and after the mask.
- An objective that samples a random permutation of the token positions and then predicts the words one by one in that permuted order, with each prediction conditioned on the words already predicted.
Evaluate which of these three objectives is most suitable for the team's dual requirements. Justify your choice by explaining its advantages and the primary limitations of the other two for this specific scenario.
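As a concrete illustration, here is a minimal Python sketch (an assumed toy example, not part of the card's reference answer) of how each of the three objectives turns the same sentence into (context, target) training pairs:

```python
# Illustrative sketch only: token lists and pair construction are assumptions,
# not the card's reference answer.
import random

tokens = ["The", "model", "learns", "from", "text"]

# 1) Causal (left-to-right) LM: predict each token from the tokens before it.
causal_pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

# 2) Masked LM: hide a random subset of tokens; predict each hidden token
#    from every visible token on both sides of the mask.
masked_positions = sorted(random.sample(range(len(tokens)), k=2))
visible = ["[MASK]" if i in masked_positions else t for i, t in enumerate(tokens)]
masked_pairs = [(visible, tokens[i]) for i in masked_positions]

# 3) Permuted LM: keep the original positions, but predict the tokens
#    autoregressively in a randomly sampled order; each prediction sees
#    only the tokens that came earlier in that permuted order.
order = random.sample(range(len(tokens)), k=len(tokens))
permuted_pairs = []
for step, pos in enumerate(order):
    visible_positions = sorted(order[:step])
    context = [(p, tokens[p]) for p in visible_positions]
    permuted_pairs.append((context, (pos, tokens[pos])))

print("causal:", causal_pairs)
print("masked:", masked_pairs)
print("permuted:", permuted_pairs)
```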
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Probability Factorization for Arbitrary Order Token Prediction
A language model is pre-trained using an objective where, for the input sentence 'The model learns from text', it might be tasked to predict the word 'learns' based on the context of 'text' and 'The', while the word 'model' is not yet visible to it. In the next step, it might predict 'model' based on 'text', 'The', and the newly predicted 'learns'. What is the primary advantage of this training approach compared to a standard left-to-right sequential prediction?
A language model is being pre-trained on the sentence 'The quick brown fox jumps' using a permuted objective. The model is given a random permutation of the token positions: (3, 5, 1, 4, 2). Arrange the words from the sentence in the order they will be auto-regressively predicted during this training step.
Pre-training Objective Selection
Comparison of Permuted and Causal Language Modeling
Implementing Permutation via Self-Attention Masks
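The permuted-objective items above can be made concrete with a short Python sketch (assuming the 1-indexed positions used in the related question; the helper code is an illustrative assumption, not the card's reference answer). The sampled permutation fixes the autoregressive prediction order, and a self-attention mask enforces which positions each prediction is allowed to see:

```python
# Hedged sketch: the sentence and permutation come from the related question;
# the mask construction below is a simplified, content-stream-style
# illustration, not a full two-stream implementation.
import numpy as np

tokens = ["The", "quick", "brown", "fox", "jumps"]
permutation = [3, 5, 1, 4, 2]          # 1-indexed positions, as in the question
order = [p - 1 for p in permutation]   # convert to 0-indexed positions

# Prediction order: the tokens at the permuted positions, in permutation order.
print([tokens[i] for i in order])      # ['brown', 'jumps', 'The', 'fox', 'quick']

# Attention mask M: M[i, j] = 1 means position i may attend to position j.
# Position i may attend to j iff j appears no later than i in the permutation
# (including itself, as in a content-stream mask).
rank = {pos: step for step, pos in enumerate(order)}
n = len(tokens)
mask = np.zeros((n, n), dtype=int)
for i in range(n):
    for j in range(n):
        if rank[j] <= rank[i]:
            mask[i, j] = 1
print(mask)
```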