Learn Before
Evaluating a Modified Pre-training Strategy
Based on the principles of the standard dual-task training objective, what specific type of understanding would a model trained on only one of the two objectives likely lack compared to a model trained with both objectives? Explain your reasoning.
Tags
Data Science
Foundations of Large Language Models Course
Computing Sciences
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
BERT Loss Function
Concurrent Loss Calculation for MLM and NSP
A researcher is pre-training a large language model using a dual-task objective. The model is simultaneously trained on two tasks:
- Predicting randomly obscured words within a given text.
- Determining whether two text segments presented together originally appeared consecutively.

The final training update is based on the model's combined performance on both tasks. Which of the following statements best analyzes the primary advantage of this specific dual-task approach?
Evaluating a Modified Pre-training Strategy
The original pre-training process for Bidirectional Encoder Representations from Transformers (BERT) involves a dual-task objective in which the total loss is the sum of the losses from two distinct tasks. Match each training task to its corresponding description.
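The summed dual-task objective described in these cards can be sketched in a few lines of Python. This is a minimal illustration, not BERT's actual implementation: the function name and the loss values below are assumptions made for the example. The point it shows is that the total loss is simply the sum of the masked language modeling (MLM) loss and the next sentence prediction (NSP) loss, so a single training update reflects performance on both tasks at once.

```python
def combined_pretraining_loss(mlm_loss: float, nsp_loss: float) -> float:
    """Total pre-training loss = MLM loss + NSP loss.

    Both losses contribute to the same gradient update, which is why
    dropping one task removes the understanding that task trains.
    """
    return mlm_loss + nsp_loss


# Hypothetical loss values for illustration only:
total = combined_pretraining_loss(mlm_loss=2.31, nsp_loss=0.47)
print(total)  # 2.78
```

A model trained on this sum learns both word-level prediction (from MLM) and inter-sentence coherence (from NSP); setting either term to zero leaves the other objective untouched but forfeits what that term would have taught.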