Interpreting Contextual Representations
Based on the training method described in the case study, explain the fundamental reason why the model generates distinct internal representations for the same word ('rock') in different sentences.
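One way to make the question concrete: a contextual encoder computes each token's vector from the whole sentence via self-attention, so the same surface word receives different vectors in different contexts. The sketch below illustrates this empirically. It is a minimal example assuming a BERT-style encoder loaded through Hugging Face transformers; neither the library nor the checkpoint name comes from the card.

```python
# Minimal sketch (assumed setup: Hugging Face transformers + a
# BERT-style checkpoint; not specified by the card itself).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def rock_vector(sentence: str) -> torch.Tensor:
    """Return the final-layer hidden state for the token 'rock'."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (seq_len, dim)
    rock_id = tokenizer.convert_tokens_to_ids("rock")
    position = (inputs["input_ids"][0] == rock_id).nonzero()[0].item()
    return hidden[position]

v_music = rock_vector("The band played loud rock all night.")
v_stone = rock_vector("He tripped over a rock on the trail.")

# Self-attention mixes the surrounding words into each token's
# vector, so the two 'rock' vectors differ: the cosine similarity
# typically comes out noticeably below 1.0.
sim = torch.cosine_similarity(v_music, v_stone, dim=0)
print(f"cosine similarity: {sim.item():.3f}")
```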
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Interpreting Contextual Representations
A language model is trained on the following sentence where one word is hidden: 'The scientist meticulously calibrated the [MASK] before the experiment.' The model's primary training objective is to predict the hidden word. Why is this task effective for teaching the model to understand the relationships between words?
A language model is trained using an objective where it must predict randomly hidden words in a sentence based on the surrounding words. After this training, the model will generate the exact same numerical representation for the word 'bank' in the sentences 'He sat on the river bank' and 'She withdrew money from the bank'.
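Both related prompts describe the same cloze-style (masked-word) objective, which can also be sketched concretely. The example below, again assuming Hugging Face transformers and a BERT-style checkpoint (assumptions, not from the cards), runs the exact masked sentence from the first prompt: the model must use every surrounding word ('scientist', 'calibrated', 'experiment') to rank plausible fillers, which is why the objective teaches word relationships. For the second prompt, the embedding sketch above can be rerun with 'bank' in place of 'rock'; a model trained this way produces context-dependent vectors, so the two occurrences of 'bank' do not share an identical representation.

```python
# Minimal sketch of the masked-word objective from the related
# prompts (assumed setup: Hugging Face transformers fill-mask
# pipeline with a BERT-style checkpoint).
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

# Print the top three predicted fillers with their probabilities.
predictions = fill(
    "The scientist meticulously calibrated the [MASK] before the experiment."
)
for pred in predictions[:3]:
    print(f"{pred['token_str']:>12}  p={pred['score']:.3f}")
```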