Concept

Shared Vocabulary in Multilingual Models

In multilingual pre-trained models, tokens from different languages are not explicitly tagged with their source language. Instead, they are all treated as entries in a single, shared vocabulary. In effect, this creates one composite 'language' whose vocabulary is the union of the vocabularies of all the languages involved, which lets the model process multilingual text seamlessly.
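
As a minimal illustration, the sketch below assumes the Hugging Face transformers library and the publicly available xlm-roberta-base checkpoint (a multilingual model with one SentencePiece vocabulary shared across roughly 100 languages). It shows that an English and a Chinese sentence are both split into sub-word tokens drawn from the same vocabulary and mapped into the same ID space, with no language identifier attached to any token.

    # Sketch: one shared vocabulary covers text from different languages.
    # Assumes the Hugging Face `transformers` library and the public
    # `xlm-roberta-base` checkpoint are available.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")

    english = "The cat sat on the mat."
    chinese = "猫坐在垫子上。"

    for text in (english, chinese):
        tokens = tokenizer.tokenize(text)              # sub-word pieces, no language tag
        ids = tokenizer.convert_tokens_to_ids(tokens)  # indices into the same vocabulary
        print(tokens)
        print(ids)

    # Both sentences index into one unified vocabulary of this size:
    print(tokenizer.vocab_size)

Because every token ID points into the same embedding table, the model needs no explicit signal about which language a token came from; the shared vocabulary is what allows a single model to consume mixed-language input.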

