Example of Code-Switching between Chinese and English
A mixed-language sentence that blends Chinese and English, such as '周末 我们 打算 去 做 hiking , 你 想 一起 来 吗 ?' (This weekend we plan to go hiking; would you like to join us?), is a typical instance of code-switching. A multilingual pre-trained model does not need to determine whether each token in such input is Chinese or English. Instead, every token is looked up in a single shared vocabulary, so the model handles the mixed input as if it were one composite language.
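The shared-vocabulary idea can be illustrated with a minimal Python sketch. It rests on simplifying assumptions: the text is already segmented into whitespace-separated tokens, the corpus is a hypothetical toy example, and `build_shared_vocab` and `encode` are illustrative names, not a real library API. Real multilingual models learn subword vocabularies (for example with BPE or SentencePiece). What the sketch shows is that `encode` never consults a language label: "hiking" receives the same ID whether it appears in an English sentence or inside the Chinese one.

```python
# Toy sketch of a shared multilingual vocabulary (hypothetical; real models
# learn subword vocabularies such as BPE or SentencePiece instead).

def build_shared_vocab(corpus):
    """Assign one integer ID per distinct token, regardless of its language."""
    vocab = {"[UNK]": 0}
    for sentence in corpus:
        for token in sentence.split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def encode(sentence, vocab):
    """Map whitespace-separated tokens to IDs; no language ID is consulted."""
    return [vocab.get(token, vocab["[UNK]"]) for token in sentence.split()]


# Hypothetical pre-training corpus containing monolingual Chinese and English text.
corpus = [
    "周末 我们 打算 去 爬山",
    "we plan to go hiking this weekend ,",
    "你 想 一起 来 吗 ?",
    "do you want to join us ?",
    "做 运动",
]

vocab = build_shared_vocab(corpus)

# The code-switched sentence is encoded with the same table as monolingual text.
mixed = "周末 我们 打算 去 做 hiking , 你 想 一起 来 吗 ?"
ids = encode(mixed, vocab)
print(ids)  # [1, 2, 3, 4, 25, 10, 13, 14, 15, 16, 17, 18, 19]

# "hiking" maps to the same ID in an English-only sentence and in the mixed one.
english_ids = encode("we plan to go hiking", vocab)
print(english_ids[4] == ids[5])  # True
```

Because both languages share one ID space, the model downstream simply sees a sequence of integers. In this toy version an unseen token falls back to `[UNK]`; a subword tokenizer would instead split it into smaller known pieces.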

Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Models of Code Switching
Why Speakers Code-Switch
Example of Code-Switching between Chinese and English
Benefit of Multilingual Pre-trained Models: Handling Code-Switching
A user is sending text messages that mix two different languages. Which of the following messages best exemplifies the practice of alternating between languages within a single, coherent thought or sentence?
Diagnosing NLP Model Failure
Defining and Illustrating Code-Switching
Language-Independent Token Representations
A multilingual model is pre-trained on a large corpus of English and Spanish text using a single, unified vocabulary. The model processes the word 'pie', which means 'foot' in Spanish and refers to a baked dish in English. How will this word most likely be represented within the model's vocabulary structure?
Trade-offs of a Unified Vocabulary in Multilingual Models
In a multilingual model pre-trained on English and German, the shared vocabulary is structured into two distinct sections, one for English tokens and one for German tokens, to prevent interference between the languages.
Learn After
A bilingual speaker is sending a text message. Which of the following messages best demonstrates the practice of embedding a word from one language into the grammatical structure of another language within a single, continuous sentence?
Identifying Code-Switching in a Conversation
Constructing a Mixed-Language Sentence