Learn Before
Language Model Generalization
A company is testing two different language modeling systems for a customer service chatbot.
-
System A was trained on a large corpus of text. It handles a phrase like 'purchase a ticket' well, but when a user types 'acquire a ticket', the system fails to recognize the request, because the exact word 'acquire' rarely appeared in the context of 'ticket' in its training data.
-
System B was trained on the same data. When it encounters 'acquire a ticket', it correctly understands the user's intent. Its internal representations show that it encodes the words 'purchase' and 'acquire' in a mathematically similar way.
Based on this information, analyze the fundamental difference in how these two systems represent words. Explain why this difference allows System B to generalize to the unseen phrase while System A cannot.
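To make the contrast concrete, here is a minimal Python sketch, not either system's actual internals: it compares a discrete, exact-match phrase lookup (the System A failure mode) with dense word embeddings whose geometry places 'purchase' near 'acquire' (the System B behavior). All phrases, vectors, and function names are invented for illustration.

```python
# Minimal sketch (not the real systems): discrete symbol matching vs.
# distributed embeddings. Toy values chosen for illustration only.
import numpy as np

# System A analogue: words are opaque symbols, so intent only matches
# phrases whose exact tokens appeared in training.
seen_phrases = {("purchase", "a", "ticket"): "BUY_TICKET"}

def discrete_intent(tokens):
    return seen_phrases.get(tuple(tokens), "UNKNOWN")

# System B analogue: each word maps to a dense vector; words used in
# similar contexts end up with similar vectors, so 'acquire' lands
# near 'purchase' even though the exact phrase was never seen.
embeddings = {
    "purchase": np.array([0.90, 0.10, 0.30]),
    "acquire":  np.array([0.88, 0.12, 0.28]),  # close to 'purchase'
    "ticket":   np.array([0.05, 0.95, 0.40]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(discrete_intent(["acquire", "a", "ticket"]))            # UNKNOWN
print(cosine(embeddings["purchase"], embeddings["acquire"]))  # ~0.9996
```

Because similarity is a property of the vector space rather than of exact token matches, System B can transfer what it learned about 'purchase a ticket' to the unseen 'acquire a ticket'.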
Tags
Data Science
Deep Learning (in Machine Learning)
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Large Language Models (LLMs)
BERT (Bidirectional Encoder Representations from Transformers)
Bengio et al. (2003) Feed-Forward Neural Language Model
A team is developing a language model to predict the next word in a sentence. They find that their model assigns a probability of zero to the phrase 'the innovative chef prepares...' because it has never seen the two-word sequence 'innovative chef' in its training data, despite having seen 'innovative ideas' and 'master chef' many times. Which characteristic of a neural network-based approach to language modeling is specifically designed to overcome this type of generalization failure? (A toy sketch of this zero-probability failure appears after this list.)
NLM Advantage Over Traditional Models
Language Model Generalization
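As referenced in the related question above, here is a minimal sketch of the zero-probability failure it describes, assuming a maximum-likelihood bigram model; the toy corpus and counts are invented for illustration. A neural language model avoids this collapse because its shared embedding space lets evidence about 'innovative' and 'chef' gathered from other contexts combine into a nonzero probability for the unseen pair.

```python
# Toy illustration of the failure mode: a maximum-likelihood bigram model
# assigns zero probability to any word pair absent from training, no matter
# how plausible. The corpus below is invented for this sketch.
from collections import Counter

corpus = "innovative ideas inspire . master chef cooks . innovative ideas win".split()

bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def bigram_prob(w1, w2):
    # P(w2 | w1) = count(w1 w2) / count(w1); zero when the pair is unseen
    return bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0

print(bigram_prob("innovative", "ideas"))  # 1.0: pair seen in training
print(bigram_prob("innovative", "chef"))   # 0.0: unseen pair, probability collapses
```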