Learn Before
Evaluating Tokenization for a Specialized Chatbot
Based on the case study, evaluate the company's choice of a simple word-based tokenization strategy. Explain the primary reason for the chatbot's poor performance and recommend a more suitable tokenization approach, justifying why it would be more effective in this specific context.
Tags
Data Science
Foundations of Large Language Models Course
Computing Sciences
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Different standards for tokenization
Inference Process with a Fine-Tuned Model
Example of Tokenization into Words and Punctuation
Example of Word and Punctuation Tokenization
Methods of Tokenization
A language model is given the sentence: 'The researcher is studying neuroplasticity.' It processes the sentence using two different methods, resulting in two different sequences of tokens.
Method A: ['The', 'researcher', 'is', 'studying', 'neuroplasticity', '.']
Method B: ['The', 'researcher', 'is', 'study', 'ing', 'neuro', 'plasticity', '.']
Assuming the model has never encountered the word 'neuroplasticity' during its training but has seen words like 'neuroscience' and 'plasticity' separately, which method is more advantageous for helping the model understand the new word, and why?
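The two token sequences in this question can be reproduced with a small sketch. It assumes toy vocabularies and a greedy longest-match splitting rule; the names word_tokenize, subword_tokenize, WORD_VOCAB, and SUBWORD_VOCAB are hypothetical and chosen only for illustration. Real subword tokenizers (for example BPE or WordPiece) learn their vocabularies from data and may split words differently.

```python
import re

# Hypothetical toy vocabularies; 'neuroplasticity' is deliberately absent from both.
WORD_VOCAB = {"The", "researcher", "is", "studying", "."}
SUBWORD_VOCAB = {"The", "researcher", "is", "study", "ing", "neuro", "plasticity", "."}

def split_words(text):
    """Split text into words and standalone punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

def word_tokenize(text, vocab, unk="<unk>"):
    """Word-level tokenization: any word missing from the vocabulary becomes <unk>."""
    return [w if w in vocab else unk for w in split_words(text)]

def subword_tokenize(text, vocab, unk="<unk>"):
    """Greedy longest-match subword tokenization over a fixed vocabulary."""
    pieces = []
    for word in split_words(text):
        start = 0
        while start < len(word):
            end = len(word)
            # Shrink the candidate piece until it is found in the vocabulary.
            while end > start and word[start:end] not in vocab:
                end -= 1
            if end == start:
                # No known piece starts here: emit <unk> for one character.
                pieces.append(unk)
                start += 1
            else:
                pieces.append(word[start:end])
                start = end
    return pieces

sentence = "The researcher is studying neuroplasticity."
word_tokens = word_tokenize(sentence, WORD_VOCAB)
subword_tokens = subword_tokenize(sentence, SUBWORD_VOCAB)
print(word_tokens)
print(subword_tokens)
```

Under these assumptions, the word-level tokenizer has no entry for the unseen word, while the subword tokenizer falls back to pieces it already knows.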
Tokenization Strategies