1Cademy - Example of Word and Punctuation Tokenization

Learn Before

Tokenization

Example

Example of Word and Punctuation Tokenization

A fundamental method of tokenization involves segmenting a text into its constituent English words and punctuation marks. For example, the phrase 'I love the food here. It’s amazing' would be tokenized into the following sequence of units: {I, love, the, food, here, ., It, ’s, amazing}. Note that the period becomes a token of its own and that the contraction 'It’s' is split into two units, 'It' and '’s'.
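The segmentation described above can be approximated with a single regular expression. The following is only a minimal sketch, not the method used by the referenced course: the names `TOKEN_PATTERN` and `tokenize` are hypothetical, and the rule that an apostrophe followed by letters forms its own unit (as in '’s') is an assumption inferred from the example.

```python
import re

# Hypothetical pattern (an assumption, not taken from the source). It matches, in order:
#   1. an apostrophe (curly or straight) followed by word characters, e.g. ’s
#   2. a run of word characters, e.g. love
#   3. any single character that is neither a word character nor whitespace, e.g. .
TOKEN_PATTERN = re.compile(r"[’']\w+|\w+|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Split text into words, apostrophe-led clitics, and punctuation marks."""
    return TOKEN_PATTERN.findall(text)


print(tokenize("I love the food here. It’s amazing"))
# ['I', 'love', 'the', 'food', 'here', '.', 'It', '’s', 'amazing']
```

This sketch treats every apostrophe that precedes letters as the start of a clitic, so it would mishandle a word wrapped in single quotation marks, and it may split negative contractions differently from other tokenization conventions.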

Updated 2025-10-10


References

Reference of Foundations of Large Language Models Course

Learn After

A tokenization process is designed to segment text into individual English words and punctuation marks. For example, the phrase 'It’s great.' is tokenized into ['It', '’s', 'great', '.']. Based on this rule, how would the sentence 'The student’s book isn’t here.' be tokenized?
Applying Word and Punctuation Tokenization
Consider a tokenization method that segments text into individual English words and punctuation marks. For instance, 'It’s great.' becomes ['It', '’s', 'great', '.']. True or False: Following this method, the phrase 'We’re going home.' would be tokenized as ['We', '’re', 'going', 'home.'].
