1Cademy - Example of Tokenization into Words and Punctuation

Example of Tokenization into Words and Punctuation

A simple approach to tokenization is to segment a text into individual English words and punctuation marks. For instance, the text "I love the food here. It's amazing!" can be broken down into the following sequence of tokens: $\left\{ \textrm{I}, \textrm{love}, \textrm{the}, \textrm{food}, \textrm{here}, \textrm{.}, \textrm{It}, \textrm{'s}, \textrm{amazing}, \textrm{!} \right\}$. Note that each punctuation mark is its own token, and the contraction "It's" is split into the two tokens "It" and "'s".
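
The segmentation above can be sketched in a few lines of Python. This is only an illustrative sketch, not the method used by any particular tokenizer: the function name `tokenize` and the regular expression are assumptions chosen so that the example sentence yields the token sequence shown. It treats an apostrophe followed by letters as a clitic (such as "'s") only when the apostrophe directly follows a word character, so other contractions such as "don't" would be split as "don" and "'t", which may differ from other conventions.

```python
import re

# Hypothetical pattern, tried left to right at each position:
#   (?<=\w)'\w+  a clitic such as 's, only directly after a word character
#   \w+          a run of word characters (a word)
#   [^\w\s]      any single character that is neither word nor whitespace
TOKEN_PATTERN = re.compile(r"(?<=\w)'\w+|\w+|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Split text into word and punctuation tokens (illustrative sketch)."""
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    print(tokenize("I love the food here. It's amazing!"))
    # ['I', 'love', 'the', 'food', 'here', '.', 'It', "'s", 'amazing', '!']
```

Whitespace is discarded rather than emitted as a token, which matches the token sequence in the example.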

Updated 2026-04-14

References

Reference of Foundations of Large Language Models Course

Learn After

A piece of text is segmented into a sequence of smaller units by separating it into individual words and treating each punctuation mark as its own distinct unit. Given this method, which of the following options correctly represents the segmentation of the sentence: "She said, 'It's great!'"?
Applying Word and Punctuation Segmentation
Analyzing a Tokenization Function's Output
