1Cademy - Distinguishing Words from Tokens

Method A: ['The', 'model', "'s", 'performance', 'is', "n't", 'great', '.']
Method B: ['The', "model's", 'performance', "isn't", 'great', '.']

Learn Before

Tokens and Words in NLP

Short Answer

Distinguishing Words from Tokens

Consider the sentence: "The cat's toy isn't here." First, count the words in the sentence. Then determine how many tokens a tokenizer would generate if it follows these two rules:

It separates punctuation from words (e.g., "here." becomes "here" and ".").
It splits common contractions and possessives (e.g., "cat's" becomes "cat" and "'s"; "isn't" becomes "is" and "n't").
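
One way to check an answer is to apply the two rules mechanically. The following is a minimal Python sketch, not the tokenizer of any particular library: the function names `count_words` and `tokenize` and the regular expressions are illustrative assumptions, and "word" is assumed to mean a whitespace-separated string. Only the `'s` and `n't` suffixes named in the rules are handled.

```python
import re


def count_words(sentence: str) -> int:
    """Count words, assuming a word is any whitespace-separated string."""
    return len(sentence.split())


def tokenize(sentence: str) -> list[str]:
    """Hypothetical rule-based tokenizer for the two rules in the exercise."""
    # Rule 2: put a space before the suffixes 's and n't so they become
    # separate pieces (cat's -> cat 's, isn't -> is n't).
    spaced = re.sub(r"(\w)(n't|'s)\b", r"\1 \2", sentence)
    # Rule 1: collect the split-off suffixes, runs of word characters,
    # and each remaining punctuation mark as its own token.
    return re.findall(r"n't\b|'s\b|\w+|[^\w\s]", spaced)


sentence = "The cat's toy isn't here."
print(count_words(sentence))     # 5
print(tokenize(sentence))        # ['The', 'cat', "'s", 'toy', 'is', "n't", 'here', '.']
print(len(tokenize(sentence)))   # 8
```

Under these assumptions the sentence has 5 words and 8 tokens: the possessive, the contraction, and the final period each add one token beyond the word count.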