Concept

Tokenizing

Tokenizing splits text into words or sentences and is typically the first step in text analysis. To use NLTK's tokenizing functions, import:

from nltk.tokenize import sent_tokenize, word_tokenize

sent_tokenize splits the text into sentences and word_tokenize splits it into words (including punctuation tokens). Both functions return a list of strings.


Updated 2021-08-03

Tags

Python Programming Language

Data Science