1Cademy - Chunking in NLTK

Chunking groups words into phrases based on their part-of-speech (POS) tags. To chunk a text, it must first be tokenized and tagged, so both nltk and word_tokenize must be imported.

import nltk
from nltk.tokenize import word_tokenize

Let text be the body of text to chunk.

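nltk.download("punkt") # tokenizer models used by word_tokenize (newer NLTK releases may ask for "punkt_tab")
tokenized_text = word_tokenize(text) # list of token strings (words and punctuation marks)
nltk.download("averaged_perceptron_tagger") # tagger model used by nltk.pos_tag (newer releases may ask for "averaged_perceptron_tagger_eng")
POS_tags = nltk.pos_tag(tokenized_text) # list of (word, POS tag) tuples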

The next step is to write a grammar rule that describes which sequences of POS tags should be grouped, or "chunked," into a phrase.

grammar = "NP: {<DT>?<JJ>*<NN>}"

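This rule defines a noun phrase (NP) as an optional determiner (DT), followed by any number of adjectives (JJ), followed by a singular or mass noun (NN). Because the rule names NN exactly, plural nouns (NNS) and proper nouns (NNP) are not matched by it.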
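To make the rule above concrete, the following is a minimal sketch that treats the tag pattern as an ordinary regular expression over the sequence of tags. It uses only the standard library and is an illustration, not NLTK's implementation. The helper names tag_pattern_to_regex and find_chunks and the hand-tagged sentence are hypothetical, and the sketch only supports literal tag names with the ?, * and + quantifiers.

```python
import re


def tag_pattern_to_regex(tag_pattern):
    """Compile a simplified tag pattern such as '<DT>?<JJ>*<NN>' into a regex."""
    # Wrap every <TAG> in a non-capturing group so that a following
    # ?, * or + applies to the whole tag rather than to the ">" character.
    grouped = re.sub(
        r"<([^<>]+)>",
        lambda match: "(?:<" + re.escape(match.group(1)) + ">)",
        tag_pattern,
    )
    return re.compile(grouped)


def find_chunks(tagged_tokens, tag_pattern):
    """Return the word lists whose tag sequence matches tag_pattern."""
    regex = tag_pattern_to_regex(tag_pattern)

    # Encode the tags as one string, remembering where each token starts.
    token_at_offset = {}
    tag_string = ""
    for index, (_, tag) in enumerate(tagged_tokens):
        token_at_offset[len(tag_string)] = index
        tag_string += f"<{tag}>"
    token_at_offset[len(tag_string)] = len(tagged_tokens)  # end marker

    chunks = []
    for match in regex.finditer(tag_string):
        if match.start() == match.end():
            continue  # ignore empty matches from fully optional patterns
        first = token_at_offset[match.start()]
        last = token_at_offset[match.end()]
        chunks.append([word for word, _ in tagged_tokens[first:last]])
    return chunks


# Hand-tagged sentence (assumed tags, written by hand for illustration).
tagged = [
    ("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
    ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN"),
]
chunks = find_chunks(tagged, "<DT>?<JJ>*<NN>")
print(chunks)
```

The determiner is optional and the adjectives may repeat, so both "the little yellow dog" and "the cat" satisfy the same rule.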

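Then create a chunk parser with this grammar and use it to parse the tagged tokens. The result is a tree in which every matched phrase appears as an NP subtree.

chunk_parser = nltk.RegexpParser(grammar) # build a parser from the grammar rule
tree = chunk_parser.parse(POS_tags) # group the tagged tokens into NP chunks
tree.draw() # opens a window showing the tree (requires a graphical display)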
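When no graphical display is available, the chunks can also be read directly from the tree. The sketch below assumes NLTK is installed and uses a hand-tagged sentence (the tags are written by hand for illustration, so no tagger download is needed); the variable names pos_tags and noun_phrases are hypothetical.

```python
import nltk

# Hand-tagged tokens stand in for the output of nltk.pos_tag.
pos_tags = [
    ("the", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
    ("jumped", "VBD"), ("over", "IN"),
    ("the", "DT"), ("lazy", "JJ"), ("dog", "NN"),
]

grammar = "NP: {<DT>?<JJ>*<NN>}"
chunk_parser = nltk.RegexpParser(grammar)
tree = chunk_parser.parse(pos_tags)

# Keep only the NP subtrees and join the words in each one.
noun_phrases = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")
]
print(noun_phrases)
```

Tokens that do not belong to any chunk, such as "jumped" and "over" here, stay in the tree as plain (word, tag) tuples directly under the root.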

Updated 2022-11-03

Contributors are from: University of Michigan - Ann Arbor
