Chunking in NLTK
Chunking groups tokens into phrases based on their part-of-speech (POS) tags. The text must be tokenized and tagged before it can be chunked, so nltk and word_tokenize must be imported first.
import nltk
from nltk.tokenize import word_tokenize
Let text be the body of text to chunk.
nltk.download("punkt")  # tokenizer models used by word_tokenize; newer NLTK releases may ask for "punkt_tab" instead
tokenized_text = word_tokenize(text)  # list of token strings
nltk.download("averaged_perceptron_tagger")  # model used by nltk.pos_tag
POS_tags = nltk.pos_tag(tokenized_text)  # list of (word, POS tag) tuples
The next step is to write a grammar rule that describes how the tagged tokens should be grouped, or "chunked."
grammar = "NP: {<DT>?<JJ>*<NN>}"
This rule defines a noun phrase (NP) chunk: an optional determiner (DT), followed by any number of adjectives (JJ), followed by a noun (NN). Note that the NN tag covers singular nouns only, so a plural noun (tagged NNS) does not match this rule.
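To see what the rule does and does not match, here is a minimal sketch. The tokens are tagged by hand purely for illustration (in practice they would come from nltk.pos_tag), and the sentence itself is a made-up example. The rule is applied with nltk.RegexpParser.

```python
import nltk

# Hand-tagged tokens: an assumption for illustration, so that no
# tokenizer or tagger models need to be downloaded.
tagged = [
    ("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
    ("chased", "VBD"), ("cats", "NNS"),
]

grammar = "NP: {<DT>?<JJ>*<NN>}"
tree = nltk.RegexpParser(grammar).parse(tagged)

# Join the words of every subtree labelled NP into one string.
noun_phrases = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in tree.subtrees(lambda t: t.label() == "NP")
]
print(noun_phrases)
```

Only "the little yellow dog" is chunked. The plural "cats" is tagged NNS, which the pattern <NN> does not match; a pattern such as <NN.*> would be one way to cover the other noun tags as well.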
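Then create a chunk parser with this grammar, parse the tagged tokens, and draw the resulting tree.
chunk_parser = nltk.RegexpParser(grammar)
tree = chunk_parser.parse(POS_tags)  # nltk.Tree with an NP subtree for each matched chunk
tree.draw()  # opens the tree in a separate GUI window (requires Tkinter)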
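Since tree.draw() needs a graphical display, it may not work on a server or in some notebook environments. The sketch below shows one text-only way to inspect the result instead. The hand-tagged sentence is an assumption made for illustration.

```python
import nltk

# Hand-tagged tokens (illustrative; normally the output of nltk.pos_tag).
tagged = [("the", "DT"), ("cat", "NN"), ("sat", "VBD")]
tree = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN>}").parse(tagged)

# Printing the tree gives a bracketed text representation.
print(tree)

# Direct children of the root are either chunks (nltk.Tree objects)
# or plain (word, tag) tuples that fell outside every chunk.
for child in tree:
    if isinstance(child, nltk.Tree):
        print("chunk:", child.label(), child.leaves())
    else:
        print("outside any chunk:", child)
```

Iterating over the children this way is also a convenient starting point for extracting the chunks into ordinary Python lists.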
Updated 2022-11-03
Tags
Python Programming Language
Data Science