Features in a CRF POS Tagger
In a Conditional Random Field (CRF) part-of-speech (POS) tagger, the specific features are populated automatically from feature templates. For example, templates that use information from the previous tag y_{i-1}, the current tag y_i, the input sequence X, and the position i include ⟨y_i, x_i⟩, ⟨y_i, y_{i-1}⟩, and ⟨y_i, x_{i-1}, x_{i+2}⟩. Each template is instantiated at every position of every instance in the training and test sets, which generates the concrete set of features. Word shape features represent the abstract letter pattern of a word by mapping lower-case letters to 'x', upper-case letters to 'X', and digits to 'd', while retaining punctuation; for example, "DC10-30" maps to "XXdd-dd". These features help the tagger handle unknown words. Known-word templates are computed for every word seen in the training set; unknown-word features can be computed either for all words in the training set or only for those whose frequency is below a threshold. The result is a very large set of features, so a feature cutoff is generally applied: features whose count in the training set falls below a chosen threshold are discarded. For CRF training and inference, there is always a fixed set of features with weights, even though the length of each sentence varies, because the value of each feature is summed over all positions in the sentence before it is multiplied by its weight.
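The steps described above can be illustrated with a minimal Python sketch. It is not taken from any particular CRF toolkit: the function names (word_shape, instantiate, select_features, global_vector), the feature-string format, the "<s>" start symbol, and the cutoff value of 2 are all assumptions chosen for illustration, and only three simple templates are instantiated.

```python
from collections import Counter


def word_shape(word):
    """Map lower-case letters to 'x', upper-case to 'X', digits to 'd'; keep the rest."""
    out = []
    for ch in word:
        if ch.islower():
            out.append("x")
        elif ch.isupper():
            out.append("X")
        elif ch.isdigit():
            out.append("d")
        else:
            out.append(ch)  # punctuation and other symbols are retained
    return "".join(out)


def instantiate(words, tags, i):
    """Instantiate three illustrative templates at position i."""
    prev_tag = tags[i - 1] if i > 0 else "<s>"  # "<s>" is an assumed start symbol
    return [
        f"tag={tags[i]}|word={words[i]}",               # template <y_i, x_i>
        f"tag={tags[i]}|prev_tag={prev_tag}",           # template <y_i, y_{i-1}>
        f"tag={tags[i]}|shape={word_shape(words[i])}",  # word-shape template
    ]


def select_features(corpus, cutoff=2):
    """Count every instantiated feature and keep those seen at least `cutoff` times."""
    counts = Counter()
    for words, tags in corpus:
        for i in range(len(words)):
            counts.update(instantiate(words, tags, i))
    kept = sorted(f for f, c in counts.items() if c >= cutoff)
    return {f: k for k, f in enumerate(kept)}  # feature string -> index


def global_vector(words, tags, index):
    """Sum each local feature over all positions, giving a fixed-length vector."""
    vec = [0] * len(index)
    for i in range(len(words)):
        for f in instantiate(words, tags, i):
            if f in index:
                vec[index[f]] += 1
    return vec


if __name__ == "__main__":
    # Toy training data, invented for this example.
    corpus = [
        (["Janet", "will", "back", "the", "bill"], ["NNP", "MD", "VB", "DT", "NN"]),
        (["the", "bill", "will", "pass"], ["DT", "NN", "MD", "VB"]),
    ]
    index = select_features(corpus, cutoff=2)
    print(word_shape("DC10-30"))  # XXdd-dd
    for words, tags in corpus:
        # Both vectors have len(index) entries despite different sentence lengths.
        print(len(words), global_vector(words, tags, index))
```

Because the vector length depends only on the selected feature set, sentences of five and four words map to vectors of the same size, which is what allows a single weight per feature.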
