Learn Before
Concept
Training Unknown Word Model
Ways to train the probabilities of the unknown word model:
- Choose a vocabulary (word list) that is fixed in advance; in a text normalization step, convert any OOV word in the training set to the unknown word token <UNK>; then estimate the probabilities for <UNK> from its counts just like any other regular word in the training set.
- Create a vocabulary implicitly by replacing words in the training data with <UNK> based on their frequency; then estimate the probabilities for <UNK> as before.
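The second (frequency-based) method above can be sketched as follows. This is a minimal unigram illustration, not a full language model; the token name `<UNK>` and the cutoff `min_count` are illustrative choices.

```python
from collections import Counter

def train_unk_unigram(corpus_tokens, min_count=2):
    """Implicit-vocabulary method: replace rare words with <UNK>
    by frequency, then estimate unigram probabilities.
    min_count is an illustrative threshold (not from the source)."""
    counts = Counter(corpus_tokens)
    # Words seen fewer than min_count times fall outside the
    # implicit vocabulary and become the unknown word token.
    normalized = ["<UNK>" if counts[w] < min_count else w
                  for w in corpus_tokens]
    # <UNK> is now estimated from its counts like any regular word.
    unk_counts = Counter(normalized)
    total = len(normalized)
    return {w: c / total for w, c in unk_counts.items()}

probs = train_unk_unigram("the cat sat on the mat the dog ran".split())
# "the" occurs 3 times and is kept; every other word occurs once
# and is mapped to <UNK>, so <UNK> absorbs 6 of the 9 tokens.
```

The first (fixed-vocabulary) method differs only in the normalization step: instead of thresholding by frequency, you would test membership in a predetermined word list and map non-members to `<UNK>`.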
Updated 2022-06-28
Tags
Data Science