Concept

Training the Unknown Word Model

There are two common ways to estimate the probabilities of the unknown word model:

  • Choose a vocabulary (word list) that is fixed in advance. In a text normalization step, convert every out-of-vocabulary (OOV) word in the training set to the unknown word token <UNK>. Then estimate the probabilities for <UNK> from its counts, just like any other regular word in the training set.
  • When no vocabulary is available in advance, create one implicitly by replacing words in the training data with <UNK> based on their frequency (for example, all words that occur fewer than some threshold number of times). Then estimate the probabilities for <UNK> as before.
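
Both approaches can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: it uses a unigram maximum-likelihood model for simplicity, and the helper names (replace_oov, vocab_by_frequency, unigram_probs), the toy corpus, and the min_count threshold are hypothetical choices, not part of the original description.

```python
from collections import Counter

UNK = "<UNK>"  # unknown word token

def replace_oov(tokens, vocab):
    """Text normalization step: map every token outside `vocab` to <UNK>."""
    return [tok if tok in vocab else UNK for tok in tokens]

def vocab_by_frequency(tokens, min_count=2):
    """Implicit vocabulary: keep words seen at least `min_count` times (assumed threshold)."""
    counts = Counter(tokens)
    return {word for word, c in counts.items() if c >= min_count}

def unigram_probs(tokens):
    """Maximum-likelihood unigram estimates; <UNK> is counted like any other word."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: c / total for word, c in counts.items()}

# Toy training corpus (hypothetical).
train = "the cat sat on the mat the dog sat".split()

# Approach 1: vocabulary fixed in advance.
fixed_vocab = {"the", "cat", "sat", "on"}
probs_fixed = unigram_probs(replace_oov(train, fixed_vocab))

# Approach 2: vocabulary derived from training-set frequencies.
freq_vocab = vocab_by_frequency(train, min_count=2)
probs_freq = unigram_probs(replace_oov(train, freq_vocab))

print(probs_fixed[UNK], probs_freq[UNK])
```

In both cases <UNK> ends up with a probability estimated from real counts, so a word never seen in training can be scored at test time by mapping it to <UNK> first. The same replacement step would apply before counting n-grams in a higher-order model.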

Updated 2022-06-28

Tags

Data Science