Building an autocorrect model

  1. Identify a misspelled word
    • If word not in vocabulary: misspelled = True
  2. Find strings n edit distance away
    • Edit distance counts how many operations (such as inserting, deleting, swapping, or replacing a character) separate one string from another; n is usually set between 1 and 3.
  3. Filter candidates
    • If a candidate is not in the vocabulary, remove it from the candidate list
    • What remains is a list of actual words
  4. Calculate word probabilities
    • Count how often each word occurs in the corpus and calculate its probability (number of times the word appears / total number of words in the corpus)
    • Among the remaining candidates, choose the word with the highest probability as the replacement
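
The four steps above can be sketched in Python as follows. This is a minimal illustration, not the course's reference implementation: the function names (word_probabilities, edits_one, autocorrect), the max_distance parameter, the lowercase ASCII alphabet, and the tiny corpus are all assumptions made for the example. It also assumes the vocabulary is simply the set of words seen in the corpus.

```python
from collections import Counter
import string


def word_probabilities(corpus_words):
    """Step 4a: map each word to (its count / total words in the corpus)."""
    counts = Counter(corpus_words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def edits_one(word, alphabet=string.ascii_lowercase):
    """Step 2: every string exactly one edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {l + r[1:] for l, r in splits if r}
    swaps = {l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1}
    replaces = {l + ch + r[1:] for l, r in splits if r for ch in alphabet}
    inserts = {l + ch + r for l, r in splits for ch in alphabet}
    return (deletes | swaps | replaces | inserts) - {word}


def autocorrect(word, probs, max_distance=2):
    """Return the most probable in-vocabulary word near `word`."""
    # Step 1: a word found in the vocabulary is treated as correctly spelled.
    if word in probs:
        return word
    frontier = {word}
    for _ in range(max_distance):
        # Step 2: move one more edit away from the original word.
        frontier = {e for w in frontier for e in edits_one(w)}
        # Step 3: keep only candidates that are actual vocabulary words.
        candidates = {c for c in frontier if c in probs}
        if candidates:
            # Step 4b: pick the candidate with the highest probability.
            return max(candidates, key=probs.get)
    # No known word within max_distance edits: leave the input unchanged.
    return word


# Hypothetical toy corpus, for illustration only.
corpus = "the cat sat on the mat the end".split()
probs = word_probabilities(corpus)
print(autocorrect("teh", probs))  # the
print(autocorrect("cat", probs))  # cat
```

The sketch tries closer candidates first and only widens the search when none of them is a real word, which keeps the candidate set small. Real systems typically use a much larger corpus, and the number of generated strings grows quickly with max_distance.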

The steps of building the autocorrect model are based on the "Natural Language Processing with Probabilistic Models" course on Coursera.

Updated 2021-04-13

Tags

Data Science
