
Applications of tf-idf vector models

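The tf-idf model can be used to measure word similarity: represent each word as a tf-idf vector and compute the cosine of the angle between the two word vectors. The higher the cosine, the more similar the words.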
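As a minimal sketch of the cosine computation, the snippet below compares word vectors in plain Python. The words and their tf-idf weights are made-up illustrative values, not taken from any real corpus, and returning 0.0 for an all-zero vector is a convention assumed here.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention: an all-zero vector is similar to nothing
    return dot / (norm_u * norm_v)

# Hypothetical tf-idf weights of three words across four documents.
cherry = [2.0, 0.0, 1.5, 0.0]
strawberry = [1.8, 0.0, 1.2, 0.1]
digital = [0.0, 3.0, 0.0, 2.5]

print(cosine(cherry, strawberry))  # close to 1: the words occur in the same documents
print(cosine(cherry, digital))     # 0.0: the words share no documents
```

Because tf-idf weights are non-negative, the cosine falls between 0 and 1 in this setting.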

The tf-idf vector models can also be used to decide whether two documents are similar. For each document we want to compare, we compute the centroid of the vectors of all the words in that document. Given $k$ word vectors $w_1, w_2, \ldots, w_k$, the centroid document vector $d$ is:

$$d = \frac{w_1 + w_2 + \cdots + w_k}{k}$$

Then, for two documents with centroid vectors $d_1$ and $d_2$, we can compute $\cos(d_1, d_2)$ to estimate their similarity: the higher the cosine, the more similar the documents.
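The centroid-then-cosine procedure can be sketched as follows. This is an illustration under assumptions: the three-dimensional word vectors are invented toy values, and the helper names `centroid` and `cosine` are hypothetical, not part of any library.

```python
import math

def centroid(vectors):
    """Component-wise mean of k equal-length word vectors."""
    k = len(vectors)
    if k == 0:
        raise ValueError("a document needs at least one word vector")
    return [sum(component) / k for component in zip(*vectors)]

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention for an all-zero vector
    return dot / (norm_u * norm_v)

# Toy word vectors for three documents (hypothetical values).
doc1_words = [[1.0, 0.0, 2.0], [3.0, 0.0, 0.0]]
doc2_words = [[2.0, 0.0, 1.0], [2.0, 0.0, 1.0]]
doc3_words = [[0.0, 4.0, 0.0], [0.0, 2.0, 0.0]]

d1 = centroid(doc1_words)  # [2.0, 0.0, 1.0]
d2 = centroid(doc2_words)  # [2.0, 0.0, 1.0]
d3 = centroid(doc3_words)  # [0.0, 3.0, 0.0]

print(cosine(d1, d2))  # high: the centroids point in the same direction
print(cosine(d1, d3))  # 0.0: the centroids are orthogonal
```

Note that averaging discards word order and gives every word in a document equal weight, so this is a coarse estimate of document similarity.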


Updated 2021-10-15

Tags

Data Science