Words as vectors: document dimensions

We’ve seen that documents can be represented as vectors in a vector space. But vector semantics can also be used to represent the meaning of words. We do this by associating each word with a word vector: a row vector rather than a column vector, and hence with different dimensions, as shown in the figure below. The four dimensions of the vector for fool, [36,58,1,4], correspond to the four Shakespeare plays. Word counts in the same four dimensions are used to form the vectors for the other three words: wit, [20,15,2,3]; battle, [1,0,7,13]; and good, [114,80,62,89].
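To make the row-versus-column distinction concrete, here is a minimal Python sketch of the same counts. The play labels (play_1 to play_4) and the helper names word_vector and document_vector are placeholders introduced for illustration, since the text above does not name the plays. Reading a row gives a word vector; reading a column gives a document vector.

```python
# Term-document counts from the text: one row per word, one column per play.
# The play labels are placeholders (assumption); the text does not name the plays.
plays = ["play_1", "play_2", "play_3", "play_4"]
term_document = {
    "battle": [1, 0, 7, 13],
    "good": [114, 80, 62, 89],
    "fool": [36, 58, 1, 4],
    "wit": [20, 15, 2, 3],
}


def word_vector(word):
    """Return the row for `word`: its count in each of the four plays."""
    return term_document[word]


def document_vector(play):
    """Return the column for `play`: the count of each word in that play."""
    col = plays.index(play)
    return [counts[col] for counts in term_document.values()]


print(word_vector("fool"))        # four dimensions, one per play
print(document_vector("play_1"))  # four dimensions, one per word
```

In this toy matrix both kinds of vector happen to have four entries, but in general a word vector has as many dimensions as there are documents, while a document vector has as many dimensions as there are words in the vocabulary.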

For documents, we saw that similar documents had similar vectors, because similar documents tend to have similar words. This same principle applies to words: similar words have similar vectors because they tend to occur in similar documents. The term-document matrix thus lets us represent the meaning of a word by the documents it tends to occur in.
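One way to check this claim on the four example words is to compare their vectors numerically. The sketch below uses cosine similarity, which is an assumption made here for illustration; the passage above does not commit to a particular similarity measure, and the function name cosine is likewise hypothetical.

```python
import math

# Word vectors from the text: counts of each word in the four Shakespeare plays.
vectors = {
    "fool": [36, 58, 1, 4],
    "wit": [20, 15, 2, 3],
    "battle": [1, 0, 7, 13],
    "good": [114, 80, 62, 89],
}


def cosine(u, v):
    """Cosine of the angle between two count vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Compare "fool" against every other word.
for word in ("wit", "battle", "good"):
    print(word, round(cosine(vectors["fool"], vectors[word]), 3))
```

With these counts, fool comes out much closer to wit than to battle, which matches the intuition that fool and wit occur in the same plays while battle is concentrated in the other two.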

Image 0

Updated 2021-10-09

Tags

Data Science