
Latent Semantic Analysis

LSA is a well-known and widely used language model in natural language processing. LSA assumes that words that are close in meaning will occur in similar pieces of text (the distributional hypothesis). A matrix containing word counts per document (rows represent unique words and columns represent documents) is constructed from a large corpus, and a mathematical technique called singular value decomposition (SVD) is used to reduce the number of rows while preserving the similarity structure among columns. Documents are then compared by taking the cosine of the angle between the two vectors (equivalently, the dot product of the two normalized vectors) formed by any two columns. Values close to 1 indicate very similar documents, while values close to 0 indicate very dissimilar documents. The matrix-factorization idea behind LSA underlies GloVe and many other modern word-embedding methods.
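The steps above can be sketched with NumPy on a toy term-document matrix. The vocabulary and documents here are hypothetical, chosen only so that two documents share words and one does not; a real application would build the matrix from a large corpus.

```python
import numpy as np

# Toy term-document count matrix: rows = unique words, columns = documents.
# d0 = "cat sat mat", d1 = "cat sat", d2 = "dog ran park" (illustrative only)
vocab = ["cat", "sat", "mat", "dog", "ran", "park"]
X = np.array([
    [1, 1, 0],  # cat
    [1, 1, 0],  # sat
    [1, 0, 0],  # mat
    [0, 0, 1],  # dog
    [0, 0, 1],  # ran
    [0, 0, 1],  # park
], dtype=float)

# SVD factors X = U @ diag(s) @ Vt. Truncating to the top-k singular
# values gives a low-rank approximation that preserves the similarity
# structure among the document columns.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
# Each row of doc_vecs is a k-dimensional document vector.
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T  # shape (n_docs, k)

def cosine(a, b):
    """Cosine of the angle between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(doc_vecs[0], doc_vecs[1]))  # shared words -> close to 1
print(cosine(doc_vecs[0], doc_vecs[2]))  # no shared words -> close to 0
```

In practice the matrix is sparse and far larger, so a truncated SVD routine (e.g. `sklearn.decomposition.TruncatedSVD` or `scipy.sparse.linalg.svds`) is used instead of a dense full decomposition.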


Updated 2021-03-15

Tags

Data Science
