Latent Semantic Analysis (LSA) is a widely adopted technique for associating Documents with Terms.
- Documents and Terms are indirectly associated with each other through “Concepts”.
- The number of Concepts is far smaller than the number of Documents or Terms.
- Concepts are purely abstract. They may have no concrete meaning or relevance of their own.
Latent Semantic Analysis Process
- Estimate the importance of each Term within each Document to build the Document-Term matrix. A typical metric is tf-idf.
- Run the SVD algorithm on the Document-Term matrix, and keep the K largest singular values.
- Prune the vectors in the Document and Term matrices so that they contain only the K retained factors.
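The process above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the helper names `tfidf` and `lsa`, the smoothed idf formula, and the toy count matrix are all assumptions made for the example.

```python
import numpy as np

def tfidf(counts):
    """Weight a Document-Term count matrix (rows = documents) by tf-idf.

    Assumption: a smoothed idf, log((1 + N) / (1 + df)) + 1; other variants exist.
    """
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[0]
    # Term frequency: raw counts normalized by document length.
    doc_len = counts.sum(axis=1, keepdims=True)
    tf = counts / np.maximum(doc_len, 1.0)
    # Document frequency: number of documents that contain each term.
    df = (counts > 0).sum(axis=0)
    idf = np.log((1.0 + n_docs) / (1.0 + df)) + 1.0
    return tf * idf

def lsa(counts, k):
    """Return the K-factor Document matrix, singular values, and Term matrix."""
    weighted = tfidf(counts)
    # Thin SVD: weighted = U @ diag(s) @ Vt, with s sorted in descending order.
    U, s, Vt = np.linalg.svd(weighted, full_matrices=False)
    # Keep only the K largest singular values and their vectors.
    return U[:, :k], s[:k], Vt[:k, :]

# Hypothetical toy corpus: 4 documents, 4 terms.
counts = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
]
doc_factors, singular_values, term_factors = lsa(counts, k=2)
print(doc_factors.shape, singular_values.shape, term_factors.shape)
# (4, 2) (2,) (2, 4)
```

For large, sparse corpora a truncated solver is usually preferable to a full SVD, but the pruning step is the same.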
Strength of the Association
|Document-Term|The association between a Document and a Term is given by the dot product between the Document's row vector and the Term's column vector.|
|Term-Term|The association between one Term and another is given by the dot product between the first Term's vector and the second Term's vector.|
|Document-Document|The association between one Document and another is given by the dot product between the first Document's vector and the second Document's vector.|
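The three associations can be computed as follows. This is a sketch under stated assumptions: the matrix `A` is a hypothetical, already-weighted Document-Term matrix, and the singular values are folded into the Document and Term vectors. That scaling is one common convention rather than something the text above prescribes.

```python
import numpy as np

# Hypothetical weighted Document-Term matrix (3 documents x 4 terms).
A = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
])
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# Assumption: scale both sides by the singular values so that dot products
# in Concept space approximate A @ A.T and A.T @ A.
doc_vecs = U_k * s_k       # one row per Document
term_vecs = Vt_k.T * s_k   # one row per Term

# Document-Term: Document row (U_k * s_k) dotted with Term column of Vt_k.
doc_term = doc_vecs @ Vt_k
# Document-Document: dot products between Document vectors.
doc_doc = doc_vecs @ doc_vecs.T
# Term-Term: dot products between Term vectors.
term_term = term_vecs @ term_vecs.T

print(np.round(doc_doc, 3))
```

Because this toy matrix has rank 2, keeping K = 2 factors loses nothing, so `doc_term` reproduces `A` up to floating-point error. With real data, K is much smaller than the rank and the products are approximations.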