Crisp Reading Notes on Latest Technology Trends and Basics

Latent Semantic Analysis (LSA) is a widely adopted technique for associating Documents with Terms.

Concept Space

  • Documents and Terms are indirectly associated with each other through “Concepts”.
  • The number of Concepts is far smaller than the number of Documents or Terms.
  • Concepts are abstract. They need not have any concrete, human-readable meaning.
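
As a rough sketch of the idea, the Concept space can be written as a low-rank factorization. The symbols below (A, U_K, Σ_K, V_K, m, n) are notation assumed here for illustration and do not appear in the original notes; A stands for the m × n Document-Term matrix.

```latex
A_{m \times n} \;\approx\; U_K \, \Sigma_K \, V_K^{\top},
\qquad
U_K \in \mathbb{R}^{m \times K},\quad
\Sigma_K \in \mathbb{R}^{K \times K},\quad
V_K \in \mathbb{R}^{n \times K},\quad
K \ll \min(m, n)
```

Each row of U_K places one Document in the K-dimensional Concept space, each row of V_K places one Term there, and the diagonal of Σ_K holds the weight of each Concept.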

Latent Semantic Analysis Process

  1. Estimate the importance of each Term within each Document. A typical metric is tf-idf.
  2. Pass the resulting Document-Term matrix to the SVD algorithm, and keep the K largest singular values.
  3. Prune the Document and Term matrices so that each vector contains only those K factors.
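
The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the toy corpus, the function names `tfidf` and `truncated_svd`, and the particular tf-idf variant are all assumptions made for the example.

```python
import numpy as np

# Hypothetical toy corpus: rows are Documents, columns are Terms.
terms = ["car", "engine", "wheel", "apple", "fruit"]
counts = np.array([
    [2, 1, 1, 0, 0],
    [1, 2, 0, 0, 0],
    [0, 0, 0, 2, 1],
    [0, 0, 0, 1, 2],
], dtype=float)


def tfidf(counts):
    """Step 1: weight raw counts with one common tf-idf variant (assumed here).

    Assumes every Document is non-empty and every Term occurs at least once.
    """
    tf = counts / counts.sum(axis=1, keepdims=True)   # term frequency per Document
    df = (counts > 0).sum(axis=0)                     # number of Documents containing each Term
    idf = np.log(counts.shape[0] / df)                # inverse document frequency
    return tf * idf


def truncated_svd(matrix, k):
    """Steps 2 and 3: run the SVD, then keep only the K largest factors."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)  # s is sorted in descending order
    return U[:, :k], s[:k], Vt[:k, :]


A = tfidf(counts)
U_k, s_k, Vt_k = truncated_svd(A, k=2)

print(U_k.shape)   # Document matrix: one K-factor row per Document
print(Vt_k.shape)  # Term matrix: one K-factor column per Term
```

Here `U_k` is the pruned Document matrix and `Vt_k` the pruned Term matrix. The choice K = 2 is arbitrary for this toy corpus; in practice K is a tuning parameter.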

Strength of the Association

Document-Term: The association between a Document and a Term is given by the dot-product between the Document's row and the Term's column in the pruned matrices, with each of the K factors weighted by its singular value.
Term-Term: The association between two Terms is given by the dot-product between the K-factor vectors of the first Term and the second Term.
Document-Document: The association between two Documents is given by the dot-product between the K-factor vectors of the first Document and the second Document.
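
A small NumPy sketch of these three associations follows. The matrix `A`, the value of `K`, and the variable names are assumptions for illustration. Scaling the Document and Term vectors by the singular values is one common convention, chosen here so that the dot-products match the rank-K approximation of `A`.

```python
import numpy as np

# Hypothetical weighted Document-Term matrix (4 Documents x 5 Terms).
A = np.array([
    [2.0, 1.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 2.0],
])
K = 2  # number of Concepts to keep (assumed)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k, Vt_k = U[:, :K], s[:K], Vt[:K, :]

# K-factor vectors, scaled by the singular values.
doc_vecs = U_k * s_k       # one row per Document
term_vecs = Vt_k.T * s_k   # one row per Term

# Document-Term: scaled Document row dotted with Term column.
doc_term = doc_vecs @ Vt_k
# Term-Term: dot-product between two Term vectors.
term_term = term_vecs @ term_vecs.T
# Document-Document: dot-product between two Document vectors.
doc_doc = doc_vecs @ doc_vecs.T

print(np.round(doc_doc, 2))
```

In this toy matrix the first two Documents share Terms with each other but not with the last two, so `doc_doc[0, 1]` comes out clearly positive while `doc_doc[0, 2]` is close to zero.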