The notes here (optimal_secretaries) summarize two problems:
- How to maximize one's chance of picking the best item from a finite pool when a decision must be made immediately after every observation, with no idea of what is still to come from the pool.
- How to maximize the chance of cashing in on the last occurrence of an event, and so get a favorable exit from a given situation.
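As a rough illustration of the first problem, here is a minimal sketch of the classical cutoff rule: look at the first few items without committing, then take the first later item that beats all of them. The function names (`secretary_pick`, `win_probability`, `best_skip`) are made up for this sketch, and it assumes items arrive in uniformly random order with no ties. It does not cover the second problem.

```python
def secretary_pick(scores, skip):
    """Index chosen by the cutoff rule: observe the first `skip` items without
    committing, then take the first later item that beats all of them.
    If no later item does, we are stuck with the last one."""
    benchmark = max(scores[:skip], default=float("-inf"))
    for i in range(skip, len(scores)):
        if scores[i] > benchmark:
            return i
    return len(scores) - 1


def win_probability(n, skip):
    """Exact chance that the cutoff rule picks the overall best of n items,
    assuming a uniformly random arrival order."""
    if skip == 0:
        return 1.0 / n  # the rule simply takes the first item
    return (skip / n) * sum(1.0 / k for k in range(skip, n))


def best_skip(n):
    """Cutoff that maximizes the win probability for a pool of size n."""
    return max(range(n), key=lambda r: win_probability(n, r))


if __name__ == "__main__":
    n = 100
    r = best_skip(n)
    print(r, round(win_probability(n, r), 4))
```

For a pool of 100 the best cutoff comes out close to 100/e, and the chance of ending up with the single best item stays near 37% however large the pool gets.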
Generalized Linear Regression
Document in MS-Word format: glm_regression
This is a long post on an intermediate-level topic of interest to people working in Big Data.
These are my notes from struggling to understand the topic using the available references. Unfortunately, the references all used the same wording and focused on the same areas. They were deficient in some crucial areas. The missing links were:
- To explain the Link function in terms of the “Maximum Likelihood function” of the original distribution
- To explain how Maximum Likelihood applied to regression: that the objective was to fit a probability distribution function that maximized the probability that the observed independent variables would give as output the observed dependent variables
- That only a single pdf was being fit, even though the observations were in N dimensions
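To make these three points concrete, here is a minimal sketch that fits one GLM by maximum likelihood. The choice of a Poisson distribution with a log link, the synthetic data, and the names `fit_poisson_glm` and `log_likelihood` are all assumptions made for illustration; they are not taken from the references or from the glm_regression document. A single distribution family is fit, the link ties each observation's mean to the linear predictor, and the coefficients are the ones that make the observed outputs most probable.

```python
import numpy as np


def fit_poisson_glm(X, y, n_iter=25):
    """Maximum-likelihood fit of a Poisson GLM with a log link (Newton's method)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)               # inverse link: mean for each observation
        score = X.T @ (y - mu)              # gradient of the log-likelihood
        info = X.T @ (X * mu[:, None])      # Fisher information (negative Hessian)
        beta = beta + np.linalg.solve(info, score)
    return beta


def log_likelihood(beta, X, y):
    """Poisson log-likelihood, dropping the constant term -sum(log(y!))."""
    eta = X @ beta                          # linear predictor
    return float(np.sum(y * eta - np.exp(eta)))


# Synthetic data (assumed for this sketch): one predictor plus an intercept.
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-1.0, 1.0, size=n)
X = np.column_stack([np.ones(n), x])
true_beta = np.array([0.5, 1.0])            # hypothetical "true" coefficients
y = rng.poisson(np.exp(X @ true_beta))

beta_hat = fit_poisson_glm(X, y)
print(beta_hat, log_likelihood(beta_hat, X, y))
```

Every observation gets its own mean, but all of them come from the same pdf family and share the same coefficient vector, which is the only thing being estimated.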
Latent Semantic Analysis (LSA) is a widely adopted technique for associating Documents with Terms.
- Documents and Terms are indirectly associated with each other through “Concepts”.
- The number of Concepts is far less than the number of Documents and Terms.
- Concepts are only abstract. They may have no concrete meaning or relevance.
Latent Semantic Analysis Process
- Estimate the importance of each Term within each Document to build the Document-Term matrix. A typical metric is tf-idf.
- Send the Document-Term matrix to the SVD algorithm, and pick the top K singular values.
- Prune the vectors of the Document and Term matrices so that they contain only those K factors.
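The three steps above can be sketched with NumPy as follows. The toy corpus, the particular tf-idf variant (term frequency times log(N / document frequency)), and the choice K = 2 are assumptions made for illustration only.

```python
import numpy as np

# Hypothetical toy corpus: rows of the matrix are Documents, columns are Terms.
docs = [
    "cat sat on the mat",
    "dog sat on the log",
    "stocks fell on the market",
    "market rallied as stocks rose",
]
vocab = sorted({w for d in docs for w in d.split()})
counts = np.array([[d.split().count(t) for t in vocab] for d in docs], dtype=float)

# Step 1: tf-idf weighting (one common variant, assumed here).
tf = counts / counts.sum(axis=1, keepdims=True)   # term frequency per document
df = (counts > 0).sum(axis=0)                     # number of documents containing each term
idf = np.log(len(docs) / df)
X = tf * idf                                      # Document-Term matrix

# Step 2: SVD. X = U @ diag(s) @ Vt, singular values in descending order.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Step 3: keep only the top K factors ("Concepts").
K = 2
U_k, s_k, Vt_k = U[:, :K], s[:K], Vt[:K, :]
X_k = U_k @ np.diag(s_k) @ Vt_k                   # rank-K approximation of X

print(U_k.shape, Vt_k.shape)
```

After pruning, each Document is described by K numbers (a row of `U_k`) and each Term by K numbers (a column of `Vt_k`), instead of one number per Term or per Document.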
Strength of the Association
- The association between a Document and a Term is given by the dot-product between the Document row and the Term column.
- The association between one Term and another is given by the dot-product between the vectors of the first Term and the second Term.
- The association between one Document and another is given by the dot-product between the vectors of the first Document and the second Document.
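A minimal sketch of these three dot-products, assuming a small made-up Document-Term matrix and K = 2. How the singular values are folded into the vectors varies between write-ups; the scaling used here (Document and Term vectors each multiplied by the singular values for same-kind comparisons, and the rank-K reconstruction for Document-Term) is one common convention and is an assumption of this sketch.

```python
import numpy as np

# Hypothetical weighted Document-Term matrix: 4 Documents (rows) x 5 Terms (columns).
# Documents 0 and 1 share Terms 0 and 1; Documents 2 and 3 share Terms 2 and 3.
X = np.array([
    [2.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 2.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 2.0, 1.0],
])

K = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)

doc_vecs = U[:, :K] * s[:K]       # one row per Document, scaled by singular values
term_basis = Vt[:K, :].T          # one row per Term, unscaled
term_vecs = term_basis * s[:K]    # one row per Term, scaled by singular values

doc_term = doc_vecs @ term_basis.T   # Document-Term association (rank-K reconstruction)
term_term = term_vecs @ term_vecs.T  # Term-Term association
doc_doc = doc_vecs @ doc_vecs.T      # Document-Document association

print(np.round(doc_doc, 2))
```

In this toy matrix, Documents 0 and 1 come out strongly associated while Documents 0 and 2 do not, and the same holds for Terms 0 and 1 versus Terms 0 and 2.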