Crisp Reading Notes on Latest Technology Trends and Basics

Archive for December, 2012

Optimal Stopping Time

The notes here (optimal_secretaries) summarize two problems:

  1. How to maximize one’s chance of picking the best candidate from a finite pool when each candidate must be accepted or rejected immediately after it is observed, and nothing is known in advance about the candidates still to come.
  2. How to maximize the chance of stopping on the last occurrence of an event in a sequence, so as to get a favorable exit from a given situation.
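A minimal Python sketch of the first problem (the secretary problem), assuming candidates arrive in uniformly random order and only their relative ranks matter. It uses the classical cutoff rule described in the Wikipedia article listed under References: reject the first few candidates outright, then accept the first one that beats everyone seen so far. The helper names `optimal_cutoff`, `secretary_choice` and `success_rate` are hypothetical, chosen for illustration.

```python
import random


def optimal_cutoff(n: int) -> int:
    """Number of candidates to observe and reject before accepting anyone,
    chosen to maximize the exact probability of picking the overall best."""
    best_skip, best_p = 0, 1.0 / n  # skip nobody: accept the first candidate
    for skip in range(1, n):
        # P(success | reject first `skip`) = (skip / n) * sum_{j=skip}^{n-1} 1/j
        p = skip / n * sum(1.0 / j for j in range(skip, n))
        if p > best_p:
            best_skip, best_p = skip, p
    return best_skip


def secretary_choice(scores, skip):
    """Reject the first `skip` scores, then accept the first score that beats
    all of them; if none does, we are left with the last candidate."""
    threshold = max(scores[:skip], default=float("-inf"))
    for s in scores[skip:]:
        if s > threshold:
            return s
    return scores[-1]


def success_rate(n, trials=10_000, seed=0):
    """Monte Carlo estimate of how often the cutoff rule picks the best of n."""
    rng = random.Random(seed)
    skip = optimal_cutoff(n)
    wins = 0
    for _ in range(trials):
        scores = list(range(n))  # higher is better; n - 1 is the best
        rng.shuffle(scores)      # uniformly random arrival order
        wins += secretary_choice(scores, skip) == n - 1
    return wins / trials
```

For n = 100 the cutoff comes out at 37 (close to n/e ≈ 36.8), and the simulated success rate lands near the theoretical value of about 0.37, roughly the 1/e that the rule approaches for large n.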

References

  1. http://en.wikipedia.org/wiki/Secretary_problem
  2. http://en.wikipedia.org/wiki/Odds_algorithm

Chi-Square Distribution and Tests

Overview

  • Chi-square tests measure how far an observed frequency distribution deviates from the distribution expected under a hypothesis
  • There are two common use cases:
    • Goodness of fit: how well the population from which a sample was drawn fits a theoretical distribution
    • Independence: whether two categorical variables are independent, judged by how far the observed counts deviate from the counts expected if they were independent
  • Beyond the theory, the test statistic is a simple, straightforward formula that is easy to apply: χ² = Σ (O_i − E_i)² / E_i, summed over all categories (or table cells), where O_i is the observed count and E_i the expected count
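A minimal Python sketch of both use cases. It computes only the statistic and its degrees of freedom; turning these into a p-value requires the chi-square distribution (for example `scipy.stats.chi2.sf`, which is not used here). The function names and the example counts are hypothetical.

```python
def chi_square_gof(observed, expected):
    """Goodness of fit: sum of (O - E)^2 / E over the categories.
    Degrees of freedom = categories - 1, assuming no parameters of the
    theoretical distribution were estimated from the data."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1


def chi_square_independence(table):
    """Independence test on a contingency table (list of rows of counts).
    Expected count of a cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / total
            stat += (o - e) ** 2 / e
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof
```

With made-up counts, a die rolled 60 times showing faces [5, 8, 9, 8, 10, 20] against an expected 10 each gives χ² = 13.4 with 5 degrees of freedom, above the 5% critical value of about 11.07, so fairness would be rejected. The 2 × 2 table [[20, 30], [30, 20]] gives χ² = 4.0 with 1 degree of freedom, above the 5% critical value of about 3.84.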

Recognizing Keywords from Documents

keywords_on_a_page

http://www.cs.cmu.edu/~vitor/papers/www06.pdf

Problem Statement

  • Several applications need to determine the keywords of a document quickly, such as:
    1. Contextual Advertising
    2. News Query Extraction
    3. Email Query Extraction
  • One technique for solving this problem is a four-step process:
    1. Pre-processing
      • The document is pre-processed to remove markup tags, so that the text content, rather than the markup, is what gets analyzed.
    2. Candidate Selection
      • Candidate phrases are extracted from the text; typically these are nouns and noun phrases.
    3. Scoring
      • Features are computed for each candidate.
      • All candidates are then scored from these features.
      • The scoring uses a supervised learning model, with human annotators labeling pages in advance.
    4. Post-processing
      • After scoring, all candidates are listed in decreasing order of score.
      • Other rule-based constraints may also be applied to adjust the scores.
  • The paper describes such a system used for contextual advertising.
  • The details are easy to read, and the system is simple and practical.
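A minimal, self-contained Python sketch of the four steps, under heavy simplifying assumptions. Candidates are stopword-free word n-grams rather than true nouns and noun phrases, which would need a part-of-speech tagger. The features (term frequency, appearance in the title, phrase length) and the hand-set `WEIGHTS` are hypothetical stand-ins for the supervised model the paper trains on annotated pages. The post-processing rule (skip a phrase already contained in a higher-ranked one) is just one example of a rule-based constraint. All names here are illustrative.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical stopword list and feature weights, for illustration only.
STOPWORDS = {"a", "an", "and", "are", "as", "at", "by", "for", "from", "in",
             "is", "it", "of", "on", "or", "that", "the", "this", "to", "with"}
WEIGHTS = {"tf": 1.0, "in_title": 2.0, "length": 0.5}


class _TextExtractor(HTMLParser):
    """Collects text content (dropping tags) and the <title> text."""

    def __init__(self):
        super().__init__()
        self.chunks, self.title, self._in_title = [], "", False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        self.chunks.append(data)


def _tokens(text):
    return re.findall(r"[a-z][a-z0-9-]*", text.lower())


def _contains(big, small):
    """Word-level containment: is phrase `small` inside phrase `big`?"""
    return f" {small} " in f" {big} "


def preprocess(html):
    """Step 1: strip markup, returning (normalized title, text)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(_tokens(parser.title)), "\n".join(parser.chunks)


def select_candidates(text, max_len=3):
    """Step 2 (simplified): runs of 1..max_len non-stopwords within a sentence."""
    cands = []
    for segment in re.split(r"[.!?,;:\n]+", text):
        words = _tokens(segment)
        for i in range(len(words)):
            for n in range(1, max_len + 1):
                gram = words[i:i + n]
                if len(gram) < n or any(w in STOPWORDS for w in gram):
                    break  # longer grams starting at i would fail too
                cands.append(" ".join(gram))
    return cands


def score(cands, title):
    """Step 3: compute features per candidate and combine them linearly."""
    scores = {}
    for phrase, tf in Counter(cands).items():
        feats = {"tf": tf,
                 "in_title": float(_contains(title, phrase)),
                 "length": len(phrase.split())}
        scores[phrase] = sum(WEIGHTS[name] * v for name, v in feats.items())
    return scores


def postprocess(scores, k=5):
    """Step 4: rank by decreasing score, skipping any phrase already
    contained in a higher-ranked phrase that was kept."""
    kept = []
    for phrase in sorted(scores, key=lambda p: (-scores[p], p)):
        if not any(_contains(q, phrase) for q in kept):
            kept.append(phrase)
        if len(kept) == k:
            break
    return kept


def extract_keywords(html, k=5):
    title, text = preprocess(html)
    return postprocess(score(select_candidates(text), title), k)
```

For a page titled "Cheap flights to Paris" whose body reads "Book cheap flights to Paris. Paris hotels and cheap flights compared.", `extract_keywords(html, k=2)` ranks "cheap flights" first and "paris" second. The single words "cheap" and "flights" tie with "paris" on score but are skipped because they are contained in "cheap flights".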
