Problem Statement

  • Several applications need to determine the keywords of a document quickly, such as:
  1. Contextual Advertising
  2. News Query Extraction
  3. Email Query Extraction
  • One technique to solve this problem is a four-step process:
  1. Pre-processing
  • The document is pre-processed to remove markup tags, so that the text itself is given prominence.
  2. Candidate Selection
  • Candidate phrases are extracted from the text. Typically these are nouns and noun phrases.
  3. Scoring
  • Features are computed for each candidate.
  • All candidates are then scored.
  • The scoring uses a supervised learning model, built from pages that annotators have labelled in advance.
  4. Post-processing
  • After scoring, all candidates are listed in order of decreasing score.
  • Additional rule-based constraints may also be applied to the scored candidates.
  • The paper describes such a system used in “Contextual Advertising”.
  • The paper is easy to read, and the system it describes is simple and practical.
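
The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the system from the paper: the stopword list, the three features (frequency, early position, phrase length), the hand-set WEIGHTS, and every function name are hypothetical. In particular, runs of non-stopwords stand in for real noun-phrase extraction, and the fixed weights stand in for a model learned from annotated pages.

```python
import html
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a POS tagger or chunker
# to find nouns and noun phrases instead of this crude proxy.
STOPWORDS = {
    "a", "an", "the", "of", "in", "on", "for", "and", "or", "to", "is", "are",
    "has", "have", "with", "at", "by", "from", "this", "that", "it", "as", "be",
}

# Hand-set weights standing in for a supervised model learned from annotated pages.
WEIGHTS = {"frequency": 1.0, "early_position": 0.5, "length": 0.25}


def preprocess(document):
    """Step 1: drop script/style blocks and markup tags, keep the visible text."""
    text = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", document)
    text = re.sub(r"<[^>]+>", " ", text)
    text = html.unescape(text)
    return re.sub(r"\s+", " ", text).strip().lower()


def select_candidates(text, max_words=3):
    """Step 2: collect maximal runs of non-stopwords as candidate phrases."""
    candidates = []
    for sentence in re.split(r"[.!?;:,]", text):
        run = []
        # The trailing None flushes the last run of each sentence.
        for token in re.findall(r"[a-z0-9]+", sentence) + [None]:
            if token is None or token in STOPWORDS:
                if 0 < len(run) <= max_words:
                    candidates.append(" ".join(run))
                run = []
            else:
                run.append(token)
    return candidates


def score_candidates(candidates):
    """Step 3: compute features per distinct candidate and combine them linearly."""
    counts = Counter(candidates)
    first_seen = {}
    for index, phrase in enumerate(candidates):
        first_seen.setdefault(phrase, index)
    total = len(candidates)
    scores = {}
    for phrase, count in counts.items():
        features = {
            "frequency": count,
            "early_position": 1.0 - first_seen[phrase] / total,
            "length": len(phrase.split()),
        }
        scores[phrase] = sum(WEIGHTS[name] * value for name, value in features.items())
    return scores


def postprocess(scores, top_k=5, min_chars=3):
    """Step 4: apply a simple rule-based filter, then sort by decreasing score."""
    kept = [(phrase, score) for phrase, score in scores.items() if len(phrase) >= min_chars]
    kept.sort(key=lambda item: (-item[1], item[0]))  # ties broken alphabetically
    return kept[:top_k]


def extract_keywords(document, top_k=5):
    """Run the four steps in order on one document."""
    text = preprocess(document)
    return postprocess(score_candidates(select_candidates(text)), top_k=top_k)


if __name__ == "__main__":
    page = "<p>The camera is on sale. This camera has a zoom lens.</p><script>var x = 1;</script>"
    for phrase, score in extract_keywords(page):
        print(phrase, score)
```

Keeping each step as its own function mirrors the outline: the scoring step could be replaced by a trained model, or the candidate step by a proper noun-phrase chunker, without touching the rest.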

