Crisp Reading Notes on Latest Technology Trends and Basics

keywords_on_a_page

http://www.cs.cmu.edu/~vitor/papers/www06.pdf

Problem Statement

  • In several applications, it is necessary to quickly determine the keywords of a document, such as
    1. Contextual Advertising
    2. News Query Extraction
    3. Email Query Extraction
  • One technique to solve this problem is a four-step process
    1. Pre-processing
       • The document is pre-processed so that markup tags are removed and the text gets prominence.
    2. Candidate Selection
       • The candidate phrases are extracted from the text. Typically these are nouns and noun phrases.
    3. Scoring
       • The features for each candidate are computed.
       • Then all the candidates are scored.
       • The scoring uses a supervised learning model, trained on pages that annotators have labeled in advance.
    4. Post-processing
       • At the end of scoring, all candidates are listed in decreasing order of score.
       • Other rule-based constraints may then be applied to the scored items.
  • The paper describes such a system used in “Contextual Advertising”.
  • The details are easy to read, and the system is simple and practical.
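
The four steps above can be sketched in a few lines of Python. This is only an illustration under loud assumptions, not the paper's system: the stop-word list, the n-gram candidate filter (a crude stand-in for noun-phrase detection), the three features, and the hand-set weights are all made up here. A real system would learn the weights from annotated pages.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical, tiny stop-word list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def preprocess(html):
    """Step 1: remove markup and normalise whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def select_candidates(text, max_len=3):
    """Step 2: word n-grams that neither start nor end with a stop word.

    Assumption: this replaces real noun-phrase detection, and it ignores
    sentence boundaries.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter()
    positions = {}  # phrase -> word index of its first occurrence
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            phrase = " ".join(gram)
            counts[phrase] += 1
            positions.setdefault(phrase, i)
    return counts, positions, len(words)


def score(counts, positions, n_words, weights=(-3.0, 1.5, 1.0, 0.5)):
    """Step 3: compute features, then a logistic score per candidate.

    The weights are invented for illustration; a supervised model would
    fit them to annotated pages.
    """
    bias, w_tf, w_early, w_len = weights
    scores = {}
    for phrase, tf in counts.items():
        early = 1.0 - positions[phrase] / max(n_words, 1)  # earlier is higher
        z = bias + w_tf * math.log1p(tf) + w_early * early + w_len * len(phrase.split())
        scores[phrase] = 1.0 / (1.0 + math.exp(-z))
    return scores


def postprocess(scores, top_k=5):
    """Step 4: rank by decreasing score, then apply one simple rule."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    kept = []
    for phrase, s in ranked:
        # Rule: drop a phrase already contained in a higher-ranked kept phrase.
        if any(f" {phrase} " in f" {k} " for k, _ in kept):
            continue
        kept.append((phrase, s))
    return kept[:top_k]


def extract_keywords(html, top_k=5):
    text = preprocess(html)
    counts, positions, n_words = select_candidates(text)
    return postprocess(score(counts, positions, n_words), top_k)
```

Calling `extract_keywords(page_html)` returns a list of `(phrase, score)` pairs in decreasing score. Keeping each step as its own function mirrors the notes above, so any one stage (for example the candidate filter) can be swapped out without touching the rest.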