- Several applications need to quickly determine the keywords of a document, for example:
- Contextual Advertising
- News Query Extraction
- Email Query Extraction
- One technique to solve this problem is a four-step process:
- The document is pre-processed, so that the text gets prominence and markup tags are removed
- Candidate Selection
- The candidate phrases are extracted from the text. Typically these are Nouns and Noun Phrases.
- The features for the candidates are computed.
- Then all the candidates are scored
- The scoring here uses a supervised learning model, with annotators pre-annotating test pages.
- At the end of the scoring, all candidates are listed in decreasing order of score.
- Other rule-based constraints may be applied to score the items
- The paper describes such a system used in “Contextual Advertising”.
- The details are easy to read, and the system is simple and practical.
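
The four-step process above (pre-processing, candidate selection, feature computation, scoring) can be sketched roughly as follows. This is a minimal illustration, not the paper's system: the stopword-free n-grams are only a crude stand-in for noun-phrase extraction, the three features are illustrative choices, and the `WEIGHTS` values are hypothetical hand-set numbers standing in for a model that would normally be trained on annotated pages. All function names are invented for this example.

```python
import math
import re
from collections import Counter
from html import unescape

# Assumption: a tiny stopword list; a real system would use a POS tagger or chunker.
STOPWORDS = {"a", "an", "the", "of", "and", "or", "in", "on", "for", "to",
             "is", "are", "with", "at", "by", "this", "that", "it", "from", "as"}

# Assumption: hand-set weights standing in for a trained supervised model.
WEIGHTS = {"bias": -2.0, "tf": 1.0, "early": 1.5, "length": 0.5}


def preprocess(html):
    """Step 1: drop script/style blocks and markup tags, keep the visible text."""
    text = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", unescape(text)).strip()


def select_candidates(text, max_len=3):
    """Step 2: collect stopword-free word n-grams as candidate phrases."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    candidates = []  # list of (phrase, start position)
    for i in range(len(tokens)):
        for n in range(1, max_len + 1):
            gram = tokens[i:i + n]
            # Stop extending once we run off the end or hit a stopword.
            if len(gram) < n or any(t in STOPWORDS for t in gram):
                break
            candidates.append((" ".join(gram), i))
    return candidates, len(tokens)


def compute_features(candidates, n_tokens):
    """Step 3: per-phrase features (log frequency, earliness, phrase length)."""
    counts = Counter(phrase for phrase, _ in candidates)
    first = {}
    for phrase, pos in candidates:
        first.setdefault(phrase, pos)  # candidates arrive in position order
    return {
        phrase: {
            "tf": math.log(1 + counts[phrase]),
            "early": 1.0 - first[phrase] / max(n_tokens, 1),
            "length": float(len(phrase.split())),
        }
        for phrase in counts
    }


def score(features):
    """Step 4: logistic score computed from the hypothetical weights."""
    z = WEIGHTS["bias"] + sum(WEIGHTS[k] * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def extract_keywords(html, top_k=5):
    """Run all four steps and return candidates in decreasing order of score."""
    text = preprocess(html)
    candidates, n_tokens = select_candidates(text)
    feats = compute_features(candidates, n_tokens)
    ranked = sorted(feats, key=lambda p: score(feats[p]), reverse=True)
    return ranked[:top_k]
```

Calling `extract_keywords(page_html, top_k=5)` returns the five highest-scoring phrases. Any rule-based constraints would most naturally be applied as a filter or adjustment on `ranked` before truncating to `top_k`.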