The notes here (optimal_secretaries) summarize two problems:
- How to maximize one’s chance of picking the best item from a finite pool when each item must be accepted or rejected immediately after it is observed, with no idea of what is still to come from the pool.
- How to maximize the chance of cashing in on the last occurrence of an event, and so get a favorable exit from a given situation.
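As a rough illustration of the first problem, assuming it is the classical secretary problem (candidates arrive in uniformly random order and only their relative ranks are observable), the sketch below computes the exact success probability of a cutoff rule: observe and reject the first `r` candidates, then accept the first one who beats everyone seen so far. The function names `success_probability` and `best_cutoff` are made up for this example.

```python
def success_probability(n, r):
    """Chance of ending up with the single best of n candidates when the
    first r are rejected and the first later record-setter is accepted."""
    if r == 0:
        # With no observation phase the first candidate is accepted outright.
        return 1.0 / n
    # The best candidate sits at position i with probability 1/n, and is
    # reached only if the best of the first i-1 lies among the first r.
    return (r / n) * sum(1.0 / (i - 1) for i in range(r + 1, n + 1))


def best_cutoff(n):
    """Cutoff r in [0, n-1] that maximizes the success probability."""
    return max(range(n), key=lambda r: success_probability(n, r))


if __name__ == "__main__":
    n = 100
    r = best_cutoff(n)
    print(r, success_probability(n, r))
```

For n = 100 this gives a cutoff of 37 and a success probability of about 0.37, close to the well-known 1/e limit.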
- Several applications need to determine the keywords of a document quickly, for example:
- Contextual Advertising
- News Query Extraction
- Email Query Extraction
- One technique for solving this problem is a four-step process:
- The document is pre-processed, so that markup tags are removed and the text itself gets prominence.
- Candidate Selection
- The candidate phrases are extracted from the text. Typically these are Nouns and Noun Phrases.
- The features for the candidates are computed.
- Then all the candidates are scored.
- The scoring here uses a supervised learning model, built from pages that annotators have labeled in advance.
- At the end of the scoring, all candidates are listed in order of decreasing score.
- Other rule-based constraints may be applied when scoring the items.
- The paper describes such a system used in “Contextual Advertising”.
- The details are easy to read, and the system is simple and practical
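The four-step process described above can be sketched roughly as follows. This is a toy illustration, not the system from the paper: stopword-free n-grams stand in for noun phrases (no part-of-speech tagger is used), the three features are chosen only for illustration, and the hand-set `WEIGHTS` stand in for a model that would normally be trained on annotated pages. All names here are hypothetical.

```python
import html
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}

# Hypothetical weights standing in for a trained supervised model.
WEIGHTS = {"bias": -2.0, "tf": 1.0, "early": 1.5, "length": 0.3}


def preprocess(page):
    """Step 1: drop script/style blocks and markup tags, keep visible text."""
    page = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", page)
    text = re.sub(r"(?s)<[^>]+>", " ", page)
    return re.sub(r"\s+", " ", html.unescape(text)).strip()


def select_candidates(tokens, max_len=3):
    """Step 2: (phrase, start index) for every stopword-free n-gram."""
    out = []
    for i in range(len(tokens)):
        for n in range(1, max_len + 1):
            gram = tokens[i:i + n]
            if len(gram) == n and not any(t in STOPWORDS for t in gram):
                out.append((" ".join(gram), i))
    return out


def compute_features(candidates, n_tokens):
    """Step 3: log frequency, earliness of first use, and phrase length."""
    counts = Counter(phrase for phrase, _ in candidates)
    first = {}
    for phrase, i in candidates:
        first.setdefault(phrase, i)  # candidates arrive in position order
    return {
        phrase: {
            "tf": math.log1p(counts[phrase]),
            "early": 1.0 - first[phrase] / max(n_tokens, 1),
            "length": len(phrase.split()),
        }
        for phrase in counts
    }


def score(features):
    """Step 4: logistic score of one candidate under WEIGHTS."""
    z = WEIGHTS["bias"] + sum(WEIGHTS[k] * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def extract_keywords(page, top_k=5):
    tokens = re.findall(r"[a-z0-9]+", preprocess(page).lower())
    feats = compute_features(select_candidates(tokens), len(tokens))
    # Example rule-based constraint (assumed): ignore phrases under 3 characters.
    scored = [(score(f), p) for p, f in feats.items() if len(p) >= 3]
    scored.sort(key=lambda sp: (-sp[0], sp[1]))  # decreasing score
    return [(p, round(s, 3)) for s, p in scored[:top_k]]


if __name__ == "__main__":
    page = ("<html><head><style>p{color:red}</style></head><body>"
            "<h1>Digital cameras</h1><p>Digital cameras are on sale. "
            "Buy digital cameras with a zoom lens.</p></body></html>")
    for phrase, s in extract_keywords(page):
        print(phrase, s)
```

In a real system the hand-set weights would be replaced by a model fitted to the annotated pages, and the n-gram selector by a proper noun-phrase chunker.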