Learning to rank documents with support vector machines via active learning

Publication Year:
2009
Usage 2947
Downloads 2769
Abstract Views 178
Repository URL:
http://ir.uiowa.edu/etd/331; https://ir.uiowa.edu/cgi/viewcontent.cgi?article=1516&context=etd
Author(s):
Arens, Robert James
Publisher(s):
University of Iowa
Tags:
Active Learning; Information Retrieval; Support Vector Machines; Computer Sciences
thesis / dissertation description
Navigating through the debris of the information explosion requires powerful, flexible search tools. These tools must be both useful and useable; that is, they must do their jobs effectively without placing too many burdens on the user. While general interest search engines, such as Google, have addressed this latter challenge well, more topic-specific search engines, such as PubMed, have not. These search engines, though effective, often require training in their use, as well as in-depth knowledge of the domain over which they operate. Furthermore, search results are often returned in an order irrespective of users' preferences, forcing them to manually search through search results in order to find the documents they find most useful.To solve these problems, we intend to learn ranking functions from user relevance preferences. Applying these ranking functions to search results allows us to improve search usability without having to reengineer existing, effective search engines. Using ranking SVMs and active learning techniques, we can effectively learn what is relevant to a user from relatively small amounts of preference data, and apply these learned models as ranking functions. This gives users the convenience of seeing relevance-ordered search results, which are tailored to their preferences as opposed to using a one-size-fits-all sorting method. As giving preference feedback does not require in-depth domain knowledge, this approach is suitable for use by domain experts as well as neophytes. Furthermore, giving preference feedback does not require a great deal of training, adding very little overhead to the search process.