Ph.D. Theses

An Evaluation of a Statistical Framework and Algorithms for Adaptive Aggregation

By Brandeis Hill
Advisor: Sibel Adali
April 20, 2007

The rank aggregation problem aims to combine several ranked lists to obtain a final .consensus. ranked list that gives better results than any one of the individual ranked lists. With the emergence of the World Wide Web, meta-search community has studied the rank aggregation problem in order to aggregate search results from multiple search engines and increase the coverage of the Web by accessing more information. The existing aggregation methods address the problem of capturing the user feedback accurately while minimizing the impact of outliers or spam. However, the prior work does not provide guidelines about when to use the aggregation method in which problem setting as well as how to dynamically select an aggregation method for that problem setting. In this thesis, we address these shortcomings in the prior work. Since rank aggregation is not the only field to rank objects, we need a more general platform to examine an optimal ranking. We define two factors that contribute to the performance of rank aggregation methods including noise, such as spam, and misinformation, such as trustworthiness.

In meta-search, relevance of an object is difficult to decide since evaluation depends on subjective expert judgments. To address this concern, we propose a flexible statistical framework to model the possible different relationships between the rankers, such as search engines, and the ground truth. Our model contains a ground truth ranker, which corresponds to the correct ordering of objects, and the input rankers that serve as approximations of the ground truth ranker. We also develop several aggregation methods that capture different aspects of the rank information including precision optimal, iterative best flip and three algorithms that are approximations to the minimum feedback arc set problem. We show that there is a trade off between information and robustness when selecting the best aggregation method and that none of the well-known rankers perform well uniformly in all different noise and misinformation conditions. We also show that we can identify misinformation of the rankers and noise in which to effectively customize the search results.

Return to main PhD Theses page