Laboratory
Efficient Diversification of Web Search Results
G. Capannini, F. M. Nardini, R. Perego, and F. Silvestri
ISTI-CNR, Pisa, Italy
VLDB 2011 - Aug/Sept 2011, Seattle, US
Web Search Results Diversification
Query: "Vinci", what is the user's intent?
Vinci, the small village in Tuscany?
Vinci, the company?
intents it represents
representing how much that intent is represented by that document (e.g., 1/2 of document D is related to the intent "digital photography techniques")
intent weight, i.e., maximize the number of satisfied users.
Diversifying search results. In Proceedings of the Second ACM International Conference on Web Search and Data Mining (WSDM '09), Ricardo Baeza-Yates, Paolo Boldi, Berthier Ribeiro-Neto, and B. Barla Cambazoglu (Eds.). ACM, New York, NY, USA, 5-14.
result diversification. In Proceedings of the 19th International Conference on World Wide Web, pages 881-890, Raleigh, NC, USA, 2010. ACM.
intents
the weight is the probability of being related to intent c
d is not pertinent to c
no document is pertinent to c
at least one document is pertinent to c
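The annotations above read as labels for the terms of the diversification objective in the WSDM '09 reference cited earlier. Restated in that paper's notation (our reading of the slide), with P(c|q) the weight of intent c for query q and V(d|q,c) the probability that document d is pertinent to c, the quantity maximized over result sets S of size k is:

```latex
P(S \mid q) \;=\; \sum_{c} \underbrace{P(c \mid q)}_{\text{weight of } c}
  \Biggl( 1 - \underbrace{\prod_{d \in S} \bigl( 1 - \underbrace{V(d \mid q, c)}_{d \text{ pertinent to } c} \bigr)}_{\text{no document is pertinent to } c} \Biggr),
\qquad S \subseteq R_q,\; |S| = k
```

The bracketed term 1 - prod(...) is the probability that at least one document in S is pertinent to c.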
result with the max marginal utility.
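The greedy step on this slide can be sketched as follows. This is a minimal illustration, assuming the intent weights P(c|q) and the pertinence probabilities V(d|q,c) are given as plain dictionaries; the function names `ia_select` and `marginal_gain` are hypothetical. Each round adds the document whose inclusion raises the objective the most, then discounts every intent by the chance that the new document already covers it.

```python
from typing import Dict, List


def marginal_gain(d: str, u: Dict[str, float],
                  pertinence: Dict[str, Dict[str, float]]) -> float:
    # Sum over intents of (still-uncovered mass of c) * V(d|q,c).
    return sum(u[c] * pertinence.get(d, {}).get(c, 0.0) for c in u)


def ia_select(results: List[str],
              intent_weight: Dict[str, float],            # P(c|q)
              pertinence: Dict[str, Dict[str, float]],    # doc -> intent -> V(d|q,c)
              k: int) -> List[str]:
    # u[c] = P(c|q) * prod over selected d of (1 - V(d|q,c)); S is empty at start.
    u = dict(intent_weight)
    selected: List[str] = []
    remaining = sorted(set(results))  # sorted only for deterministic tie-breaking
    while remaining and len(selected) < k:
        # Pick the result with the maximum marginal utility.
        best = max(remaining, key=lambda d: marginal_gain(d, u, pertinence))
        selected.append(best)
        remaining.remove(best)
        # Intents covered by `best` lose part of their residual weight.
        for c in u:
            u[c] *= 1.0 - pertinence.get(best, {}).get(c, 0.0)
    return selected
```

For instance, with two equally weighted intents and hypothetical documents d1 (town, 0.9), d2 (town, 0.8), d3 (company, 0.6), `ia_select` with k = 2 returns d1 and then d3: once d1 is chosen, the town intent has little uncovered mass left, so d3's gain (0.3) beats d2's (0.04).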
final results set.
covered:
Same problem as before... It may not diversify!
Example (figure): query "Vinci" with specializations Leonardo da Vinci, Vinci Town, and Vinci Group, weighted 5/12, 1/4, and 1/3; Rq is the result list and S the selected subset.
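The three weights in the example sum to 1 (5/12 + 3/12 + 4/12). As an illustration only, assuming the weights belong in order to the three specializations and that S should give each specialization a share of the k slots proportional to its weight, the split can be computed with largest-remainder rounding. The function `allocate_slots` and this rounding rule are our assumptions, not taken from the talk.

```python
from fractions import Fraction
from math import floor
from typing import Dict


def allocate_slots(weights: Dict[str, Fraction], k: int) -> Dict[str, int]:
    """Split k result slots among intents proportionally to their weights.

    Assumes the weights sum to 1. Uses largest-remainder rounding:
    floor every quota, then give the leftover slots to the intents
    with the largest fractional parts.
    """
    quotas = {c: w * k for c, w in weights.items()}
    slots = {c: floor(q) for c, q in quotas.items()}
    leftover = k - sum(slots.values())
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - slots[c], reverse=True)
    for c in by_remainder[:leftover]:
        slots[c] += 1
    return slots


vinci = {
    "Leonardo da Vinci": Fraction(5, 12),
    "Vinci Town": Fraction(1, 4),
    "Vinci Group": Fraction(1, 3),
}
```

With k = 12 the quotas are exact (5, 3 and 4 slots); with k = 4 the floors give one slot each and the single leftover slot goes to Leonardo da Vinci, whose remainder (2/3) is the largest. Exact `Fraction` arithmetic avoids floating-point ties when quotas land on whole numbers.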
documents by using a sort-based approach.
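A sort-based selection can be sketched with partial sorting. This is a minimal sketch under stated assumptions, not necessarily the algorithm presented in the talk: each intent c gets a hypothetical quota of slots, filled with its highest-utility documents, and any free slots are filled by intent-weighted utility. `heapq.nlargest(m, ...)` returns the m best items in O(n log m), so no full sort of the candidate list is needed. The names `sort_based_select`, `utility` and `quota` are illustrative.

```python
import heapq
from typing import Dict, List


def sort_based_select(utility: Dict[str, Dict[str, float]],  # doc -> intent -> U(d|c)
                      weights: Dict[str, float],             # intent -> P(c|q)
                      quota: Dict[str, int],                  # intent -> reserved slots (assumed)
                      k: int) -> List[str]:
    selected: List[str] = []
    for c, n_c in quota.items():
        # Unselected documents with some utility for intent c.
        pool = [d for d in utility if d not in selected and utility[d].get(c, 0.0) > 0.0]
        # Partial sort: the n_c best documents for c.
        best = heapq.nlargest(n_c, pool, key=lambda d: utility[d].get(c, 0.0))
        selected.extend(best[: k - len(selected)])  # never exceed k

    def overall(d: str) -> float:
        # Utility of d averaged over intents, weighted by P(c|q).
        return sum(w * utility[d].get(c, 0.0) for c, w in weights.items())

    # Fill remaining slots with the best documents overall.
    pool = [d for d in utility if d not in selected]
    selected.extend(heapq.nlargest(k - len(selected), pool, key=overall))
    return selected
```

Reserving slots per intent is what makes the output diverse: a document set dominated by one intent can only take the slots left after every intent's quota has been served.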
have the set of specializations available for each query.
based.
to obtain a set of queries from which Sq is built by including the most popular recommendations (i.e., those whose frequency in the query log is greater than f(q)/s):
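The frequency filter above can be sketched as follows, reading f(q) as the query-log frequency of q and s as a tuning constant (our interpretation). Normalizing the surviving frequencies into weights for Sq is an additional assumption made only for illustration, and `build_specializations` is a hypothetical name; the recommender producing the candidate list is out of scope here.

```python
from typing import Dict, List


def build_specializations(q: str,
                          recommendations: List[str],
                          freq: Dict[str, int],
                          s: float) -> Dict[str, float]:
    """Keep recommendations with freq > f(q)/s (assumes s > 0) and
    turn their frequencies into weights that sum to 1 (assumption)."""
    threshold = freq.get(q, 0) / s
    kept = [r for r in recommendations if freq.get(r, 0) > threshold]
    total = sum(freq[r] for r in kept)
    if total == 0:
        return {}
    return {r: freq[r] / total for r in kept}
```

With hypothetical counts f("vinci") = 100 and s = 10, the threshold is 10, so a recommendation seen only 5 times is dropped while ones seen 60, 20 and 20 times are kept with weights 0.6, 0.2 and 0.2. Raising s lowers the threshold and admits rarer specializations.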
Generating Suggestions for Queries in the Long Tail with an Inverted Index. Information Processing & Management, August 2011.
8 GB of RAM and Ubuntu Linux 9.10 (kernel 2.6.31-22).
view
document scoring phase (see the DDR2011 paper)
Franco Maria Nardini ISTI-CNR, Pisa Italy http://hpc.isti.cnr.it/~nardini f.nardini@isti.cnr.it
tuning parameter α.
rewarded, and this metric is equivalent to the traditional NDCG.
Vechtomova, A. Ashkan, S. Büttcher, and I. MacKinnon. Novelty and diversity in information retrieval evaluation. In Proc. SIGIR'08, pages 659–666. ACM, 2008.
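A minimal sketch of the unnormalized α-DCG from the SIGIR'08 reference, assuming each ranked document is represented by the set of intents it is judged relevant to. The gain of a document counts each of its intents with weight (1 - α)^r, where r is how many higher-ranked documents already covered that intent, and the gain at rank j is discounted by log2(1 + j). α-NDCG divides this by the α-DCG of an ideal ranking, which is omitted here. The function name `alpha_dcg` is our own.

```python
import math
from typing import Dict, List, Set


def alpha_dcg(ranking: List[Set[str]], alpha: float, k: int) -> float:
    """alpha-DCG@k; ranking[j] holds the intents relevant to the doc at rank j+1."""
    seen: Dict[str, int] = {}  # intent -> times covered at higher ranks
    total = 0.0
    for j, intents in enumerate(ranking[:k], start=1):
        # Repeated intents are discounted geometrically by (1 - alpha).
        gain = sum((1 - alpha) ** seen.get(i, 0) for i in intents)
        total += gain / math.log2(1 + j)
        for i in intents:
            seen[i] = seen.get(i, 0) + 1
    return total
```

With α = 0 every intent counts fully no matter how often it was seen, so the gain is just the number of relevant intents and the score is ordinary graded DCG; this is why novelty is not rewarded and the normalized metric reduces to traditional NDCG.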
Diversifying search results. In Proceedings of the Second ACM International Conference on Web Search and Data Mining (WSDM '09), Ricardo Baeza-Yates, Paolo Boldi, Berthier Ribeiro-Neto, and B. Barla Cambazoglu (Eds.). ACM, New York, NY, USA, 5-14.