Combining Implicit and Explicit Topic Representations for Result Diversification
Jiyin He, Vera Hollink, Arjen de Vries Centrum Wiskunde & Informatica SIGIR 2012, Portland
[Figure: word cloud of terms for a subtopic of the query "python" (the snake family Pythonidae): pythonidae, species, family, islands, australia, prey, eggs, geographic, guinea, snakes, boidae, molurus, indonesia, asia]
[Figure: word cloud of terms for a subtopic of the query "python" (the programming language): modules, function, interpreter, language, lists, standard, class, data, error, exceptions, file, library, programming, statements, strings, argument, documentation, interactive, tools, tutorial, formatting, source, syntax]
anchor texts
multiple sources
[Figure: graphs GA, GB and GC drawn as separate planes, with within-plane and between-plane transitions]
Assumption: the more similar two topics are, the more likely a transition can happen.
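One way to read the assumption above is that the probability of moving from one topic to another is proportional to their similarity. The sketch below row-normalizes a small similarity graph into transition probabilities. The similarity values and the function name `transition_probabilities` are hypothetical, and the slides do not state the exact normalization, so this is an illustration under that assumption rather than the authors' formulation.

```python
def transition_probabilities(similarity):
    """Row-normalize a dict-of-dicts similarity graph.

    similarity[a][b] is a non-negative similarity between topics a and b.
    Returns probs[a][b], the probability of moving from a to b, which is
    proportional to similarity[a][b] (an assumed normalization).
    """
    probs = {}
    for node, neighbours in similarity.items():
        total = sum(neighbours.values())
        if total == 0:
            # A node without outgoing similarity has no transitions.
            probs[node] = {}
            continue
        probs[node] = {other: w / total for other, w in neighbours.items()}
    return probs


# Hypothetical similarities between three subtopics of "defender".
similarity = {
    "windows defender": {"microsoft antispyware": 3.0, "public defender": 1.0},
    "microsoft antispyware": {"windows defender": 3.0},
    "public defender": {"windows defender": 1.0},
}
P = transition_probabilities(similarity)
```

With these made-up values, a walker at "windows defender" moves to the more similar "microsoft antispyware" three times as often as to "public defender".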
documents
Source                 Nodes           Edge weights                     Data
Click log (GC) [1]     search queries  #co-clicked documents            MSN query log
Anchor texts (GA) [2]  anchor texts    #co-occurrence in text passages  Anchor texts from ClueWeb09
Ngrams (GN) [3]        Web ngrams      #co-occurrence in text passages  Bing Ngram service

[1] Radlinski et al., 2010; Guo et al., 2011. [2] Dang et al., 2010. [2, 3] Dang et al., 2011.
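For the click-log graph, the edge weight between two queries is the number of co-clicked documents. A minimal sketch of that counting step follows, assuming the log is available as (query, document) pairs. The record format, the sample data and the function name `co_click_weights` are assumptions made for illustration; the slides do not describe the actual preprocessing of the MSN query log.

```python
from collections import defaultdict
from itertools import combinations


def co_click_weights(clicks):
    """Count, for each pair of queries, the documents clicked for both.

    clicks: iterable of (query, document) pairs (assumed log format).
    Returns {(query_a, query_b): count} with query_a < query_b.
    """
    queries_by_doc = defaultdict(set)
    for query, doc in clicks:
        queries_by_doc[doc].add(query)

    weights = defaultdict(int)
    for queries in queries_by_doc.values():
        # Every pair of queries sharing this document gains one co-click.
        for q1, q2 in combinations(sorted(queries), 2):
            weights[(q1, q2)] += 1
    return dict(weights)


# Hypothetical click records.
clicks = [
    ("windows defender", "doc1"),
    ("microsoft antispyware", "doc1"),
    ("windows defender", "doc2"),
    ("microsoft antispyware", "doc2"),
    ("public defender", "doc3"),
]
weights = co_click_weights(clicks)
```

The same counting pattern would apply to the anchor-text and n-gram graphs if text passages took the place of clicked documents, although that substitution is also an assumption here.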
Sample subtopic: top 3 related subtopics (score)

From 1 source:
anti-spy: windows defender (0.2261), microsoft antispyware (0.1208), defender (0.1122)
microsoft spyware: windows defender (0.2263), microsoft antispyware (0.1208), defender (0.1121)
antispyware: windows defender (0.2265), microsoft antispyware (0.1207), defender (0.1121)
microsoft beta: windows defender (0.226), microsoft antispyware (0.1209), defender (0.112)
windows defender: microsoft antispyware (0.1218), defender (0.1141), antispyware (0.0995)

From 2 sources:
space defender 1.0: star defender 4 (0.1266), star defender 3 (0.1266), star defender 2 (0.1266)
defender industries: defender industries Inc (0.2055), defender (0.1197), windows defender (0.0462)
microsoft beta: windows defender (0.1062), microsoft defender (0.0555), microsoft s windows defender (0.0538)
a public defender: public defender (0.116), public defender’s (0.104), office of the public defender (0.104)
tri state defender: chicago defender (0.1035), the chicago defender (0.1035), national legal aid defender association (0.0352)

A random sample of 5 subtopics related to the query “defender” from 1 source (top) vs. 2 sources (bottom) and the top 3 subtopics related to each of the sample subtopics. The scores are the result of a 5-step random walk on the corresponding graphs.
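The scores in the table come from a 5-step random walk. As a rough illustration of how such scores can be computed, the sketch below propagates probability mass from a start node for a fixed number of steps and ranks the other nodes by the mass they receive. The transition values and the names `random_walk` and `top_related` are hypothetical, and the slides do not say whether the authors' walk uses restarts or self-loops, so none are assumed here.

```python
def random_walk(transitions, start, steps=5):
    """Distribution over nodes after `steps` steps starting at `start`.

    transitions[a][b] is the probability of moving from a to b.
    Mass that reaches a node with no outgoing transitions is dropped.
    """
    dist = {start: 1.0}
    for _ in range(steps):
        nxt = {}
        for node, mass in dist.items():
            for other, p in transitions.get(node, {}).items():
                nxt[other] = nxt.get(other, 0.0) + mass * p
        dist = nxt
    return dist


def top_related(transitions, start, steps=5, k=3):
    """Top-k nodes other than `start`, ranked by walk score, then by name."""
    dist = random_walk(transitions, start, steps)
    ranked = sorted(
        ((node, score) for node, score in dist.items() if node != start),
        key=lambda item: (-item[1], item[0]),
    )
    return ranked[:k]


# Hypothetical transition probabilities between three subtopics.
transitions = {
    "anti-spy": {"windows defender": 0.5, "defender": 0.5},
    "windows defender": {"anti-spy": 1.0},
    "defender": {"anti-spy": 1.0},
}
related = top_related(transitions, "anti-spy", steps=5, k=3)
```

The number of steps controls how far the mass spreads from the start node; the value 5 is taken from the caption above.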
form better topic models?
their combinations compare in terms of diversification performance?
resources achieve better diversification performance than that of single resources?
Graph   Coverage
        1-50    51-100    101-150
GC      39      37        21
GA      48      47        25
GN      48      45        34
GCA     48      48        31
GCN     50      48        39
GAN     50      48        39
GCAN    50      48        39
does not provide any information
[Figure: three panels (Topics 1-50, Topics 51-100, Topics 101-150); x-axis: # Topics (K)]
subtopics often helps
different cases
[Figure: three panels (Topics 1-50, Topics 51-100, Topics 101-150); x-axis: # Topics (K)]
does not always lead to
[Figure: three panels (Topics 1-50, Topics 51-100, Topics 101-150); x-axis: # Topics (K)]
random K, diversification with
that of pLSA
that of the worst individual source