SLIDE 1

Effective Latent Space Graph-based Re-ranking Model with Global Consistency

Hongbo Deng, Michael R. Lyu and Irwin King

Department of Computer Science and Engineering The Chinese University of Hong Kong

  • Feb. 12, 2009

WSDM 2009

SLIDE 2

Outline

Introduction
Related work
Methodology

  • Graph-based re-ranking model
  • Learning a latent space graph
  • A case study and the overall algorithm

Experiments
Conclusions and Future Work

SLIDE 3

Introduction

Problem definition

  • Given a set of documents D
  • A term vector di = xi
  • Relevance scores using VSM or LM
  • A connected graph
  • Explicit link (e.g., hyperlinks)
  • Implicit link (e.g., inferred from the content information)
  • Many other features
  • How to leverage the interconnection between documents/entities to improve the ranking of retrieved results with respect to the query?

[Figure: the query q and the documents d1 to d5]

SLIDE 4

Introduction

Initial ranking scores: relevance
Graph structure: centrality (importance, authority)
Simple method: combine those two parts linearly
Limitations:

  • Do not make full use of the information
  • Treat each of them individually

What have we done?

  • Propose a joint regularization framework
  • Combine the content with link information in a latent space graph

SLIDE 5

Related work

Structural re-ranking model

Using some variations of PageRank and HITS

  • Centrality within graphs (Kurland and Lee, SIGIR’05 & SIGIR’06)
  • Improve Web search results using affinity graph (Zhang et al., SIGIR’05)
  • Improve an initial ranking by random walk in entity-relation networks (Minkov et al., SIGIR’06)

Linear combination; the content and the links are treated individually

SLIDE 6

Related work

Regularization framework

  • Graph Laplacians for label propagation (two classes) (Zhu et al., ICML’03; Zhou et al., NIPS’03)
  • Extend the graph harmonic function to multiple classes (Mei et al., WWW’08)
  • Score regularization to adjust ad-hoc retrieval scores (Diaz, CIKM’05)
  • Enhance learning to rank with parameterized regularization models (Qin et al., WWW’08)

Query-independent settings; do not consider multiple relationships between objects.

SLIDE 7

Related work

Learning a latent space

  • Latent Semantic Analysis (LSA) (Deerwester et al., JASIS’90)
  • Probabilistic LSI (pLSI) (Hofmann, SIGIR’99)
  • pLSI + PHITS (Cohn and Hofmann, NIPS’00)
  • Combine content and link for classification using matrix factorization (Zhu et al., SIGIR’07)

Use joint factorization to learn the latent features. Difference: leverage the latent features to build a latent space graph.

SLIDE 8

Methodology

Graph-based re-ranking model + Learning a latent space graph

Case study: Expert finding

SLIDE 9

Graph-based re-ranking model

Intuition:

  • Global consistency: similar documents are most likely to have similar ranking scores with respect to a query.
  • The initial ranking scores provide invaluable information

Regularization framework

  • Global consistency
  • Fit initial scores
  • Parameter
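
The slide's own equation did not survive in this transcript. As a hedged reconstruction, one standard cost function with exactly these three parts is the local-and-global-consistency objective of Zhou et al. (NIPS’03), which the talk cites. The symbols f_i (refined score of document i), f_i^0 (its initial score), D_ii (degree of node i) and µ (trade-off parameter) are assumptions introduced here for illustration.

```latex
Q(f) \;=\;
\underbrace{\frac{1}{2}\sum_{i,j} w_{ij}
  \left( \frac{f_i}{\sqrt{D_{ii}}} - \frac{f_j}{\sqrt{D_{jj}}} \right)^{2}}_{\text{global consistency}}
\;+\;
\underbrace{\mu}_{\text{parameter}}\,
\underbrace{\sum_{i} \left( f_i - f_i^{0} \right)^{2}}_{\text{fit initial scores}},
\qquad D_{ii} = \sum_{j} w_{ij}
```

The first term penalizes strongly connected documents that receive different scores, and the second keeps the refined scores close to the initial ones.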

SLIDE 10

Graph-based re-ranking model

Optimization problem
A closed-form solution
Connection with other methods

  • µα → 0: return the initial scores
  • µα → 1: a variation of PageRank-based model
  • µα ∈ (0, 1): combine both kinds of information simultaneously
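
The closed-form solution itself is not preserved in this transcript. The sketch below assumes the usual form for this kind of regularizer, f* = µβ (I − µα S)⁻¹ f⁰ with S = D^(-1/2) W D^(-1/2) and µβ = 1 − µα. The function name `rerank`, the toy graph and the scores are hypothetical.

```python
import numpy as np

def rerank(W, f0, mu_alpha):
    """Refine initial scores f0 over a graph with symmetric weight matrix W.

    Assumed closed form: f = mu_beta * (I - mu_alpha * S)^-1 f0,
    valid for 0 <= mu_alpha < 1 (at mu_alpha = 1 the system is singular).
    """
    W = np.asarray(W, dtype=float)
    f0 = np.asarray(f0, dtype=float)
    degree = W.sum(axis=1)
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}; isolated nodes get 0.
    inv_sqrt = np.zeros_like(degree)
    nonzero = degree > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(degree[nonzero])
    S = inv_sqrt[:, None] * W * inv_sqrt[None, :]
    mu_beta = 1.0 - mu_alpha
    # Solve the linear system instead of forming the inverse explicitly.
    return np.linalg.solve(np.eye(len(f0)) - mu_alpha * S, mu_beta * f0)

# Hypothetical 4-document graph and initial relevance scores.
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
f0 = np.array([0.9, 0.1, 0.5, 0.0])

unchanged = rerank(W, f0, 0.0)   # mu_alpha = 0 returns the initial scores
smoothed = rerank(W, f0, 0.6)    # scores propagate along the edges
```

With µα = 0 the initial scores come back unchanged, which matches the first bullet. With µα = 0.6 the fourth document, whose initial score is zero, picks up a positive score from its neighbor.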

SLIDE 11

Methodology

Graph-based re-ranking model + Learning a latent space graph

Case study: Expert finding

SLIDE 12

Learning a latent space graph

Objective: incorporate the content with link information (or relational data) simultaneously

  • Latent Semantic Analysis
  • Joint factorization: combine the content with relational data
  • Build latent space graph: calculate the weight matrix W

SLIDE 13

Latent Semantic Analysis

  • Map documents to a vector space of reduced dimensionality
  • SVD is performed on the matrix
  • Keep the largest k singular values
  • Reformulated as an optimization problem
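
A minimal sketch of the LSA step, assuming a small hypothetical document-term matrix `C`. Keeping the k largest singular values gives the best rank-k approximation of `C` in the Frobenius norm, which is the sense in which LSA can be reformulated as an optimization problem. The helper name `lsa_embed` is illustrative.

```python
import numpy as np

def lsa_embed(C, k):
    """Return k-dimensional document vectors and the matching term basis."""
    U, s, Vt = np.linalg.svd(np.asarray(C, dtype=float), full_matrices=False)
    # numpy returns singular values in descending order, so the first k
    # columns correspond to the k largest singular values.
    return U[:, :k] * s[:k], Vt[:k]

# Hypothetical 4-document, 4-term count matrix with two topical blocks.
C = np.array([[2, 1, 0, 0],
              [1, 2, 0, 0],
              [0, 0, 1, 2],
              [0, 0, 2, 1]], dtype=float)

X, Vt = lsa_embed(C, k=2)   # X holds one 2-dimensional vector per document
C_approx = X @ Vt           # rank-2 reconstruction of C
error = np.linalg.norm(C - C_approx)
```

The singular values of this toy matrix are 3, 3, 1 and 1, so the reconstruction error equals the square root of the sum of the discarded squared singular values.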

SLIDE 14

Embedding multiple relational data

Taking the papers as an example

  • Paper-term matrix C
  • Paper-author matrix A

A unified optimization problem, solved with conjugate gradient

[Figure: joint factorization of C and A through a shared latent matrix X, with factor matrices VC and VA; dimension labels N×K, N×M, N×L]
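
The exact objective is not preserved in this transcript. The sketch below assumes a common form, J = ‖C − X·VCᵀ‖² + α‖A − X·VAᵀ‖² + λ(‖X‖² + ‖VC‖² + ‖VA‖²), where X is shared by both factorizations. It swaps in alternating least squares for the conjugate gradient solver named on the slide, because each block update then has a short closed form. The names `joint_factorize`, `alpha`, `lam`, `iters` and `seed`, and the toy matrices, are hypothetical.

```python
import numpy as np

def joint_factorize(C, A, k, alpha=1.0, lam=0.1, iters=30, seed=0):
    """Factorize C ~ X V_C^T and A ~ X V_A^T with a shared latent matrix X."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((C.shape[0], k))
    V_C = rng.standard_normal((C.shape[1], k))
    V_A = rng.standard_normal((A.shape[1], k))
    ridge = lam * np.eye(k)

    def loss():
        return (np.sum((C - X @ V_C.T) ** 2)
                + alpha * np.sum((A - X @ V_A.T) ** 2)
                + lam * (np.sum(X ** 2) + np.sum(V_C ** 2) + np.sum(V_A ** 2)))

    history = [loss()]
    for _ in range(iters):
        # Each update is the exact ridge solution for one block,
        # holding the other two blocks fixed.
        X = np.linalg.solve(V_C.T @ V_C + alpha * (V_A.T @ V_A) + ridge,
                            (C @ V_C + alpha * (A @ V_A)).T).T
        V_C = np.linalg.solve(X.T @ X + ridge, (C.T @ X).T).T
        V_A = np.linalg.solve(alpha * (X.T @ X) + ridge,
                              (alpha * (A.T @ X)).T).T
        history.append(loss())
    return X, V_C, V_A, history

# Hypothetical paper-term matrix C and paper-author matrix A for 4 papers.
C = np.array([[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 1, 2], [0, 0, 2, 1]], dtype=float)
A = np.array([[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]], dtype=float)
X, V_C, V_A, history = joint_factorize(C, A, k=2)
```

Because every block update minimizes the objective exactly over that block, the recorded loss never increases from one iteration to the next. The rows of `X` are the latent features that describe each paper by its content and its authors together.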

SLIDE 16

Build latent space graph

The edge weight wij is defined between documents i and j and collected in the weight matrix W
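
The definition of wij is missing from this transcript. A minimal sketch of one plausible construction, assuming cosine similarity between latent feature vectors and a k-nearest-neighbor sparsification (the talk later varies the number of nearest neighbors, knn). The helper names and the toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def knn_graph(X, knn):
    """Build a symmetric weight matrix W from latent vectors X."""
    n = len(X)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sims = sorted(((cosine(X[i], X[j]), j) for j in range(n) if j != i),
                      reverse=True)
        for sim, j in sims[:knn]:
            if sim > 0:
                # Symmetrize: keep the edge if either endpoint selects the other.
                W[i][j] = W[j][i] = sim
    return W

# Hypothetical 2-dimensional latent features for four documents.
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
W = knn_graph(X, knn=1)
```

With knn = 1 the first two documents are linked to each other, the last two are linked to each other, and no edge crosses between the two groups.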

SLIDE 17

Methodology

Graph-based re-ranking model + Learning a latent space graph

Case study: Expert finding

SLIDE 18

Case study: Application to expert finding

Utilize a statistical language model to calculate the initial ranking scores

  • The probability of a query given a document
  • Infer a document model θd for each document
  • The probability of the query generated by the document model θd
  • The product of the terms generated by the document model (assumption: the terms are independent)
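
A minimal sketch of query-likelihood scoring under the independence assumption, where p(q | θd) is the product of the per-term probabilities. The slides do not say how θd is smoothed, so Jelinek-Mercer smoothing with weight `lam` is an assumption here. The function name and the toy documents are hypothetical.

```python
import math
from collections import Counter

def query_log_likelihood(query, doc, collection, lam=0.5):
    """log p(q | theta_d), summing log p(t | theta_d) over the query terms."""
    doc_tf, coll_tf = Counter(doc), Counter(collection)
    score = 0.0
    for term in query:
        p_doc = doc_tf[term] / len(doc) if doc else 0.0
        p_coll = coll_tf[term] / len(collection)
        # Assumed smoothing: mix the document model with the collection model.
        p = (1 - lam) * p_doc + lam * p_coll
        if p == 0.0:
            return float("-inf")  # term never occurs in the collection
        score += math.log(p)
    return score

docs = {
    "d1": "graph based ranking of documents".split(),
    "d2": "latent space learning for documents".split(),
}
collection = [term for doc in docs.values() for term in doc]
query = "graph ranking".split()
scores = {name: query_log_likelihood(query, doc, collection)
          for name, doc in docs.items()}
```

Working in log space turns the product into a sum and avoids underflow on long queries. The document that contains both query terms receives the higher score.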

SLIDE 19

Case study: Application to expert finding

Expert Finding:

  • Identify a list of experts in the academic field for a given query topic (e.g., “data mining” → “Jiawei Han”, etc.)

  • Publications as representative of their expertise
  • Use DBLP dataset to obtain the publications

Authors have expertise in the topic of their papers

  • Overall aggregation of their publications
  • Refine the ranking scores of papers, then aggregate the refined scores to re-rank the experts
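
The aggregation step can be sketched as follows, assuming that an author's score is the plain sum of the refined scores of their papers. The slides do not give the exact aggregation formula, so this is an illustrative choice. The paper identifiers, author names and scores are hypothetical.

```python
from collections import defaultdict

def rank_experts(paper_scores, paper_authors):
    """Sum each paper's refined score over its authors and sort descending."""
    totals = defaultdict(float)
    for paper, score in paper_scores.items():
        for author in paper_authors.get(paper, []):
            totals[author] += score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Hypothetical refined paper scores for one query, and paper authorship.
paper_scores = {"p1": 0.6, "p2": 0.3, "p3": 0.1}
paper_authors = {"p1": ["alice"], "p2": ["alice", "bob"], "p3": ["bob"]}

experts = rank_experts(paper_scores, paper_authors)
```

Here "alice" ranks first with the scores of p1 and p2, and "bob" follows with the scores of p2 and p3.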

SLIDE 20

Case study: Application to expert finding

SLIDE 21

Experiments

DBLP Collection

  • A subset of the DBLP records (15-CONF)
  • Statistics of the 15-CONF collection

SLIDE 22

Benchmark Dataset

A benchmark dataset with 16 topics and expert lists

SLIDE 23

Evaluation Metrics

  • Precision at rank n (P@n)
  • Mean Average Precision (MAP)
  • Bpref: the score function of the number of non-relevant candidates
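
The metric formulas are not preserved in this transcript. The sketch below uses the standard definitions of P@n and average precision, with MAP being the mean of average precision over all query topics. For bpref it assumes the original formulation, in which each retrieved relevant candidate is penalized by the number of judged non-relevant candidates ranked above it, capped at R, the number of relevant candidates. The candidate names are hypothetical.

```python
def precision_at(ranked, relevant, n):
    """Fraction of the top n candidates that are relevant."""
    return sum(1 for c in ranked[:n] if c in relevant) / n

def average_precision(ranked, relevant):
    """Mean of the precision values at the rank of each relevant candidate."""
    hits, total = 0, 0.0
    for rank, candidate in enumerate(ranked, start=1):
        if candidate in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def bpref(ranked, relevant, nonrelevant):
    """Assumed form: (1/R) * sum over relevant r of 1 - min(n above r, R) / R."""
    R = len(relevant)
    if R == 0:
        return 0.0
    seen_nonrelevant, total = 0, 0.0
    for candidate in ranked:
        if candidate in nonrelevant:
            seen_nonrelevant += 1
        elif candidate in relevant:
            total += 1.0 - min(seen_nonrelevant, R) / R
    return total / R

ranked = ["a", "x", "b", "y", "c"]      # hypothetical ranked expert list
relevant = {"a", "b", "c"}
nonrelevant = {"x", "y"}

p_at_3 = precision_at(ranked, relevant, 3)
ap = average_precision(ranked, relevant)
bp = bpref(ranked, relevant, nonrelevant)
```

For this toy ranking, P@3 is 2/3, average precision is (1 + 2/3 + 3/5) / 3, and bpref is (1 + 2/3 + 1/3) / 3.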

SLIDE 24

Preliminary Experiments

Evaluation results (%)

  • PRRM may not improve the performance
  • GBRM achieves the best results

SLIDE 25

Details of the results

SLIDE 26

Effect of parameter µα

  • µα → 0: return the initial scores (baseline)
  • µα → 1: discard the initial scores and consider only the global consistency over the graph

SLIDE 27

Effect of parameter µα

Robust; achieves the best results when µα ∈ (0.5, 0.7)

SLIDE 28

Effect of graph construction

Different dimensionality (kd) of the latent features, which are used to calculate the weight matrix W

  • Results become better for greater kd, because a higher-dimensional space can better capture the similarities
  • kd = 50 achieves better results than tf.idf

SLIDE 29

Effect of graph construction

Different number of nearest neighbors (knn)

  • Performance tends to degrade a little with increasing knn
  • knn = 10 achieves the best results
  • Average processing time increases linearly with knn

SLIDE 30

Conclusions and Future Work

Conclusions

  • Leverage the graph-based model for the query-dependent ranking problem
  • Integrate the latent space with the graph-based re-ranking model
  • Address the expert finding task in the academic field using the proposed method
  • The improvement in our proposed model is promising

Future work

  • Extend our framework to consider more features
  • Apply the framework to other applications and large-scale datasets

SLIDE 31

Q&A

Thanks!