QUINT: On Query-Specific Optimal Networks

SLIDE 1

Arizona State University

QUINT: On Query-Specific Optimal Networks

Presenter: Liangyue Li

Joint work with Yuan Yao (NJU), Jie Tang (Tsinghua), Hanghang Tong (ASU), and Wei Fan (Baidu)

SLIDE 2

Node Proximity: What?

§ Node proximity: the closeness (a.k.a. relevance or similarity) between two nodes

[Figure: an example graph with nodes 1 to 12, annotated with proximity scores between 0.02 and 0.13]

What is the closest node to 4?

SLIDE 3

Node Proximity: Why?

§ Applications: biology [Ni+], social networks [Lerman+], e-commerce [Chen+], disaster management [Zheng+]

SLIDE 4

Node Proximity: How?

§ Random Walk with Restart (RWR)

– Idea: summarize multiple weighted relationships between nodes

– Variants:

  • Electric networks: SAEC [Faloutsos+]
  • Katz [Katz], [Huang+]
  • Matrix-Forest-based Alg. [Chebotarev+]

[Figure: example graph with nodes A to J and unit edge weights; colored paths connect A and B]

Prox(A, B) = Score(red path) + Score(green path) + Score(blue path) + Score(purple path) + …

SLIDE 5

Node Proximity: RWR

[Figure: RWR illustrated on the 12-node example graph]

SLIDE 6

Node Proximity -- RWR

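§ Details: a random walker starts from s and, at each step,

– (a) transmits to one neighbor with probability $p \sim cA_{ij}$, or
– (b) goes back to s with probability $(1 - c)$

§ Formulation:

$$r_s = cAr_s + (1 - c)e_s$$

where $r_s$ is the ranking vector, $A$ the adjacency matrix, $1 - c$ the restart probability, and $e_s$ the starting vector

§ Assumption: the input graph is fixed

– Question: how to best leverage the fixed input graph?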
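The fixed-point equation above can be solved by power iteration. A minimal sketch in plain Python, assuming a dense, column-normalized adjacency matrix (A[i][j] is the probability of stepping from j to i) and c = 0.85; the helper names and the toy path graph are illustrative, not taken from the slides:

```python
def column_normalize(adj):
    """Divide each column by its sum so A[i][j] = P(step from j to i)."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def rwr(A, s, c=0.85, tol=1e-10, max_iter=1000):
    """Iterate r <- c*A*r + (1-c)*e_s until the L1 change drops below tol."""
    n = len(A)
    e = [1.0 if i == s else 0.0 for i in range(n)]
    r = e[:]
    for _ in range(max_iter):
        new_r = [c * sum(A[i][j] * r[j] for j in range(n)) + (1 - c) * e[i]
                 for i in range(n)]
        if sum(abs(a - b) for a, b in zip(new_r, r)) < tol:
            return new_r
        r = new_r
    return r


# Toy undirected path graph 0 - 1 - 2 - 3, query node s = 0
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
scores = rwr(column_normalize(adj), s=0, c=0.85)
print([round(x, 3) for x in scores])
```

Each iteration is one pass over A (O(m) for a sparse graph); an iterative sub-routine of this kind is presumably what the $T_2 m$ terms in the later complexity bounds count.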

SLIDE 7

Node Proximity: Learning RWR

§ Goal

– Use side information to learn a better graph
– Side info: user feedback, node attributes

§ Key idea: infer optimal edge weights
§ Limitation: fixed topology

$$\min_{w} \|w\|^2 + \lambda \sum_{x \in P, y \in N} h\big(Q(y, s) - Q(x, s)\big), \qquad Q = (I - cA)^{-1}$$

– The edge weights in A are a function of w (map edge attributes to weights); the loss term h(·) matches user preferences

  • J. Tang, T. Lou, and J. Kleinberg. Transfer Link Prediction across Heterogeneous Social Networks. TOIS, 2015.
  • L. Backstrom and J. Leskovec. Supervised random walks: predicting and recommending links in social networks. WSDM, 2011.
  • A. Agarwal, S. Chakrabarti, and S. Aggarwal. Learning to rank networked entities. KDD, 2006.

SLIDE 8

Algorithmic Questions

§ Q1: optimal weights or optimal topology?
§ Q2: one-fits-all or one-fits-one?
§ Q3: offline learning or online learning?
SLIDE 9

Q1: Optimal Weights or Topology?

§ Observation: real networks are noisy and incomplete

§ Challenge: learn optimal weights and topology

[Figure: the 12-node example graph with proximity scores, highlighting a missing edge and a noisy edge]

SLIDE 10

Q2: One-fits-all, or one-fits-one?

§ Observation: the optimal network might differ across queries

§ Challenge:

– How to tailor learning for each query

[Figure: two copies of the 12-node example graph, each marking a query node, its positive nodes P, and its negative nodes N]

SLIDE 11

Q3: Offline or Online Learning?

§ Observation:

– Learning RWR: a costly iterative sub-routine is needed to compute a single gradient vector
– Learning topology: the parameter space expands to $O(n^2)$
– One-fits-one: one optimal network for each query

§ Challenge:

– How to perform query-specific online learning?

SLIDE 12

Query-specific Optimal Network Learning

[Figure: the 12-node example graph with a query node, positive nodes, and negative nodes]

§ Given: an input network $A$, a query node $s$, positive nodes $P$, and negative nodes $N$
§ Learn: an optimal network $A_s$ specific to the query $s$

SLIDE 13

Roadmap

§ Motivations
§ Proposed Solutions: QUINT
§ Empirical Evaluations
§ Conclusions
SLIDE 14

QUINT - Formulations

§ Optimization formulation (hard version):

$$\arg\min_{A_s} \|A_s - A\|_F^2 \quad \text{s.t. } Q(x, s) > Q(y, s), \ \forall x \in P, \ \forall y \in N, \qquad Q = (I - cA_s)^{-1}$$

– The objective matches the input network; the constraint matches the (hard) preference of positive nodes over negative nodes

§ Remarks:

– Larger parameter space: $O(n^2)$
– Query-specific optimal network
– No exception is allowed in the constraint
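To make the hard constraint concrete, the sketch below (plain Python; function names hypothetical) takes the proximity column Q(:, s) as a dictionary and lists every (x, y) pair that breaks $Q(x, s) > Q(y, s)$. A network is feasible for the hard version only if that list is empty:

```python
def violated_pairs(q, positives, negatives):
    """Pairs (x, y), x in P and y in N, that break the constraint Q(x, s) > Q(y, s).

    q maps each node v to Q(v, s), its proximity to the query node s.
    Ties count as violations because the constraint is strict.
    """
    return [(x, y) for x in positives for y in negatives if not q[x] > q[y]]


def is_feasible(q, positives, negatives):
    """The hard version allows no exception: every (x, y) pair must be ordered."""
    return not violated_pairs(q, positives, negatives)


q = {1: 0.30, 2: 0.25, 3: 0.10, 4: 0.28}  # hypothetical Q(v, s) values
print(violated_pairs(q, positives=[1, 2], negatives=[3, 4]))  # [(2, 4)]
print(is_feasible(q, positives=[1, 2], negatives=[3, 4]))     # False
```

A single misordered pair makes the whole problem infeasible, which is the motivation for the soft version.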

SLIDE 15

QUINT - Formulations

§ Optimization formulation (soft version):

$$\arg\min_{A_s} L(A_s) = \lambda \|A_s - A\|_F^2 + \sum_{x \in P, y \in N} g\big(Q(y, s) - Q(x, s)\big), \qquad Q = (I - cA_s)^{-1}$$

where the loss function $g(\cdot)$ penalizes violations of the preferences

§ Remarks:

– Characteristic: $Q(y, s) < Q(x, s) \Rightarrow g(\cdot) = 0$; $Q(y, s) > Q(x, s) \Rightarrow g(\cdot) > 0$
– Choice of g: Wilcoxon-Mann-Whitney (WMW) loss
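A minimal sketch of evaluating the soft objective, assuming the common sigmoid form of the WMW loss, $g(d) = 1/(1 + e^{-d/b})$ with a width parameter b (b = 0.01, c = 0.85, and λ = 1 are assumptions, as are all function names). Column s of Q comes from the fixed point $q = cA_s q + e_s$, so Q is never inverted explicitly:

```python
import math


def column_normalize(adj):
    """Divide each column by its sum so every column of A sums to 1."""
    n = len(adj)
    sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [[adj[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def q_column(A, s, c, iters=300):
    """Column s of Q = (I - c*A)^{-1}, from the fixed point q = c*A*q + e_s.

    Converges when c times the largest absolute column sum of A is below 1.
    """
    n = len(A)
    q = [0.0] * n
    for _ in range(iters):
        q = [c * sum(A[i][j] * q[j] for j in range(n)) + (1.0 if i == s else 0.0)
             for i in range(n)]
    return q


def wmw(d, b=0.01):
    """Sigmoid surrogate of the step function, written to avoid overflow in exp."""
    z = d / b
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_loss(A_s, A, s, positives, negatives, c=0.85, lam=1.0, b=0.01):
    """lam * ||A_s - A||_F^2 + sum over (x, y) in P x N of g(Q(y, s) - Q(x, s))."""
    n = len(A)
    frob = sum((A_s[i][j] - A[i][j]) ** 2 for i in range(n) for j in range(n))
    q = q_column(A_s, s, c)
    penalty = sum(wmw(q[y] - q[x], b) for x in positives for y in negatives)
    return lam * frob + penalty


A = column_normalize([[0, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0]])
good = soft_loss(A, A, s=0, positives=[1], negatives=[3])  # preference respected
bad = soft_loss(A, A, s=0, positives=[3], negatives=[1])   # preference violated
print(good, bad)
```

Because this g is a smooth surrogate, it is close to (not exactly) 0 for a respected preference and close to 1 for a violated one, which keeps the objective differentiable.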

SLIDE 16

QUINT -- Optimization

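§ Gradient descent based solution

– Gradient (g is differentiable):

$$\frac{\partial L(A_s)}{\partial A_s} = 2\lambda(A_s - A) + \sum_{x \in P, y \in N} \frac{\partial g\big(Q(y, s) - Q(x, s)\big)}{\partial A_s} = 2\lambda(A_s - A) + \sum_{x, y} \frac{\partial g(d_{yx})}{\partial d_{yx}} \left( \frac{\partial Q(y, s)}{\partial A_s} - \frac{\partial Q(x, s)}{\partial A_s} \right)$$

where $d_{yx} = Q(y, s) - Q(x, s)$

– Derivative of an inverse:

$$\frac{\partial Q}{\partial A_s(i, j)} = -Q \frac{\partial (I - cA_s)}{\partial A_s(i, j)} Q = cQJ_{ij}Q, \qquad \frac{\partial Q(x, s)}{\partial A_s(i, j)} = cQ(x, i)Q(j, s)$$

where $J_{ij}$ is the matrix with a 1 at entry (i, j) and 0 elsewhere, and $Q = (I - cA_s)^{-1}$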
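The derivative-of-an-inverse identity can be checked numerically. This sketch (plain Python, small dense matrices only; the 3 × 3 matrix A, c = 0.5, and all function names are arbitrary test choices) compares $cQ(x, i)Q(j, s)$ against a central finite difference of Q(x, s), including entries where A is zero, i.e., edges that do not yet exist:

```python
def inverse(M):
    """Dense Gauss-Jordan inverse with partial pivoting (fine for tiny matrices)."""
    n = len(M)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def q_matrix(A, c):
    """Q = (I - c*A)^{-1}."""
    n = len(A)
    return inverse([[(1.0 if i == j else 0.0) - c * A[i][j] for j in range(n)]
                    for i in range(n)])


def analytic_grad(A, c, x, s, i, j):
    """dQ(x, s)/dA(i, j) = c * Q(x, i) * Q(j, s)."""
    Q = q_matrix(A, c)
    return c * Q[x][i] * Q[j][s]


def numeric_grad(A, c, x, s, i, j, eps=1e-6):
    """Central finite difference of Q(x, s) with respect to the single entry A(i, j)."""
    plus = [row[:] for row in A]
    minus = [row[:] for row in A]
    plus[i][j] += eps
    minus[i][j] -= eps
    return (q_matrix(plus, c)[x][s] - q_matrix(minus, c)[x][s]) / (2 * eps)


A = [[0.0, 0.5, 0.2],
     [0.3, 0.0, 0.4],
     [0.6, 0.5, 0.0]]
c, x, s = 0.5, 0, 2
max_gap = max(abs(analytic_grad(A, c, x, s, i, j) - numeric_grad(A, c, x, s, i, j))
              for i in range(3) for j in range(3))
print(max_gap)
```

Checking the zero entries matters because learning the topology means differentiating with respect to all $n^2$ entries of $A_s$, not only the existing edges.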

SLIDE 17

QUINT -- Optimization

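§ Intuition:

$$\frac{\partial Q(x, s)}{\partial A_s(i, j)} = cQ(x, i)Q(j, s) \propto Q(x, i) \times Q(j, s), \qquad Q = (I - cA_s)^{-1}$$

Q(j, s) is large when j is a neighbor of the query node s, and Q(x, i) is large when i is a neighbor of the positive node x

[Figure: query node s, positive node x, and nodes i and j]

§ Complexity: $O(T_1 |P| \cdot |N| (T_2 m + n^2))$, i.e., quadratic in the number of nodes n (m is the number of edges; $T_1$ and $T_2$ are iteration counts)

§ Observation: usually $T_1, T_2, |P|, |N| \ll m, n$

Q: how to scale up?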

SLIDE 18

QUINT – Scale-up

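§ Key idea: the optimal network is a rank-one perturbation of the original network, $A_s = A + fg^\top$

§ Details:

$$\arg\min_{f, g} L(f, g) = \lambda \|fg^\top\|_F^2 + \beta\big(\|f\|^2 + \|g\|^2\big) + \sum_{x \in P, y \in N} g\big(Q(y, s) - Q(x, s)\big), \qquad Q = (I - cA_s)^{-1}$$

(the vector g in the factorization is distinct from the loss function g(·))

§ Optimization: alternating gradient descent

§ Complexity: $O(T_1 |P| \cdot |N| (T_2 m + n))$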
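The slides do not say how Q is evaluated under the rank-one model. One standard option, assumed here, is the Sherman-Morrison identity, which gives $(I - cA - cfg^\top)^{-1}e_s = Qe_s + c\,Qf\,(g^\top Qe_s)/(1 - c\,g^\top Qf)$ with $Q = (I - cA)^{-1}$. The sketch below (plain Python; names and toy values are hypothetical) computes column s this way from two solves on the original A and compares it with a direct solve on $A_s = A + fg^\top$. It also checks $\|fg^\top\|_F^2 = \|f\|^2\|g\|^2$, which lets the first regularizer be evaluated in O(n):

```python
def solve(A, b, c, iters=500):
    """z = (I - c*A)^{-1} b via the fixed point z = c*A*z + b (needs c*||A||_1 < 1)."""
    n = len(A)
    z = [0.0] * n
    for _ in range(iters):
        z = [c * sum(A[i][j] * z[j] for j in range(n)) + b[i] for i in range(n)]
    return z


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def rank_one_column(A, f, g, s, c):
    """Column s of (I - c*(A + f g^T))^{-1} from two solves on the ORIGINAL A.

    Sherman-Morrison: Q_s e_s = Q e_s + c * (Q f) * (g^T Q e_s) / (1 - c * g^T Q f).
    """
    n = len(A)
    e_s = [1.0 if i == s else 0.0 for i in range(n)]
    q_s = solve(A, e_s, c)  # Q e_s
    q_f = solve(A, f, c)    # Q f
    scale = c * dot(g, q_s) / (1.0 - c * dot(g, q_f))
    return [a + scale * b for a, b in zip(q_s, q_f)]


A = [[0.0, 0.5, 0.0, 0.0],
     [1.0, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 1.0],
     [0.0, 0.0, 0.5, 0.0]]   # column-normalized path graph 0-1-2-3
f = [0.1, 0.0, 0.05, 0.0]    # hypothetical rank-one factors
g = [0.0, 0.1, 0.0, 0.1]
c, s = 0.5, 0

A_s = [[A[i][j] + f[i] * g[j] for j in range(4)] for i in range(4)]
direct = solve(A_s, [1.0, 0.0, 0.0, 0.0], c)
updated = rank_one_column(A, f, g, s, c)
print(max(abs(a - b) for a, b in zip(direct, updated)))

# ||f g^T||_F^2 = ||f||^2 * ||g||^2, so this regularizer never needs the n x n matrix
frob_sq = sum((f[i] * g[j]) ** 2 for i in range(4) for j in range(4))
print(abs(frob_sq - dot(f, f) * dot(g, g)))
```

Neither path forms an n × n matrix beyond the toy A itself, which is in the spirit of the O(n) term replacing $n^2$ in the complexity above.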

SLIDE 19

QUINT – Variant #1

§ Key idea: apply Taylor approximation for Q

§ Details:

$$Q = (I - cA_s)^{-1} \approx I + \sum_{i=1}^{k} c^i A_s^i$$

§ Complexity: $O(T_1 |P| \cdot |N| \, n)$ using the 1st-order Taylor approximation

§ Benefit: faster access to Q(i, j)
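A sketch of the truncated series for a single column of Q, using only k matrix-vector products (plain Python; the toy matrix, c = 0.5, and the function names are assumptions). For a nonnegative column-stochastic A, each $A^i e_s$ sums to 1, so the L1 truncation error is exactly $\sum_{i>k} c^i = c^{k+1}/(1 - c)$, which the loop prints next to the measured error:

```python
def matvec(A, v):
    n = len(A)
    return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]


def taylor_column(A, s, c, k):
    """Column s of I + sum_{i=1}^{k} c^i A^i, using k matrix-vector products."""
    n = len(A)
    term = [1.0 if i == s else 0.0 for i in range(n)]  # c^0 A^0 e_s
    total = term[:]
    for _ in range(k):
        term = [c * t for t in matvec(A, term)]        # c^i A^i e_s
        total = [a + b for a, b in zip(total, term)]
    return total


A = [[0.0, 0.5, 0.0, 0.0],
     [1.0, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 1.0],
     [0.0, 0.0, 0.5, 0.0]]   # column-stochastic: every column sums to 1
c, s = 0.5, 0
exact = taylor_column(A, s, c, k=400)  # remaining tail c^401 / (1 - c) is negligible
for k in (1, 2, 4, 8):
    err = sum(abs(a - b) for a, b in zip(exact, taylor_column(A, s, c, k)))
    print(k, err, c ** (k + 1) / (1 - c))
```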

SLIDE 20

QUINT – Variant #2

§ Key idea: only update the neighborhood of the query node and of the positive/negative nodes (localized rank-one perturbation)

§ Complexity: $O(T_1 |P| \cdot |N| \max(|\mathcal{N}(s)|, |\mathcal{N}(P, N)|))$, where $\mathcal{N}(s)$ denotes the neighbors of s and $\mathcal{N}(P, N)$ the neighbors of the positive/negative nodes

§ Benefit: usually sub-linear in n, since $\max(|\mathcal{N}(s)|, |\mathcal{N}(P, N)|) \ll n$
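The slides give only the key idea for the localized variant. One plausible reading, assumed here rather than taken from the slides, combines it with the gradient intuition $\partial Q(x, s)/\partial A_s(i, j) \propto Q(x, i)\,Q(j, s)$: restrict the factor f to $\mathcal{N}(P, N)$ and g to $\mathcal{N}(s)$, so every changed entry $f_i g_j$ links a neighbor j of the query to a neighbor i of a positive or negative node. The sketch (plain Python, 1-hop neighborhoods, hypothetical names) only builds those supports and the resulting sparse update:

```python
def neighbors(adj_list, nodes):
    """Union of the 1-hop neighbors of `nodes` (hypothetical definition of N(.))."""
    out = set()
    for v in nodes:
        out.update(adj_list.get(v, ()))
    return out


def localized_supports(adj_list, s, positives, negatives):
    """Return (indices where f may be nonzero, indices where g may be nonzero)."""
    f_support = neighbors(adj_list, list(positives) + list(negatives))  # N(P, N)
    g_support = neighbors(adj_list, [s])                                 # N(s)
    return f_support, g_support


def localized_update(f, g, f_support, g_support):
    """Sparse rank-one perturbation {(i, j): f[i] * g[j]} restricted to the supports."""
    return {(i, j): f[i] * g[j] for i in f_support for j in g_support}


adj_list = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1, 5], 4: [2], 5: [3]}
f_sup, g_sup = localized_supports(adj_list, s=0, positives=[3], negatives=[4])
print(sorted(f_sup), sorted(g_sup))  # [1, 2, 5] [1, 2]
delta = localized_update({i: 0.1 for i in f_sup}, {j: 0.2 for j in g_sup}, f_sup, g_sup)
print(len(delta))                     # 6
```

Under this reading the number of free parameters is $|\mathcal{N}(P, N)| + |\mathcal{N}(s)|$ rather than 2n, in line with the max(·) term in the complexity above.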

SLIDE 21

Roadmap

§ Motivations
§ Proposed Solutions: QUINT
§ Empirical Evaluations
§ Conclusions
SLIDE 22

Datasets

§ 10+ diverse networks

SLIDE 23

Effectiveness: MAP (Higher is better)

[Figure: MAP of each method on Astro-Ph, GR-QC, Hep-TH, Hep-PH, Protein, Airport, Oregon, NBA, Email, Gene, and Last.fm]

Methods: Adamic/Adar, Common Nbr, SRW, RWR, wiZAN_Dual, ProSIN, QUINT-Basic, QUINT-Basic1st, QUINT-rankOne

MAP: Mean Average Precision
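For reference, a minimal sketch of MAP as commonly defined (the slides do not state their exact convention): average precision (AP) averages precision@k over the ranks k of the relevant nodes and divides by the number of relevant nodes; MAP averages AP over queries. Plain Python, hypothetical names:

```python
def average_precision(ranked, relevant):
    """Mean of precision@k over the ranks k (1-based) that hold a relevant node."""
    relevant = set(relevant)
    hits, total = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs: list of (ranked_list, relevant_set) pairs, e.g. one per query node."""
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)


runs = [([3, 1, 4, 2], {3, 2}),   # AP = (1/1 + 2/4) / 2 = 0.75
        ([5, 6, 7], {6})]         # AP = (1/2) / 1 = 0.5
print(mean_average_precision(runs))  # 0.625
```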

SLIDE 24

Effectiveness: HLU (Higher is better)

[Figure: HLU of each method on the same 11 datasets]

HLU: Half-life Utility
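Half-life utility is usually traced to Breese et al.; for binary relevance, a relevant node at rank j contributes $2^{-(j-1)/(\alpha-1)}$, where the half-life α is the rank at which the contribution halves. The sketch below assumes α = 5 and normalizes by 100 times the best achievable utility, which is consistent with the plot's 10 to 90 axis but is an assumption about the slides' exact definition (plain Python, hypothetical names):

```python
def half_life_utility(ranked, relevant, alpha=5):
    """Sum of 2^(-(j-1)/(alpha-1)) over ranks j (1-based) holding a relevant node."""
    relevant = set(relevant)
    return sum(2 ** (-(j - 1) / (alpha - 1))
               for j, item in enumerate(ranked, start=1) if item in relevant)


def normalized_hlu(runs, alpha=5):
    """100 * total utility / total utility of ideal rankings (relevant nodes first)."""
    got = sum(half_life_utility(ranked, rel, alpha) for ranked, rel in runs)
    best = sum(sum(2 ** (-(j - 1) / (alpha - 1)) for j in range(1, len(set(rel)) + 1))
               for _, rel in runs)
    return 100.0 * got / best if best else 0.0


print(normalized_hlu([(["a", "b", "c"], {"a"})]))            # 100.0
print(normalized_hlu([(["b", "c", "d", "e", "a"], {"a"})]))  # 50.0
```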

SLIDE 25

Effectiveness: AUC (Higher is better)

[Figure: AUC of each method on the same 11 datasets]

SLIDE 26

Effectiveness: Precision@20 (Higher is better)

[Figure: Precision@20 of each method on the same 11 datasets]

SLIDE 27

Effectiveness: Recall@5 (Higher is better)

[Figure: Recall@5 of each method on the same 11 datasets]

SLIDE 28

Effectiveness: MPR (Lower is better)

[Figure: MPR of each method on the same 11 datasets]

MPR: Mean Percentile Ranking
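The remaining metrics can be sketched the same way (plain Python, hypothetical names; the conventions are common ones and may differ in detail from the paper's). Precision@k and Recall@k count relevant nodes in the top k; AUC is the fraction of (positive, negative) pairs ordered correctly, i.e., the Wilcoxon-Mann-Whitney statistic that the WMW loss smooths; percentile ranking places a node at 0.0 at the top of the list and 1.0 at the bottom, so lower is better:

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked nodes that are relevant."""
    relevant = set(relevant)
    return sum(1 for item in ranked[:k] if item in relevant) / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant nodes that appear in the top k."""
    relevant = set(relevant)
    return sum(1 for item in ranked[:k] if item in relevant) / len(relevant)


def auc(scores, positives, negatives):
    """Fraction of (positive, negative) pairs with the positive scored higher; ties count 1/2."""
    total = 0.0
    for x in positives:
        for y in negatives:
            if scores[x] > scores[y]:
                total += 1.0
            elif scores[x] == scores[y]:
                total += 0.5
    return total / (len(positives) * len(negatives))


def mean_percentile_rank(ranked, relevant):
    """Mean of rank_index / (len(ranked) - 1) over relevant nodes (needs len(ranked) > 1)."""
    last = len(ranked) - 1
    position = {item: idx for idx, item in enumerate(ranked)}
    return sum(position[item] / last for item in relevant) / len(relevant)


scores = {"a": 0.9, "b": 0.5, "c": 0.4, "d": 0.3, "e": 0.1}
ranked = sorted(scores, key=scores.get, reverse=True)  # ['a', 'b', 'c', 'd', 'e']
relevant = {"a", "d"}
print(precision_at_k(ranked, relevant, 2), recall_at_k(ranked, relevant, 2))  # 0.5 0.5
print(auc(scores, ["a", "d"], ["b", "c", "e"]))
print(mean_percentile_rank(ranked, relevant))                                 # 0.375
```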

SLIDE 29

Efficiency -- Twitter

[Figure: running time (seconds) on Twitter vs. # edges (up to about 1.5 × 10^9) and vs. # nodes (up to about 4 × 10^7); linear-scale panels show QUINT-rankOne alone (time axis 0.16 to 0.34 s), log-scale panels compare QUINT-Basic1st and QUINT-rankOne, with 1 s marked]

QUINT-rankOne scales sub-linearly

SLIDE 30

Roadmap

§ Motivations
§ Proposed Solutions: QUINT
§ Empirical Evaluations
§ Conclusions
SLIDE 31

Conclusion: QUINT

§ Goals: learn the optimal network (for node proximity)

§ Algorithms: a VERY efficient way to compute $\frac{\partial Q(x, s)}{\partial A_s(i, j)}$

– Rank-1 approx. + Taylor approx. + local search

§ Results:

– Consistently better on 10+ networks & 6 metrics
– Sub-linear scalability; near real-time response on billion-scale networks

§ Summary (Q1 / Q2 / Q3):

– Existing: optimal weights / one-fits-all / offline
– QUINT: optimal topology / one-fits-one / online