Outline Background Conditional Link Model Discriminative Content - - PowerPoint PPT Presentation




SLIDE 1

Tianbao Yang¹, Rong Jin¹, Yun Chi², Shenghuo Zhu²

¹ Michigan State University
² NEC Laboratories America

Presenter: April Hua LIU

SLIDE 2

Outline

- Background
- Conditional Link Model
- Discriminative Content Model
- Optimization Algorithms
- Extensions
- Experiments
- Conclusion

SLIDE 3

Background

- Community detection in networks
  - Community:
    - Densely connected in links
    - Common topic in contents
  - Network data
    - Links between nodes: e.g. citations between papers
    - Content describing nodes: e.g. bag-of-words for papers

SLIDE 4

Background (Cont.)

- Most work on community detection
  - Link analysis, but links are sparse and noisy
  - Content analysis, but content can be misleading
- Combining link and content
  - Most are based on generative models
    - Link model (PHITS) + topic model (PLSA)
    - Connected by the community memberships (hidden variable)

SLIDE 5

Our contribution

- Problems with existing models
  - Community membership is insufficient to model links
    - Our contribution: introduce popularity of nodes
  - Generative models are vulnerable to irrelevant attributes
    - Our contribution: discriminative content model

SLIDE 6

Notations

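$\mathcal{V} = \{1, \dots, n\}$: nodes
$\mathcal{E} = \{(i \to j) \mid s_{ij} \neq 0\}$: directed links
$\mathcal{LO}(i) \subseteq \mathcal{V}$: link-out space of node $i$
$\mathcal{LI}(i) \subseteq \mathcal{V}$: link-in space of node $i$
$\mathcal{O}(i) \subseteq \mathcal{V}$: nodes cited by node $i$
$\mathcal{I}(i) \subseteq \mathcal{V}$: nodes that cite node $i$
$z_i \in \{1, \dots, K\}$: community of node $i$
$\gamma_i = (\gamma_{i1}, \dots, \gamma_{iK})$: community membership of node $i$
$x_i \in \mathbb{R}^d$: content vector of node $i$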

SLIDE 7

Conditional link model

- Popularity-based conditional link model (PCL)
  - Model conditional link probability: $\Pr(j \mid i)$
    - Probability of linking node $i$ to node $j$
    - Popularity of node $i$: $b_i \geq 0$
      - Large $b_i$ → high probability of being cited by other nodes

$$\Pr(j \mid i) = \sum_{k=1}^{K} \Pr(z_i = k \mid i)\, \Pr(j \mid z_i = k) = \sum_{k=1}^{K} \gamma_{ik}\, \frac{\gamma_{jk} b_j}{\sum_{j' \in \mathcal{LO}(i)} \gamma_{j'k} b_{j'}}$$
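The PCL probability can be evaluated directly from memberships and popularities. The sketch below is illustrative only: the function name `pcl_link_prob` and the plain-list data layout (`gamma[i][k]`, `b[j]`, `link_out[i]`) are assumptions made here, not taken from the slides.

```python
def pcl_link_prob(i, j, gamma, b, link_out):
    """Pr(j | i) under the PCL form shown on the slide.

    gamma[i][k]  membership of node i in community k (assumed layout)
    b[j]         non-negative popularity of node j
    link_out[i]  candidate targets of node i (its link-out space)
    """
    candidates = link_out[i]
    if j not in candidates:
        return 0.0
    prob = 0.0
    for k in range(len(gamma[i])):
        # Normalizer: popularity-weighted membership mass in community k
        # over every node that i could link to.
        norm = sum(gamma[c][k] * b[c] for c in candidates)
        if norm > 0.0:
            prob += gamma[i][k] * gamma[j][k] * b[j] / norm
    return prob


# Three nodes in one community; node 1 is three times as popular as node 2.
gamma = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
b = [1.0, 3.0, 1.0]
link_out = {0: [1, 2]}
p1 = pcl_link_prob(0, 1, gamma, b, link_out)
p2 = pcl_link_prob(0, 2, gamma, b, link_out)
```

In this toy setting both candidates have identical memberships, so the difference between `p1` and `p2` comes entirely from popularity.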

SLIDE 8

Analysis of PCL model

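- PCL model

$$\Pr(j \mid i) = \sum_{k=1}^{K} \Pr(z = k \mid i)\, \Pr(j \mid z = k) = \sum_{k=1}^{K} \gamma_{ik} \beta_{jk}$$

$$\Pr(j \mid i) = \sum_{k=1}^{K} \gamma_{ik}\, \frac{\gamma_{jk} b_j}{\sum_{j' \in \mathcal{LO}(i)} \gamma_{j'k} b_{j'}}$$

$$\Pr(j \mid i) = \sum_{k=1}^{K} \gamma_{ik}\, \frac{\gamma_{jk} b_{jk}}{\sum_{j' \in \mathcal{LO}(i)} \gamma_{j'k} b_{j'k}}$$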

PHITS model
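
One way to read the comparison on this slide, offered as a sketch rather than the authors' own derivation: if popularity is allowed to depend on the community ($b_{jk}$ instead of $b_j$) and every node shares the same link-out space, the normalized term can be renamed $\beta_{jk}$, which gives the PHITS form. The condition $\mathcal{LO}(i) = \mathcal{V}$ is an assumption introduced here for illustration.

```latex
\beta_{jk} := \frac{\gamma_{jk}\, b_{jk}}{\sum_{j' \in \mathcal{V}} \gamma_{j'k}\, b_{j'k}},
\qquad \beta_{jk} \ge 0,
\qquad \sum_{j \in \mathcal{V}} \beta_{jk} = 1
\qquad \Longrightarrow \qquad
\Pr(j \mid i) = \sum_{k=1}^{K} \gamma_{ik}\, \beta_{jk}
```

Under this reading, PCL with a single popularity $b_j$ per node is the more constrained model, since it has one popularity parameter per node instead of one per node and community.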

SLIDE 9

Maximum Likelihood Estimation

- The log-likelihood: [formula not preserved in the extracted slide text]
- We find the optimal $\gamma$, $b$ by maximizing the log-likelihood
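
The formula itself did not survive in the slide text. A plausible form, assuming each observed link $(i \to j)$ contributes independently with weight $s_{ij}$ (an assumption, not confirmed by the slide), would be:

```latex
\mathcal{L}(\gamma, b)
  = \sum_{(i \to j) \in \mathcal{E}} s_{ij} \log \Pr(j \mid i)
  = \sum_{(i \to j) \in \mathcal{E}} s_{ij} \log \sum_{k=1}^{K} \gamma_{ik}\,
    \frac{\gamma_{jk}\, b_j}{\sum_{j' \in \mathcal{LO}(i)} \gamma_{j'k}\, b_{j'}}
```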

SLIDE 10

Discriminative Content (DC) model

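- A discriminative model that determines community memberships by node contents, where $w_k \in \mathbb{R}^d$ weights different content features

PCL + DC

$$\Pr(j \mid i) = \sum_{k=1}^{K} \gamma_{ik}\, \frac{\gamma_{jk} b_j}{\sum_{j' \in \mathcal{LO}(i)} \gamma_{j'k} b_{j'}}$$

$\gamma_{ik} =$ [right-hand side not preserved in the extracted slide text]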
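
The expression for the memberships is missing from the slide text. A common discriminative choice with one weight vector per community is a softmax over linear scores; the sketch below assumes that form. The function name `dc_memberships` and the list-based layout are hypothetical.

```python
import math


def dc_memberships(x, weights):
    """Softmax memberships from content (assumed form, not confirmed by the slide).

    x        content vector of one node, length d
    weights  K weight vectors, one per community, each of length d
    Returns a list of K non-negative memberships that sum to 1.
    """
    scores = [sum(w_d * x_d for w_d, x_d in zip(w_k, x)) for w_k in weights]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Two communities, two content features; only the first feature is active.
uniform = dc_memberships([1.0, 0.0], [[0.0, 0.0], [0.0, 0.0]])
skewed = dc_memberships([1.0, 0.0], [[2.0, 0.0], [0.0, 0.0]])
```

A feature that carries zero weight in every $w_k$ has no effect on the memberships, which is one way to see how such a model can discount irrelevant attributes.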

SLIDE 11

Optimization Algorithm

- We maximize the log-likelihood over the free parameters $w$ and $b$
- EM algorithm
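
The slide does not give the update equations. As a hedged illustration, the E-step for a mixture of this shape is the Bayes posterior over the community $k$ that explains an observed link $(i \to j)$. The sketch below computes only that posterior; the M-step is omitted, and the name `e_step_posterior` and the data layout are assumptions.

```python
def e_step_posterior(i, j, gamma, b, link_out):
    """Posterior over communities for one observed link i -> j.

    Uses the PCL factorization: q_k is proportional to
    gamma[i][k] * gamma[j][k] * b[j] / norm_k(i).
    """
    joint = []
    for k in range(len(gamma[i])):
        norm = sum(gamma[c][k] * b[c] for c in link_out[i])
        link_given_k = gamma[j][k] * b[j] / norm if norm > 0.0 else 0.0
        joint.append(gamma[i][k] * link_given_k)
    total = sum(joint)
    if total == 0.0:
        raise ValueError("link has zero probability under the current parameters")
    return [v / total for v in joint]


# Node 0 is undecided; its link to node 1 (mostly community 0) is informative.
gamma = [[0.5, 0.5], [0.9, 0.1], [0.1, 0.9]]
b = [1.0, 1.0, 1.0]
link_out = {0: [1, 2]}
posterior = e_step_posterior(0, 1, gamma, b, link_out)
```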

SLIDE 12

Experiments

- Data sets

Data set        #nodes  #links  Content  Labels  K   Description
Political Blog  1490    19090   No       Yes     2   Blog network
Wikipedia       105     799     No       No      20  Webpage hyperlinks
Cora            2708    5429    Yes      Yes     7   Paper citation
Citeseer        3312    4732    Yes      Yes     6   Paper citation

SLIDE 13

Experiments

- Performance metrics
  - Supervised metrics
    - Normalized mutual information (NMI)
    - Pairwise F-measure (PWF)
  - Unsupervised metrics
    - Modularity (Modu)
    - Normalized cut (Ncut)
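
The two supervised metrics can be sketched in a few lines. The slides do not say which NMI normalization is used; the version below divides by the geometric mean of the two entropies, which is one common convention and an assumption here. Function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pairwise_f_measure(true_labels, pred_labels):
    """F-measure over node pairs: a pair is positive if both nodes share a group."""
    same_true = same_pred = same_both = 0
    for a, b in combinations(range(len(true_labels)), 2):
        t = true_labels[a] == true_labels[b]
        p = pred_labels[a] == pred_labels[b]
        same_true += t
        same_pred += p
        same_both += t and p
    if same_both == 0:
        return 0.0
    precision = same_both / same_pred
    recall = same_both / same_true
    return 2 * precision * recall / (precision + recall)


def normalized_mutual_information(true_labels, pred_labels):
    """Mutual information divided by sqrt(H(true) * H(pred)) (assumed normalization)."""
    n = len(true_labels)
    count_t = Counter(true_labels)
    count_p = Counter(pred_labels)
    count_tp = Counter(zip(true_labels, pred_labels))
    mi = sum(c / n * math.log(c * n / (count_t[t] * count_p[p]))
             for (t, p), c in count_tp.items())
    h_t = -sum(c / n * math.log(c / n) for c in count_t.values())
    h_p = -sum(c / n * math.log(c / n) for c in count_p.values())
    if h_t == 0.0 or h_p == 0.0:
        return 0.0  # a single-group partition carries no information
    return mi / math.sqrt(h_t * h_p)


truth = [0, 0, 1, 1]
relabelled = [1, 1, 0, 0]   # same partition, different community names
collapsed = [0, 0, 0, 0]    # everything in one community
```

Both metrics compare partitions rather than label names, so `relabelled` scores as a perfect match against `truth`.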

SLIDE 14

Experiments: link prediction

- Baselines: PHITS, PCL-b=1 (constant popularity)
- Recall measure
- PCL performs better than PHITS
- Modeling popularity performs better than not modeling it
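
The slide names a recall measure without defining it. One common variant, assumed here for illustration, ranks a node's candidate targets by predicted $\Pr(j \mid i)$ and reports the fraction of held-out links found among the top $k$. The function name `recall_at_k` and the evaluation protocol are hypothetical.

```python
def recall_at_k(scores, held_out, k):
    """Fraction of held-out targets that appear among the k highest-scoring candidates.

    scores    dict mapping candidate node -> predicted link probability
    held_out  set of true targets that were hidden from the model
    """
    if not held_out:
        raise ValueError("held_out must contain at least one target")
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    hits = sum(1 for j in ranked if j in held_out)
    return hits / len(held_out)


scores = {1: 0.6, 2: 0.3, 3: 0.1}
result = recall_at_k(scores, held_out={1, 3}, k=2)
```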

SLIDE 15

Experiments

- Community detection on two paper citation data sets

SLIDE 16

Experiments

- Link model: PCL is better than PHITS
- On combining link with content:
  - PCL + content model performs better than other link models + content model
  - Link models + DC perform better than link models + topic models
  - PCL + DC performs better than the other combination models

SLIDE 17

Conclusion

- A conditional link model that captures popularity of nodes
- A discriminative model for content analysis
- A unified model to combine link and content
  - Link structure → noisy estimation of community memberships $y$ (PCL)
  - $y$ used as supervised information → high-quality memberships $y$ (DC)
- Encouraging empirical results

SLIDE 18

Thanks Q&A?