


  1. Tianbao Yang¹, Rong Jin¹, Yun Chi², Shenghuo Zhu²
     ¹Michigan State University, ²NEC Laboratories America
     Presenter: April Hua LIU

  2. Outline
     - Background
     - Conditional Link Model
     - Discriminative Content Model
     - Optimization Algorithms
     - Extensions
     - Experiments
     - Conclusion

  3. Background
     - Community detection in networks
     - Community:
       - Densely connected in links
       - Common topic in contents
     - Network data:
       - Links between nodes, e.g. citations between papers
       - Content describing nodes, e.g. bag-of-words for papers

  4. Background (cont.)
     - Most work on community detection relies on either
       - link analysis, but links are sparse and noisy, or
       - content analysis, but content can be misleading
     - Combining link and content:
       - Most approaches are based on generative models
       - Link model (PHITS) + topic model (PLSA)
       - The two are connected through the community memberships (hidden variables)

  5. Our contribution
     - Problems with existing models:
       - Community membership alone is insufficient to model links
         → Our contribution: introduce the popularity of nodes
       - Generative content models are vulnerable to irrelevant attributes
         → Our contribution: a discriminative content model

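  6. Notations
     - $\mathcal{V} = \{1, \dots, n\}$: nodes
     - $\mathcal{E} = \{(i \to j) \mid s_{ij} \neq 0\}$: directed links
     - $\mathcal{LO}(i) \subseteq \mathcal{V}$: link-out space of node i
     - $\mathcal{LI}(i) \subseteq \mathcal{V}$: link-in space of node i
     - $\mathcal{O}(i) \subseteq \mathcal{V}$: nodes cited by node i
     - $\mathcal{I}(i) \subseteq \mathcal{V}$: nodes that cite node i
     - $z_i \in \{1, \dots, K\}$: community of node i
     - $\gamma_i = (\gamma_{i1}, \dots, \gamma_{iK})$: community membership of node i
     - $x_i \in \mathbb{R}^d$: content vector of node i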
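A minimal sketch of the link notation, assuming the links are stored as an n × n weight matrix `s` (so $s_{ij} \neq 0$ means a link $i \to j$). The helper name `link_sets` is hypothetical, and nodes are indexed from 0 rather than 1. It builds $\mathcal{E}$, $\mathcal{O}(i)$ and $\mathcal{I}(i)$ only; the link-out and link-in spaces are not built because the slide does not say how they are chosen.

```python
import numpy as np

def link_sets(s):
    """Build E, O(i) and I(i) from a link-weight matrix s (s[i, j] != 0 means i -> j).

    Hypothetical helper; nodes are indexed 0..n-1 instead of 1..n.
    """
    s = np.asarray(s)
    n = s.shape[0]
    edges = [(i, j) for i in range(n) for j in range(n) if s[i, j] != 0]          # E
    cited_by_i = {i: [j for j in range(n) if s[i, j] != 0] for i in range(n)}     # O(i)
    citing_i = {i: [j for j in range(n) if s[j, i] != 0] for i in range(n)}       # I(i)
    return edges, cited_by_i, citing_i

# Node 0 cites 1, node 1 cites 2 (weight 2), node 2 cites 0.
edges, O, I = link_sets([[0, 1, 0], [0, 0, 2], [1, 0, 0]])
print(edges, O, I)
```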

  7. Conditional link model
     - Popularity-based conditional link model (PCL)
     - Model the conditional link probability $\Pr(j \mid i)$: the probability of a link from node i to node j
     - Popularity of node i: $b_i \ge 0$
       - Large $b_i$ → high probability of being cited by other nodes
     $$\Pr(j \mid i) = \sum_{k=1}^{K} \Pr(z_i = k \mid i)\,\Pr(j \mid z_i = k) = \sum_{k=1}^{K} \gamma_{ik}\,\frac{\gamma_{jk}\, b_j}{\sum_{j' \in \mathcal{LO}(i)} \gamma_{j'k}\, b_{j'}}$$
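The PCL equation can be evaluated directly. Below is a minimal NumPy sketch. It assumes $\gamma$ is an n × K matrix whose rows sum to 1, and it takes $\mathcal{LO}(i)$ as a boolean mask. By default the mask covers all nodes, which is an assumption because the slides do not define $\mathcal{LO}(i)$. `pcl_link_prob` is a hypothetical name.

```python
import numpy as np

def pcl_link_prob(gamma, b, link_out=None):
    """Pr(j | i) under PCL for all pairs; returns an (n, n) matrix P.

    gamma:    (n, K), gamma[i, k] = Pr(z_i = k | i), rows sum to 1.
    b:        (n,), non-negative popularities b_j.
    link_out: optional (n, n) bool mask, link_out[i, j] True iff j is in LO(i).
              Assumption: defaults to all nodes.
    """
    gamma = np.asarray(gamma, dtype=float)
    b = np.asarray(b, dtype=float)
    n = gamma.shape[0]
    if link_out is None:
        link_out = np.ones((n, n), dtype=bool)
    weighted = gamma * b[:, None]                 # weighted[j, k] = gamma_jk * b_j
    P = np.zeros((n, n))
    for i in range(n):
        mask = link_out[i]
        norm = weighted[mask].sum(axis=0)         # sum over j' in LO(i) of gamma_j'k * b_j'
        norm = np.where(norm > 0, norm, 1.0)      # avoid 0/0; those terms are then 0
        p_j_given_k = weighted[mask] / norm       # Pr(j | z_i = k) for j in LO(i)
        P[i, mask] = p_j_given_k @ gamma[i]       # sum_k gamma_ik * Pr(j | z_i = k)
    return P

rng = np.random.default_rng(0)
gamma = rng.dirichlet(np.ones(3), size=5)         # 5 nodes, 3 communities
b = rng.uniform(0.5, 2.0, size=5)
no_self = ~np.eye(5, dtype=bool)                  # example LO(i): every node except i
P = pcl_link_prob(gamma, b, no_self)
print(P.sum(axis=1))
```

Each $\Pr(j \mid z_i = k)$ is normalized over $\mathcal{LO}(i)$, so every row of `P` sums to 1. Passing `b = np.ones(n)` gives the constant-popularity variant that serves as the PCL-b=1 baseline.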

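  8. Analysis of PCL model
     - PCL model:
     $$\Pr(j \mid i) = \sum_{k=1}^{K} \gamma_{ik}\,\frac{\gamma_{jk}\, b_j}{\sum_{j' \in \mathcal{LO}(i)} \gamma_{j'k}\, b_{j'}}$$
     - With community-specific popularity $b_{jk}$:
     $$\Pr(j \mid i) = \sum_{k=1}^{K} \gamma_{ik}\,\frac{\gamma_{jk}\, b_{jk}}{\sum_{j' \in \mathcal{LO}(i)} \gamma_{j'k}\, b_{j'k}}$$
     - PHITS model:
     $$\Pr(j \mid i) = \sum_{k=1}^{K} \Pr(z = k \mid i)\,\Pr(j \mid z = k) = \sum_{k=1}^{K} \gamma_{ik}\,\beta_{jk}$$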
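The comparison on this slide can be checked numerically. This sketch assumes $\mathcal{LO}(i) = \mathcal{V}$ for every i and all $\gamma_{jk} > 0$. Under those assumptions, choosing community-specific popularities $b_{jk} = \beta_{jk} / \gamma_{jk}$ makes the second form reproduce any PHITS model exactly. PCL with a single $b_j$ per node is then the special case $b_{jk} = b_j$, which has fewer free parameters. The function names are hypothetical.

```python
import numpy as np

def pcl_community_popularity(gamma, B):
    """PCL variant with community-specific popularity B[j, k] = b_jk, LO(i) = all nodes."""
    w = gamma * B                               # gamma_jk * b_jk
    return gamma @ (w / w.sum(axis=0)).T        # sum_k gamma_ik * w_jk / sum_j' w_j'k

def phits(gamma, beta):
    """PHITS: Pr(j | i) = sum_k gamma_ik * beta_jk, each beta[:, k] a distribution over nodes."""
    return gamma @ beta.T

rng = np.random.default_rng(0)
n, K = 6, 3
gamma = rng.dirichlet(np.ones(K), size=n)       # (n, K), rows sum to 1, entries > 0
beta = rng.dirichlet(np.ones(n), size=K).T      # (n, K), columns sum to 1
B = beta / gamma                                # choose b_jk = beta_jk / gamma_jk
print(np.allclose(pcl_community_popularity(gamma, B), phits(gamma, beta)))
```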

  9. Maximum likelihood estimation
     - We find the optimal $\gamma$ and $b$ by maximizing the log-likelihood of the observed links
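A common form of the log-likelihood for weighted directed links is $\ell = \sum_{(i \to j) \in \mathcal{E}} s_{ij} \log \Pr(j \mid i)$. This sketch assumes that form, because the exact expression from the slide cannot be recovered here. It evaluates $\ell$ for a given matrix of link probabilities. `link_log_likelihood` is a hypothetical name.

```python
import numpy as np

def link_log_likelihood(s, P):
    """Sum of s_ij * log Pr(j | i) over observed links (s_ij != 0); assumed objective."""
    s = np.asarray(s, dtype=float)
    P = np.asarray(P, dtype=float)
    observed = s != 0
    # Floor the probabilities so an observed link with Pr = 0 gives a very negative value, not -inf.
    return float(np.sum(s[observed] * np.log(np.maximum(P[observed], 1e-300))))

s = np.array([[0, 2], [1, 0]])            # link 0 -> 1 with weight 2, link 1 -> 0 with weight 1
P = np.array([[0.5, 0.5], [0.25, 0.75]])  # P[i, j] = Pr(j | i)
print(link_log_likelihood(s, P))          # 2 log 0.5 + log 0.25 = log(1/16)
```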

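  10. Discriminative content (DC) model
     - A discriminative model that determines community memberships from node contents, where $w_k \in \mathbb{R}^d$ weights the different content features
     - PCL + DC:
     $$\Pr(j \mid i) = \sum_{k=1}^{K} \gamma_{ik}\,\frac{\gamma_{jk}\, b_j}{\sum_{j' \in \mathcal{LO}(i)} \gamma_{j'k}\, b_{j'}}$$
       with $\gamma_{ik}$ given by the DC model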
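The DC formula for $\gamma_{ik}$ is not legible in this copy. One natural discriminative choice that fits "$w_k$ weights the content features" is a softmax (multinomial logistic) model, $\gamma_{ik} = \exp(w_k^\top x_i) / \sum_{l=1}^{K} \exp(w_l^\top x_i)$. Treat this as an assumption. Below is a minimal sketch; `dc_memberships` is a hypothetical name.

```python
import numpy as np

def dc_memberships(X, W):
    """gamma[i, k] = exp(w_k . x_i) / sum_l exp(w_l . x_i)  (assumed softmax DC model).

    X: (n, d) content vectors x_i;  W: (K, d) weight vectors w_k.
    """
    scores = np.asarray(X, dtype=float) @ np.asarray(W, dtype=float).T  # scores[i, k] = w_k . x_i
    scores -= scores.max(axis=1, keepdims=True)   # per-row shift for numerical stability
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

X = np.array([[1.0, 2.0, 0.0],
              [0.0, 0.0, 5.0]])
W = np.array([[1.0, 0.0, 0.0],      # w_1 uses feature 0
              [0.0, 1.0, 0.0]])     # w_2 uses feature 1; feature 2 has zero weight in both
gamma = dc_memberships(X, W)
print(gamma)   # row 2 is uniform: its only non-zero feature is ignored by both w_k
```

Under this form, a feature whose weight is zero in every $w_k$ has no effect on $\gamma$. That is one way a discriminative model can ignore irrelevant attributes.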

  11. Optimization algorithm
     - We maximize the log-likelihood over the free parameters $w$ and $b$
     - EM algorithm
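Only the E-step is sketched here, because the slide does not give the update equations. For a mixture over communities, the E-step computes the posterior $\Pr(z_i = k \mid i \to j) \propto \gamma_{ik} \Pr(j \mid z_i = k)$ for each observed link $i \to j$. The sketch assumes $\mathcal{LO}(i) = \mathcal{V}$. It does not show the M-step over $w$ and $b$. `e_step` is a hypothetical name.

```python
import numpy as np

def e_step(gamma, b, edges):
    """Posterior Pr(z_i = k | link i -> j) for each observed link (LO(i) = all nodes assumed)."""
    gamma = np.asarray(gamma, dtype=float)
    b = np.asarray(b, dtype=float)
    weighted = gamma * b[:, None]                      # gamma_jk * b_j
    p_j_given_k = weighted / weighted.sum(axis=0)      # Pr(j | z = k), shape (n, K)
    src = np.array([i for i, _ in edges])
    dst = np.array([j for _, j in edges])
    joint = gamma[src] * p_j_given_k[dst]              # gamma_ik * Pr(j | z = k), shape (|E|, K)
    return joint / joint.sum(axis=1, keepdims=True)    # normalize over communities k

rng = np.random.default_rng(0)
gamma = rng.dirichlet(np.ones(2), size=4)              # 4 nodes, 2 communities
b = np.array([1.0, 3.0, 0.5, 2.0])
edges = [(0, 1), (1, 2), (2, 0), (3, 1)]
q = e_step(gamma, b, edges)
print(q.round(3))
```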

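  12. Experiments
     - Data sets

     | Data set       | #nodes | #links | Content | Labels | K  | Description        |
     |----------------|--------|--------|---------|--------|----|--------------------|
     | Political Blog | 1490   | 19090  | No      | Yes    | 2  | Blog network       |
     | Wikipedia      | 105    | 799    | No      | No     | 20 | Webpage hyperlinks |
     | Cora           | 2708   | 5429   | Yes     | Yes    | 7  | Paper citation     |
     | Citeseer       | 3312   | 4732   | Yes     | Yes    | 6  | Paper citation     |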

  13. Experiments
     - Performance metrics
       - Supervised: normalized mutual information (NMI), pairwise F-measure (PWF)
       - Unsupervised: modularity (Modu), normalized cut (Ncut)
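Below are standard definitions of the four metrics, given as a sketch. The slides do not say which NMI normalization was used, so the geometric mean $\sqrt{H(U)H(V)}$ is assumed. Modularity and normalized cut are computed on a symmetric (undirected) adjacency matrix. A directed citation graph would therefore need to be symmetrized first, which is an assumption. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

import numpy as np

def nmi(labels_true, labels_pred):
    """Normalized mutual information with sqrt(H(U) * H(V)) normalization."""
    n = len(labels_true)
    cu, cv = Counter(labels_true), Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    mi = sum(c / n * math.log(c * n / (cu[u] * cv[v])) for (u, v), c in joint.items())
    hu = -sum(c / n * math.log(c / n) for c in cu.values())
    hv = -sum(c / n * math.log(c / n) for c in cv.values())
    if hu == 0 or hv == 0:                   # a labeling with a single cluster
        return 1.0 if hu == hv else 0.0
    return mi / math.sqrt(hu * hv)

def pairwise_f(labels_true, labels_pred):
    """F-measure over node pairs; a pair is positive when both nodes share a cluster."""
    tp = fp = fn = 0
    for p, q in combinations(range(len(labels_true)), 2):
        same_true = labels_true[p] == labels_true[q]
        same_pred = labels_pred[p] == labels_pred[q]
        tp += same_true and same_pred
        fp += same_pred and not same_true
        fn += same_true and not same_pred
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def modularity(A, labels):
    """Newman modularity for a symmetric adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    deg = A.sum(axis=1)
    two_m = deg.sum()
    same = labels[:, None] == labels[None, :]
    return float(((A - np.outer(deg, deg) / two_m) * same).sum() / two_m)

def normalized_cut(A, labels):
    """Sum over clusters C of cut(C, complement of C) / vol(C), A symmetric."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    deg = A.sum(axis=1)
    total = 0.0
    for c in np.unique(labels):
        inside = labels == c
        vol = deg[inside].sum()
        if vol > 0:
            total += A[inside][:, ~inside].sum() / vol
    return float(total)

# Two disjoint triangles, split into their natural communities.
tri = np.ones((3, 3)) - np.eye(3)
A = np.block([[tri, np.zeros((3, 3))], [np.zeros((3, 3)), tri]])
labels = [0, 0, 0, 1, 1, 1]
print(modularity(A, labels), normalized_cut(A, labels))
print(nmi(labels, [1, 1, 1, 0, 0, 0]), pairwise_f(labels, [1, 1, 1, 0, 0, 0]))
```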

  14. Experiments: link prediction
     - Baselines: PHITS and PCL-b=1 (constant popularity)
     - Measure: recall
     - PCL performs better than PHITS
     - Modeling popularity performs better than not modeling it
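The slides name recall as the link-prediction measure but give no further details. This sketch assumes one common protocol. Some links are hidden, every candidate target j is ranked by $\Pr(j \mid i)$, and a held-out link $i \to j$ counts as recalled if j is among the top k. `recall_at_k` is a hypothetical name. A full protocol would usually also exclude links already used for training, which this sketch omits.

```python
import numpy as np

def recall_at_k(P, held_out, k):
    """Fraction of held-out links (i, j) whose target j is among the k largest entries of P[i]."""
    P = np.asarray(P, dtype=float)
    hits = 0
    for i, j in held_out:
        top = np.argsort(-P[i])[:k]     # indices of the k most probable targets for node i
        hits += int(j in top)
    return hits / len(held_out)

P = np.array([[0.0, 0.7, 0.2, 0.1],    # P[i, j] = Pr(j | i) from some fitted model
              [0.1, 0.0, 0.3, 0.6]])
held_out = [(0, 1), (1, 2)]
print(recall_at_k(P, held_out, 1), recall_at_k(P, held_out, 2))
```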

  15. Experiments
     - Community detection on the two paper citation data sets

  16. Experiments
     - Link model: PCL is better than PHITS
     - Combining link with content:
       - PCL + content model performs better than other link models + content model
       - Link model + DC performs better than link model + topic model
       - PCL + DC performs better than the other combination models

  17. Conclusion
     - A conditional link model that captures the popularity of nodes
     - A discriminative model for content analysis
     - A unified model that combines link and content:
       link structure → noisy estimate of community memberships $z$ (PCL) → $z$ used as supervised information → high-quality memberships $z$ (DC)
     - Encouraging empirical results
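The second stage of the flow above uses the noisy memberships $z$ from PCL as supervision for DC. It can be illustrated on its own. This is a simplified sketch, not the paper's joint EM optimization. It assumes a softmax DC model and fits the weights W to fixed soft memberships by gradient descent on the cross-entropy. The names, data, and step sizes are all hypothetical.

```python
import numpy as np

def softmax_rows(S):
    S = S - S.max(axis=1, keepdims=True)        # per-row shift for numerical stability
    E = np.exp(S)
    return E / E.sum(axis=1, keepdims=True)

def fit_dc_to_soft_labels(X, gamma_noisy, lr=0.5, steps=2000):
    """Fit W (K, d) so that softmax(X @ W.T) matches noisy memberships used as soft targets."""
    n, d = X.shape
    K = gamma_noisy.shape[1]
    W = np.zeros((K, d))
    for _ in range(steps):
        G = softmax_rows(X @ W.T)
        grad = (G - gamma_noisy).T @ X / n      # gradient of mean cross-entropy w.r.t. W
        W -= lr * grad
    return W, softmax_rows(X @ W.T)

X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])                 # node contents
gamma_noisy = np.array([[0.7, 0.3], [0.6, 0.4], [0.2, 0.8], [0.3, 0.7]])       # e.g. from PCL
W, gamma_dc = fit_dc_to_soft_labels(X, gamma_noisy)
print(gamma_dc.argmax(axis=1))
```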

  18. Thanks! Q&A
