The ground truth about metadata and community detection in networks - PowerPoint PPT Presentation

Leto Peel, Université catholique de Louvain. arXiv:1608.05878


  1. The ground truth about metadata and community detection in networks. Leto Peel, Université catholique de Louvain. arXiv:1608.05878

  2. Community detection: split nodes into groups based on their pattern of links.

  3. Data generating process: generate nodes and assign them to communities.

  4. Data generating process: generate nodes and assign them to communities, T. Generate links in G dependent on community membership, g(T).

  5. Community detection: observe G and infer T with f(G). Assess performance by how well we recover T.
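The generate-then-detect setup on slides 3 to 5 can be sketched in a few lines of Python. This is a minimal illustration, not the model used in the talk: the function names, the planted-partition form of g (one probability `p_in` for links inside a community and one `p_out` for links between communities), and all parameter values are assumptions made for the example.

```python
import random

def assign_communities(n, k, rng):
    """Draw the planted partition T: one community label per node."""
    return [rng.randrange(k) for _ in range(n)]

def generate_links(T, p_in, p_out, rng):
    """Hypothetical g(T): link each node pair with a probability that
    depends only on whether the two nodes share a community."""
    edges = set()
    n = len(T)
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if T[u] == T[v] else p_out
            if rng.random() < p:
                edges.add((u, v))
    return edges

def within_fraction(T, edges):
    """Fraction of links that join two nodes of the same community."""
    if not edges:
        return 0.0
    same = sum(1 for u, v in edges if T[u] == T[v])
    return same / len(edges)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
T = assign_communities(40, 2, rng)
G = generate_links(T, p_in=0.5, p_out=0.02, rng=rng)
print(len(G), "links;", round(within_fraction(T, G), 2), "within communities")
```

A detection method f would see only G. On synthetic data like this its output can be scored against T because T was planted, which is exactly what is missing in real networks.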

  6. Ground truth in real networks?

  7. Networks can have metadata that describe the nodes. Social networks: age, sex, ethnicity, race, etc. Food webs: feeding mode, species body mass, etc. Internet: data capacity, physical location, etc. Protein interactions: molecular weight, association with cancer, etc.

  8. Recovering metadata implies sensible methods. [Figure panels: stochastic block model; stochastic block model with degree correction.] Karrer, Newman. Stochastic blockmodels and community structure in networks. Phys. Rev. E 83, 016107 (2011). Adamic, Glance. The political blogosphere and the 2004 US election: divided they blog. 36–43 (2005).

  9. Metadata often treated as ground truth. Yang & Leskovec. Overlapping community detection at scale: a nonnegative matrix factorization approach (2013).

  10. Metadata often treated as ground truth. "Do you think that's ground truth you're detecting?" Yang & Leskovec. Overlapping community detection at scale: a nonnegative matrix factorization approach (2013).

  11. Ground truth, T. Communities, C = f(G). Distance between them: d(T, f(G)).

  12. Metadata, M. Ground truth, T. Communities, C = f(G). Distances: d(M, T), d(M, f(G)), d(T, f(G)).

  13. When communities ≠ metadata... (i) the metadata do not relate to the network structure,

  14. When communities ≠ metadata... (ii) the detected communities and the metadata capture different aspects of the network’s structure,

  15. When communities ≠ metadata... (iii) the network contains no structure (e.g., an E-R random graph),

  16. When communities ≠ metadata... (iv) the community detection algorithm does not perform well. Typically we assume this is the only possible cause.

  17. The Karate Club network. Instructor, President. Split into factions.

  18. The Karate Club network. Instructor, President. Split into factions.

  19. ‘This can be explained by noting that he was only three weeks away from a test for black belt (master status) when the split in the club occurred. Had he joined the officers’ [President's] club he would have had to give up his rank and begin again in a new style of karate with a white (beginner’s) belt, since the officers had decided to change the style of karate practiced in their new club’ - Zachary 1977

  20. You only see what you look for... US politics is more than two opposing views. Adamic, Glance. The political blogosphere and the 2004 US election: divided they blog. 36–43 (2005). Peixoto, T. P. Hierarchical Block Structures and High-Resolution Model Selection in Large Networks. Phys. Rev. X 4, 011047 (2014).

  21. Different generative processes = different community structures

  22. Many good partitions... Evans, T. S. Clique graphs and overlapping communities. J. Stat. Mech. 2010, P12037–22 (2010).

  23. Metadata are not ground truth for community detection

  24. Metadata are not ground truth for community detection. No interpretability of negative results: (i) M unrelated to network structure; (ii) C and M capture different aspects of network structure; (iii) the network has no structure; (iv) the algorithm does not perform well.

  25. Metadata are not ground truth for community detection. No interpretability of negative results: (i) M unrelated to network structure; (ii) C and M capture different aspects of network structure; (iii) the network has no structure; (iv) the algorithm does not perform well. Multiple sets of metadata exist. Which set is ground truth?

  26. Metadata are not ground truth for community detection. No interpretability of negative results: (i) M unrelated to network structure; (ii) C and M capture different aspects of network structure; (iii) the network has no structure; (iv) the algorithm does not perform well. Multiple sets of metadata exist. Which set is ground truth? We see what we look for. Confirmation bias. Publication bias.

  27. Metadata are not ground truth for community detection. No interpretability of negative results: (i) M unrelated to network structure; (ii) C and M capture different aspects of network structure; (iii) the network has no structure; (iv) the algorithm does not perform well. Multiple sets of metadata exist. Which set is ground truth? We see what we look for. Confirmation bias. Publication bias. “Community” is model dependent. Do we expect all networks across all domains to have the same relationship with communities?

  28. Community detection is an inverse problem. Data generation, g(T), maps communities T to a network G; community detection, f(G), maps the network G back to communities T.

  29. However, in real networks both T and g are unknown. For any graph there exists a (Bell) number of possible “ground truth” partitions, and an infinite number of capable generative models. {generative models, g} × {partitions, T} → {graph G}: many to one. The community detection problem is ill-posed (no unique solution). For proof, see the paper (arXiv:1608.05878).
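To give a sense of how many candidate partitions the slide refers to, the Bell number B(n) counts the ways to split n labelled nodes into non-empty groups. The sketch below computes it with the Bell triangle; the function name is made up for this example.

```python
def bell_number(n):
    """Number of partitions of n labelled nodes, via the Bell triangle."""
    row = [1]  # triangle row whose first entry is B(0)
    for _ in range(n):
        new_row = [row[-1]]  # each row starts with the previous row's last entry
        for x in row:
            new_row.append(new_row[-1] + x)
        row = new_row
    return row[0]

for n in (5, 10, 20):
    print(n, bell_number(n))
```

Five nodes already admit 52 partitions and ten nodes admit 115,975, so even a small network has far more candidate partitions than any single set of metadata labels can single out.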

  30. A No Free Lunch Theorem for community detection? The NFL theorem (supervised learning) states that there cannot exist a classifier that is a priori better than any other, averaged over all possible problems. Wolpert, D. H. The lack of a priori distinctions between learning algorithms. Neural Computation 8, 1341–1390 (1996).

  31. A No Free Lunch Theorem for community detection. NFL Theorem for community detection (paraphrased): For the community detection problem, with accuracy measured by adjusted mutual information, the uniform average of the accuracy of any method f over all possible community detection problems is a constant which is independent of f. On average, no community detection algorithm performs better than any other. For proof, see the paper (arXiv:1608.05878).
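For readers unfamiliar with the accuracy measure named in the theorem, here is a minimal pure-Python sketch of adjusted mutual information between two partitions. It assumes the common permutation-model definition (expected mutual information when labels are shuffled but group sizes are kept) and normalises by the arithmetic mean of the two entropies. The slide does not say which variant is meant, and the function names are illustrative.

```python
from collections import Counter
from math import comb, log

def entropy(labels):
    """Shannon entropy (in nats) of the group-size distribution."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())

def mutual_information(x, y):
    """Mutual information (in nats) between two labelings of the same nodes."""
    n = len(x)
    a, b = Counter(x), Counter(y)
    joint = Counter(zip(x, y))
    return sum(c / n * log(n * c / (a[i] * b[j])) for (i, j), c in joint.items())

def expected_mutual_information(x, y):
    """Expected MI under the assumed permutation model: each contingency-table
    cell follows a hypergeometric distribution given the group sizes."""
    n = len(x)
    total = 0.0
    for ai in Counter(x).values():
        for bj in Counter(y).values():
            for nij in range(max(1, ai + bj - n), min(ai, bj) + 1):
                prob = comb(ai, nij) * comb(n - ai, bj - nij) / comb(n, bj)
                total += prob * nij / n * log(n * nij / (ai * bj))
    return total

def adjusted_mutual_information(x, y):
    """(MI - E[MI]) / (mean entropy - E[MI]); undefined if the denominator is 0."""
    emi = expected_mutual_information(x, y)
    denom = (entropy(x) + entropy(y)) / 2 - emi
    if abs(denom) < 1e-12:
        raise ValueError("AMI is undefined for this pair of partitions")
    return (mutual_information(x, y) - emi) / denom

planted = [0, 0, 1, 1, 2, 2]
print(adjusted_mutual_information(planted, [5, 5, 7, 7, 9, 9]))  # same split, new labels
print(adjusted_mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))   # unrelated split
```

The score depends only on how nodes are grouped, not on the label names, and the adjustment subtracts the agreement expected by chance, so an unrelated partition can score below zero.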

  32. arXiv:1608.05878

  33. So, what about metadata? Metadata = types of nodes. Communities = how nodes interact. Metadata + Communities = how different types of nodes interact with each other. We require new methods to understand the relationship between metadata and structure.

  34. Are the metadata related to the network structure? Blockmodel Entropy Significance Test. Do metadata and detected communities capture different aspects of network structure? neoSBM.

  35. Are the metadata related to the network structure? Blockmodel Entropy Significance Test: (i) the metadata do not relate to the network structure. Do metadata and detected communities capture different aspects of network structure? neoSBM: (ii) communities and metadata capture different aspects of network structure.

  36. The Stochastic Blockmodel. Edges are conditionally independent given community membership: $p_{ij} = p(e_{ij} \mid z_i, z_j, \omega) = \omega_{z_i, z_j}$. [Figure: block matrix with intra-community density on the diagonal blocks (increasing density) and inter-community density on the off-diagonal blocks.]
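The edge-probability rule on this slide translates directly into a likelihood. The sketch below is one way to write it for an undirected simple graph with Bernoulli edges and no self-loops, which is an assumption here since the slide does not fix those details. The names `edge_probability`, `sbm_log_likelihood`, `z`, and `omega` are illustrative, with `omega[r][s]` standing for the block density between communities r and s.

```python
from math import log

def edge_probability(i, j, z, omega):
    """p_ij = omega[z_i][z_j]: depends only on the two community labels."""
    return omega[z[i]][z[j]]

def sbm_log_likelihood(edges, z, omega):
    """Log-probability of an undirected simple graph under a Bernoulli SBM.
    Edges are conditionally independent given z, so the likelihood is a
    product over node pairs, i.e. a sum of logs. Assumes 0 < omega[r][s] < 1."""
    n = len(z)
    present = {frozenset(e) for e in edges}
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            p = edge_probability(i, j, z, omega)
            total += log(p) if frozenset((i, j)) in present else log(1.0 - p)
    return total

omega = [[0.8, 0.1],
         [0.1, 0.8]]          # dense within communities, sparse between
edges = [(0, 1), (2, 3)]
print(sbm_log_likelihood(edges, [0, 0, 1, 1], omega))  # labels that match the links
print(sbm_log_likelihood(edges, [0, 1, 0, 1], omega))  # labels that cut across them
```

With this assortative `omega`, the labeling that keeps each linked pair together receives the higher log-likelihood. A different `omega` (for example, dense between groups and sparse within) would favor a different labeling of the same graph, which is the sense in which "community" is model dependent.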
