
slide-1
SLIDE 1

Hierarchically clustering time-directed graphs and the effects of teleportation and memory

Jevin West, Information School, University of Washington

slide-2
SLIDE 2
slide-3
SLIDE 3

  • Network Clustering
  • Graph Partitioning
  • Community Detection
  • Block Models
  • Module Detection

slide-4
SLIDE 4

http://www.iloveaba.com/2015/07/no-one-size-does-not-fit-all.html

slide-5
SLIDE 5

No one size fits all

  • No canonical solution or one generalizable method for all data and all problems (i.e., there is no method that works best on all networks in all situations)
  • Need to know the context for why the user is interested in clustering
  • We don’t even have a definition of a community
  • Umbrella term for many facets

Schaub, M.T. et al. (2017) The many facets of community detection in complex networks. Applied Network Science

slide-6
SLIDE 6

  • Cut-based: community detection as minimization of some form of constraint violation
  • Data clustering: community detection framed as a discretized analogue of data clustering, in which densely knit groups of nodes are to be found
  • Stochastic equivalence: community detection aiming to identify structurally equivalent nodes in a network, leading to notions such as stochastic block models
  • Dynamics perspective: community detection looking for simplified descriptions of the dynamical flows occurring on the network, that is, some form of dynamical model reduction

No one size fits all

Schaub, M.T. et al. (2017) The many facets of community detection in complex networks. Applied Network Science

slide-7
SLIDE 7

Hierarchical Herd Immunity

slide-8
SLIDE 8

Community Detection Perspectives

Schaub, M.T. et al. (2017) The many facets of community detection in complex networks. Applied Network Science

  • Cut-based: circuit layout; minimizing cuts; load balancing; eigenvectors; spectral methods; image segmentation
  • Data clustering: maximizing node density; unknown k, unbalanced; conductance; local, global; modularity; social networks
  • Stochastic equivalence: connectivity profiles; SBMs, LFR; p-values, hypothesis testing; bipartite treatment; predict missing links
  • Dynamics: system behavior, processes; non-adjacency focused; airline network; Markovian diffusion process; undirected, directed; InfoMap

slide-12
SLIDE 12

Higher Resolution Maps

Rosvall et al. (2014) Memory in network flows and its effects on spreading dynamics and community detection. Nature Communications
slide-13
SLIDE 13

In the spirit of clustering context…

slide-14
SLIDE 14

The Scholarly Graph

slide-15
SLIDE 15

  • Tens of millions of articles, patents, books
  • Billions of citation links
  • Years: 1600 – 2016

1. Mapping Knowledge Domains
2. Science of Science
3. Hierarchical Navigation
4. Recommendation

slide-16
SLIDE 16

Mapping Knowledge Domains

Rosvall, Martin, and Carl T. Bergstrom. "Multilevel compression of random walks on networks reveals hierarchical organization in large integrated systems." PloS one 6.4 (2011): e18209.

1

slide-17
SLIDE 17

West, J.D. (2012) The Role of Gender in Scholarly Authorship. PLoS One

2

The Role of Gender in Science

slide-18
SLIDE 18

Hierarchical Navigation

3

slide-19
SLIDE 19

West, Wesley-Smith, Bergstrom (2016) A recommendation system based on hierarchical clustering of an article-level citation network. IEEE Transactions on Big Data (in press)

4

Recommendation

Classic Expert

slide-20
SLIDE 20

Community Detection Perspectives

Schaub, M.T. et al. (2017) The many facets of community detection in complex networks. Applied Network Science

  • Cut-based: circuit layout; minimizing cuts; load balancing; eigenvectors; spectral methods; image segmentation
  • Data clustering: maximizing node density; unknown k, unbalanced; conductance; local, global; modularity; social networks
  • Stochastic equivalence: connectivity profiles; SBMs, LFR; p-values, hypothesis testing; bipartite treatment; predict missing links
  • Dynamics: system behavior, processes; non-adjacency focused; airline network; Markovian diffusion process; undirected, directed; InfoMap

slide-21
SLIDE 21

Finding regularities in citation networks

Rosvall and Bergstrom (2008) PNAS

slide-22
SLIDE 22

Rosvall and Bergstrom (2010) PLoS One

The Emergence of Neuroscience

slide-23
SLIDE 23

Compressing data ⇔ Finding patterns

If we can find a good code for describing flow on a network, we will have solved the dual problem of finding the important structures with respect to that flow.

slide-24
SLIDE 24

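The map equation

L(M) = q_\curvearrowright H(\mathcal{Q}) + \sum_{i} p_\circlearrowright^{i} H(\mathcal{P}^{i})

  • q_\curvearrowright: frequency of inter-module movements
  • H(\mathcal{Q}): code length of module names
  • p_\circlearrowright^{i}: frequency of movements within module i
  • H(\mathcal{P}^{i}): code length of node names in module i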

Rosvall and Bergstrom (2008) PNAS
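
To make the four terms concrete, here is a minimal sketch of a two-level map equation calculation. It assumes an unweighted, undirected network, where node visit rates are simply proportional to degree and no teleportation is needed; the function name `map_equation`, the helper `plogp`, and the toy network of two triangles joined by one link are illustrative choices, not taken from the talk or from the Infomap software.

```python
from collections import defaultdict
from math import log2


def plogp(p):
    """Return p * log2(p), with the convention 0 * log2(0) = 0."""
    return p * log2(p) if p > 0 else 0.0


def map_equation(edges, module_of):
    """Two-level code length (bits per step) for an undirected, unweighted graph.

    edges:     list of (u, v) pairs
    module_of: dict mapping each node to its module label
    """
    total = 2.0 * len(edges)          # each link can be walked in both directions
    visit = defaultdict(float)        # node visit rates (degree / 2W)
    exit_rate = defaultdict(float)    # per-module exit rates q_i

    for u, v in edges:
        visit[u] += 1.0 / total
        visit[v] += 1.0 / total
        if module_of[u] != module_of[v]:
            exit_rate[module_of[u]] += 1.0 / total
            exit_rate[module_of[v]] += 1.0 / total

    # Rate of use of each module codebook: exit rate plus member visit rates.
    within = defaultdict(float)
    for node, p in visit.items():
        within[module_of[node]] += p
    for m in list(within):
        within[m] += exit_rate.get(m, 0.0)

    # Index codebook term: q * H(Q).
    q = sum(exit_rate.values())
    index_term = 0.0
    if q > 0:
        index_term = -q * sum(plogp(q_i / q) for q_i in exit_rate.values())

    # Module codebook terms: sum_i p_i * H(P^i).
    module_term = 0.0
    for m, p_m in within.items():
        entropy = -plogp(exit_rate.get(m, 0.0) / p_m)
        entropy -= sum(plogp(p / p_m) for node, p in visit.items()
                       if module_of[node] == m)
        module_term += p_m * entropy

    return index_term + module_term


# Toy network: two triangles joined by a single link (hypothetical data).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_modules = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_module = {n: "all" for n in range(6)}

L_two = map_equation(edges, two_modules)
L_one = map_equation(edges, one_module)
print(f"two modules: {L_two:.4f} bits, one module: {L_one:.4f} bits")
```

On this toy network the two-triangle partition gives the shorter description, which is the sense in which a good code for the flow identifies the important structure.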

slide-25
SLIDE 25

Mapequation.org, Daniel Edler

slide-26
SLIDE 26

The relationship between ranking and clustering

Clustering Ranking Dynamics Structure

slide-27
SLIDE 27

Step Length, Teleportation and Memory ... and their effects on ranking and clustering

slide-28
SLIDE 28

Memory: capturing higher order dynamics

Rosvall et al. (2014) Memory in network flows and its effects on spreading dynamics and community detection. Nature Communications
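
As a rough illustration of what "memory" means here, the sketch below estimates first-order and second-order Markov transition probabilities from observed pathways. In the second-order model the next step depends on the previous node as well as the current one. The function name `transition_probs` and the pathway data are hypothetical and only serve the example.

```python
from collections import Counter, defaultdict


def transition_probs(paths, order):
    """Estimate transition probabilities conditioned on the last `order` nodes."""
    counts = defaultdict(Counter)
    for path in paths:
        for k in range(order, len(path)):
            state = tuple(path[k - order:k])   # memory of the last `order` nodes
            counts[state][path[k]] += 1
    return {
        state: {nxt: c / sum(ctr.values()) for nxt, c in ctr.items()}
        for state, ctr in counts.items()
    }


# Hypothetical pathways through a hub node C: walkers mostly return
# to the node they came from.
paths = (
    [("A", "C", "A")] * 3
    + [("B", "C", "B")] * 3
    + [("A", "C", "B"), ("B", "C", "A")]
)

first = transition_probs(paths, order=1)
second = transition_probs(paths, order=2)

print(first[("C",)])        # without memory, A and B look equally likely from C
print(second[("A", "C")])   # with memory, arriving from A favors returning to A
```

The first-order model hides the return pattern that the second-order model recovers, which is why memory can change both ranking and clustering results.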
slide-29
SLIDE 29

Memory: capturing higher order dynamics

Rosvall et al. (2014) Memory in network flows and its effects on spreading dynamics and community detection. Nature Communications
slide-30
SLIDE 30

Higher Resolution Maps

Rosvall et al. (2014) Memory in network flows and its effects on spreading dynamics and community detection. Nature Communications
slide-31
SLIDE 31

Higher Order Dynamics

Rosvall et al. (2014) Memory in network flows and its effects on spreading dynamics and community detection. Nature Communications
slide-32
SLIDE 32

Citation Network Types

  • Article-level Networks
  • Journal-Level Networks (Memory)

Time-Directed (Acyclic) Graphs

slide-33
SLIDE 33

PageRank Variants (EigenFactor)

P = \alpha H + (1 - \alpha)\, a e^{T}

  • P: matrix representing the random walk over citations
  • \alpha: probability of not teleporting
  • H: cross-citation matrix dictating the structure of the citation network
  • a e^{T}: probability of teleporting to a completely new journal, weighted by the number of articles in that journal

EF = 100 \frac{H\pi}{\sum_{i} [H\pi]_{i}}

  • \pi: leading eigenvector of the random walk matrix P
  • \sum_{i} [H\pi]_{i}: normalization

West, JD et al. (2010) College & Research Libraries
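
The two formulas on this slide, P = αH + (1 − α) a·eᵀ and EF = 100·Hπ / Σᵢ[Hπ]ᵢ, can be sketched with a plain power iteration. This is a simplified illustration under stated assumptions: H is column-stochastic with no dangling columns, a sums to 1, and the three-journal matrix and article shares are made-up numbers. The function name `eigenfactor_scores` is hypothetical, and the published Eigenfactor procedure includes preprocessing steps that are not shown here.

```python
def eigenfactor_scores(H, a, alpha=0.85, tol=1e-12, max_iter=1000):
    """Return (pi, EF) for a column-stochastic H and article-share vector a.

    pi is the leading eigenvector of P = alpha*H + (1 - alpha)*a*e^T,
    found by power iteration; EF is 100 * H*pi normalized to sum to 100.
    """
    n = len(H)

    def h_times(vec):
        # Matrix-vector product: (H vec)_i = sum_j H[i][j] * vec[j]
        return [sum(H[i][j] * vec[j] for j in range(n)) for i in range(n)]

    pi = [1.0 / n] * n
    for _ in range(max_iter):
        h_pi = h_times(pi)
        # Because e^T pi = 1, the teleportation term reduces to (1 - alpha) * a.
        new_pi = [alpha * h_pi[i] + (1.0 - alpha) * a[i] for i in range(n)]
        delta = sum(abs(x - y) for x, y in zip(new_pi, pi))
        pi = new_pi
        if delta < tol:
            break

    # One final step along citations only, then normalize to 100.
    h_pi = h_times(pi)
    total = sum(h_pi)
    return pi, [100.0 * x / total for x in h_pi]


# Hypothetical three-journal example; each column of H sums to 1.
H_example = [
    [0.0, 0.5, 1.0],
    [0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0],
]
a_example = [0.5, 0.3, 0.2]   # share of articles per journal (assumed)

pi, ef = eigenfactor_scores(H_example, a_example)
print([round(x, 2) for x in ef])
```

Note that the final score is taken from Hπ rather than from π itself, so the last step follows citations only and is not a teleportation step.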

slide-34
SLIDE 34

PageRank Pitfalls

Maslov, S. & Redner, S. (2008) Promise and Pitfalls of Extending Google’s PageRank Algorithm to Citation Networks. The Journal of Neuroscience
slide-35
SLIDE 35

[Diagram: teleportation strategies, organized by teleportation target (NODES or LINKS) and by whether teleportation steps are recorded (record, labeled (1 − α), or don’t record). Labeled strategies: DIR-R (PageRank), DIR-UR (EigenFactor), UNDIR:DIR, OUTDIR-DIR (Count Links), INDIR:DIR. Weighting labels: in-degree, in-out, out-degree, other.]

Teleportation Strategies

slide-36
SLIDE 36

Smart Teleportation

Lambiotte, R. & Rosvall, M. (2012) Ranking and clustering of nodes in networks with smart teleportation

slide-37
SLIDE 37

Smart Teleportation and Clustering

Lambiotte, R. & Rosvall, M. (2012) Ranking and clustering of nodes in networks with smart teleportation

slide-38
SLIDE 38

Article-level Ranking and Mapping

West et al. (2016) Ranking and mapping article-level citation networks. In prep.

Panels: UNDIR:DIR, DIR-R (PageRank)

Smooths ranking ~ better clustering

slide-39
SLIDE 39

[Diagram: teleportation strategies, organized by teleportation target (NODES or LINKS) and by whether teleportation steps are recorded (record, labeled (1 − α), or don’t record). Labeled strategies: DIR-R (PageRank), DIR-UR (EigenFactor), UNDIR:DIR, OUTDIR-DIR (Count Links), INDIR:DIR. Weighting labels: in-degree, total, out-degree, other.]

Teleportation Strategies

slide-40
SLIDE 40

Article-level Eigenfactor

slide-41
SLIDE 41
slide-42
SLIDE 42
slide-43
SLIDE 43

Running Experiments

slide-44
SLIDE 44

Clustering on time-directed networks

  • Empirical exploration of hierarchical partitions with varying dynamics
  • The effects of changing recorded teleportation on ranking and clustering

Ranking Effects
Clustering Effects

slide-45
SLIDE 45

Article-level Ranking and Mapping

West et al. (2016) Ranking and mapping article-level citation networks. In prep.

Panels: UNDIR:DIR, DIR-R (PageRank)

Smooths ranking ~ better clustering

slide-46
SLIDE 46

Revealing Hierarchical Structure

slide-47
SLIDE 47

West, Wesley-Smith, Bergstrom (2016) A recommendation system based on hierarchical clustering of an article-level citation network. IEEE Transactions on Big Data (in press)

JW JW Recommend

Classic Expert

slide-48
SLIDE 48

Community Detection Perspectives

Schaub, M.T. et al. (2017) The many facets of community detection in complex networks. Applied Network Science

  • Cut-based: circuit layout; minimizing cuts; load balancing; eigenvectors; spectral methods; image segmentation
  • Data clustering: maximizing node density; unknown k, unbalanced; conductance; local, global; modularity; social networks
  • Stochastic equivalence: connectivity profiles; SBMs, LFR; p-values, hypothesis testing; bipartite treatment; predict missing links
  • Dynamics: system behavior, processes; non-adjacency focused; airline network; Markovian diffusion process; undirected, directed; InfoMap

slide-49
SLIDE 49

Summary

  • Community detection – one size does not fit all
  • Citation networks – dynamical perspective
  • Memory – higher order dynamics
  • Unrecorded teleportation to links (undirdir) improves ranking and hierarchical clustering
  • Next steps – building benchmarks and methods for evaluating the different rankings and hierarchical clusterings (refer to Jennifer Webster’s talk tomorrow)

slide-50
SLIDE 50
slide-51
SLIDE 51

Acknowledgements

  • Carl Bergstrom, Department of Biology, University of Washington
  • Jason Portenoy, Information School, University of Washington
  • Bill Howe, eScience, CSE, University of Washington
  • Seung-Hee Bae, Computer Science, Western Michigan
  • Martin Rosvall, Department of Physics, Umeå University
  • Jennifer Webster, Pacific Northwest National Laboratory

slide-52
SLIDE 52

Jevin West jevinw@uw.edu jevinwest.org @jevinwest