CS224W: Machine Learning with Graphs
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
¡ Subnetworks, or subgraphs, are the building blocks of networks:
¡ They have the power to characterize and discriminate networks
10/9/18 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 2
Subgraph decomposition of an electronic circuit
Oxford Protein Informatics Group
Let’s consider all possible (non-isomorphic) directed subgraphs of size 3
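A minimal brute-force sketch of what "all possible non-isomorphic directed subgraphs of size 3" means: it enumerates every directed graph on 3 nodes (no self-loops), keeps the weakly connected ones, and groups them by a canonical form taken over all 3! relabelings. Treating "subgraph" as "weakly connected subgraph" is our assumption here; under it the count is 13. Brute force like this is only for illustration and does not scale.

```python
from itertools import permutations, product

NODES = (0, 1, 2)
# All 6 possible directed edges between distinct nodes (no self-loops).
ARCS = [(u, v) for u in NODES for v in NODES if u != v]


def canonical(edges):
    """Smallest relabeled edge tuple over all 3! relabelings.

    Two directed graphs get the same key exactly when they are isomorphic.
    """
    return min(
        tuple(sorted((perm[u], perm[v]) for u, v in edges))
        for perm in permutations(NODES)
    )


def weakly_connected(edges):
    """True if the graph is connected when edge directions are ignored."""
    adj = {n: set() for n in NODES}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        for y in adj[stack.pop()] - seen:
            seen.add(y)
            stack.append(y)
    return len(seen) == len(NODES)


classes = set()
for mask in product((0, 1), repeat=len(ARCS)):
    edges = [arc for arc, keep in zip(ARCS, mask) if keep]
    if weakly_connected(edges):
        classes.add(canonical(edges))

print(len(classes))  # 13
```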
¡ For each subgraph:
§ Imagine you have a metric capable of classifying the subgraph “significance” [more on that later]
§ Negative values indicate under-representation
§ Positive values indicate over-representation
¡ We create a network significance profile:
§ A feature vector with values for all subgraph types
¡ Next: Compare profiles of different networks:
§ Regulatory network (gene regulation)
§ Neuronal network (synaptic connections)
§ World Wide Web (hyperlinks between pages)
§ Social network (friendships)
§ Language networks (word adjacency)
Networks from the same domain have similar significance profiles
Milo et al., Science 2004
[Figure: network significance profiles and their similarity for web and social, neuron, gene regulation, and language networks]
Closely related networks have more similar significance profiles
Milo et al., Science 2004
[Figure: correlation between the significance profiles of networks (gene regulatory networks, neurons, social, WWW, language); an annotation highlights the correlation between the English and French language networks]
1) Subgraphs:
§ Defining motifs and graphlets
§ Finding motifs and graphlets
2) Structural roles in networks:
§ RolX: Structural role discovery method
3) Discovering structural roles and their applications:
§ Structural similarity
§ Role generalization and transfer learning
§ Making sense of roles
¡ Network motifs: “recurring, significant patterns of interconnections”
¡ How to define a network motif:
§ Pattern: Small induced subgraph
§ Recurring: Found many times, i.e., with high frequency
§ Significant: More frequent than expected, i.e., than in randomly generated networks such as Erdős–Rényi random graphs or scale-free networks
¡ Motifs:
§ Help us understand how networks work
§ Help us predict the operation and reaction of the network in a given situation
¡ Examples:
§ Feed-forward loops: found in networks of neurons, where they neutralize “biological noise”
§ Parallel loops: found in food webs
§ Single-input modules: found in gene control networks
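To make the feed-forward loop concrete, here is a small sketch that counts induced feed-forward loops (X→Y, Y→Z, X→Z, with none of the reverse edges among the three nodes) in a directed edge list. The function name and the O(n³) scan over ordered triples are illustrative choices of ours and only suit small graphs.

```python
from itertools import permutations


def count_feed_forward_loops(edges):
    """Count induced feed-forward loops X->Y, Y->Z, X->Z in a directed graph.

    'Induced' means none of the reverse edges Y->X, Z->Y, Z->X is present.
    Each loop is counted once because the roles X (source), Y, Z (sink) are fixed.
    """
    e = set(edges)
    nodes = {n for edge in e for n in edge}
    count = 0
    for x, y, z in permutations(nodes, 3):  # O(n^3): small graphs only
        if (x, y) in e and (y, z) in e and (x, z) in e:
            if not {(y, x), (z, y), (z, x)} & e:
                count += 1
    return count


print(count_feed_forward_loops([(1, 2), (2, 3), (1, 3)]))          # 1
print(count_feed_forward_loops([(1, 2), (2, 3), (1, 3), (3, 1)]))  # 0 (not induced)
```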
[Figure: feed-forward loop, single-input module, parallel loop]
[Figure: an induced subgraph of interest (aka motif) and two candidate occurrences in a graph: one that is not induced (“No match!”) and one that is induced (“Match!”)]
The induced subgraph of graph G on a subset X of its vertices is the graph formed from X and all of the edges of G connecting pairs of vertices in X.
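The difference between an induced and a non-induced match can be shown with a short sketch. The helper names `induced_edges` and `is_occurrence` are hypothetical, and the pattern and graph are a made-up directed example rather than the one in the figure.

```python
def induced_edges(edges, subset):
    """Edges of the subgraph induced by `subset`: every edge with both ends in it."""
    subset = set(subset)
    return {(u, v) for u, v in edges if u in subset and v in subset}


def is_occurrence(graph_edges, pattern_edges, mapping, induced=True):
    """Check whether `mapping` (pattern node -> graph node, injective) embeds the
    directed pattern in the graph, either as an induced or as a plain subgraph."""
    image = {(mapping[u], mapping[v]) for u, v in pattern_edges}
    if induced:
        # Induced: the mapped nodes must carry exactly the pattern's edges.
        return induced_edges(graph_edges, mapping.values()) == image
    # Non-induced: the pattern's edges just have to be present.
    return image <= set(graph_edges)


# Hypothetical example: the pattern is a directed 2-path a -> b -> c.
pattern = [("a", "b"), ("b", "c")]
graph = [(1, 2), (2, 3), (1, 3)]  # contains an extra edge 1 -> 3
mapping = {"a": 1, "b": 2, "c": 3}
print(is_occurrence(graph, pattern, mapping, induced=False))  # True
print(is_occurrence(graph, pattern, mapping, induced=True))   # False
```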
Example borrowed from Pedro Ribeiro
[Figure: motif of interest and an example network]
¡ Allow overlapping of motifs
¡ The network on the right has 4 occurrences of the motif:
§ {1,2,3,4,5}
§ {1,2,3,4,6}
§ {1,2,3,4,7}
§ {1,2,3,4,8}
¡ Key idea: Subgraphs that occur in a real network much more often than in a random network have functional significance
Milo et al., Science 2002
¡ Motifs are overrepresented in a network when compared to randomized networks:
§ $Z_i$ captures the statistical significance of motif $i$: $Z_i = (N_i^{\text{real}} - \bar{N}_i^{\text{rand}}) / \text{std}(N_i^{\text{rand}})$
§ $N_i^{\text{real}}$ is #(subgraphs of type $i$) in network $G^{\text{real}}$
§ $N_i^{\text{rand}}$ is #(subgraphs of type $i$) in randomized network $G^{\text{rand}}$
¡ Network significance profile (SP):
$SP_i = Z_i / \sqrt{\sum_j Z_j^2}$
§ $SP$ is a vector of normalized Z-scores
§ $SP$ emphasizes the relative significance of subgraphs:
§ Important for comparison of networks of different sizes
§ Generally, larger networks display higher Z-scores
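The two formulas can be computed directly from subgraph counts. This sketch assumes the counts are already available (one real count and several randomized counts per subgraph type); using the population standard deviation (`pstdev`) and returning 0 when the randomized counts never vary are our own choices, not something the slides specify.

```python
import math
from statistics import mean, pstdev


def z_scores(real_counts, rand_counts):
    """Z_i = (N_i^real - mean(N_i^rand)) / std(N_i^rand) for each subgraph type i.

    real_counts[i]: count of subgraph type i in the real network.
    rand_counts[i]: counts of type i, one per randomized network.
    """
    zs = []
    for n_real, samples in zip(real_counts, rand_counts):
        sd = pstdev(samples)
        # Assumption: report 0 when the randomized counts never vary.
        zs.append((n_real - mean(samples)) / sd if sd > 0 else 0.0)
    return zs


def significance_profile(zs):
    """SP_i = Z_i / sqrt(sum_j Z_j^2): the Z-score vector scaled to unit length."""
    norm = math.sqrt(sum(z * z for z in zs))
    return [z / norm for z in zs] if norm > 0 else [0.0] * len(zs)


# Made-up counts for two subgraph types, four randomized networks each.
zs = z_scores([10, 2], [[3, 5, 3, 5], [9, 11, 9, 11]])
print(zs)                        # [6.0, -8.0]
print(significance_profile(zs))  # [0.6, -0.8]
```

Because SP has unit length, a large network with uniformly inflated Z-scores and a small network with the same relative pattern end up with similar profiles.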
¡ Goal: Generate a random graph with a given degree sequence $k_1, k_2, \dots, k_N$
¡ Useful as a “null” model of networks:
§ We can compare the real network $G^{\text{real}}$ and a “random” $G^{\text{rand}}$ which has the same degree sequence as $G^{\text{real}}$
¡ Configuration model:
[Figure: nodes A, B, C, D with spokes; randomly pair up “mini”-nodes; resulting graph]
We ignore double edges and self-loops when creating the final graph
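The spoke-pairing procedure can be sketched in a few lines of Python for an undirected graph. The function name `configuration_model` and the `seed` parameter are our own; because self-loops and double edges are dropped, realized degrees can come out slightly below the requested ones.

```python
import random


def configuration_model(degrees, seed=None):
    """Undirected random graph built from `degrees` (node i gets degrees[i] spokes).

    Spokes ("mini"-nodes) are paired uniformly at random; self-loops and
    double edges are dropped, so realized degrees can be slightly lower.
    """
    if sum(degrees) % 2:
        raise ValueError("the sum of degrees must be even")
    rng = random.Random(seed)
    spokes = [node for node, k in enumerate(degrees) for _ in range(k)]
    rng.shuffle(spokes)
    edges = set()
    for u, v in zip(spokes[::2], spokes[1::2]):
        if u != v:                             # ignore self-loops
            edges.add((min(u, v), max(u, v)))  # the set ignores double edges
    return edges


print(sorted(configuration_model([3, 3, 2, 2, 2], seed=42)))
```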
¡ Start from a given graph $G$
¡ Repeat the switching step $Q \cdot |E|$ times:
§ Select a pair of edges A→B, C→D at random
§ Exchange the endpoints to give A→D, C→B
§ Exchange edges only if no multiple edges or self-edges are generated
¡ Result: A randomly rewired graph:
§ Same node degrees, randomly rewired edges
¡ $Q$ is chosen large enough (e.g., $Q = 100$) for the process to converge
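The switching step translates directly into code for a directed edge list. This is a sketch: the name `switching` is ours, and counting every attempted swap, including rejected ones, toward the $Q \cdot |E|$ budget is our reading of "repeat the switching step".

```python
import random
from collections import Counter


def switching(edges, q=100, seed=None):
    """Degree-preserving randomization of a directed graph by edge switching.

    edges: (u, v) pairs with no self-loops and no duplicates.
    Performs q * |E| attempted switches A->B, C->D  =>  A->D, C->B.
    """
    rng = random.Random(seed)
    edges = list(edges)
    present = set(edges)
    for _ in range(q * len(edges)):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new1, new2 = (a, d), (c, b)
        # Skip the swap if it would create a self-loop or a multiple edge.
        if a == d or c == b or new1 in present or new2 in present:
            continue
        present -= {edges[i], edges[j]}
        present |= {new1, new2}
        edges[i], edges[j] = new1, new2
    return edges


def degree_sequences(edges):
    """(out-degree, in-degree) counters of a directed edge list."""
    return Counter(u for u, _ in edges), Counter(v for _, v in edges)


g = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (1, 3)]
r = switching(g, q=10, seed=1)
print(degree_sequences(r) == degree_sequences(g))  # True
```

Each accepted swap moves one out-edge of A and one out-edge of C to new targets while D and B each keep one in-edge, which is why every in- and out-degree is unchanged.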
Milo et al., Science 2004
Network significance profile
¡ Count subgraphs $i$ in $G^{\text{real}}$
¡ Count subgraphs $i$ in random networks $G^{\text{rand}}$:
§ Configuration model: Each $G^{\text{rand}}$ has the same #(nodes), #(edges), and degree distribution as $G^{\text{real}}$
¡ Assign Z-score to $i$:
§ $Z_i = (N_i^{\text{real}} - \bar{N}_i^{\text{rand}}) / \text{std}(N_i^{\text{rand}})$
§ High Z-score: Subgraph $i$ is a network motif of $G$
¡ Canonical definition:
§ Directed and undirected
§ Colored and uncolored
§ Temporal and static motifs
¡ Variations on the concept:
§ Different frequency concepts
§ Different significance metrics
§ Under-representation (anti-motifs)
§ Different constraints for null model
Z-scores of individual motifs for different networks
Milo et al., Science 2004
¡ A network of neurons and a gene network contain similar motifs:
§ Feed-forward loops and bi-fan structures
§ Both are information processing networks with sensory and acting components
¡ Food webs have parallel loops:
§ Prey of a particular predator share prey
¡ WWW network has bidirectional links:
§ Design that allows the shortest path between sets of related pages
¡ Graphlets: connected non-isomorphic subgraphs
§ Induced subgraphs of any frequency
Przulj et al., Bioinformatics 2004
For $n = 3, 4, 5, \dots, 10$ there are $2, 6, 21, \dots, 11{,}716{,}571$ graphlets!
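The first few of these counts can be reproduced by brute force: enumerate every undirected graph on $n$ labeled nodes, keep the connected ones, and collapse isomorphic copies with a canonical form over all $n!$ relabelings. This is only a sanity-check sketch (the function names are ours); it becomes impractical well before $n = 10$.

```python
from itertools import combinations, permutations


def is_connected(n, edges):
    """Depth-first search from node 0 over an undirected edge list."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        for y in adj[stack.pop()] - seen:
            seen.add(y)
            stack.append(y)
    return len(seen) == n


def count_graphlets(n):
    """Count connected non-isomorphic undirected graphs on n nodes by brute force."""
    pairs = list(combinations(range(n), 2))
    perms = list(permutations(range(n)))
    classes = set()
    for mask in range(1 << len(pairs)):
        edges = [p for bit, p in enumerate(pairs) if mask >> bit & 1]
        if len(edges) < n - 1 or not is_connected(n, edges):
            continue  # a connected graph needs at least n - 1 edges
        key = min(
            tuple(sorted(tuple(sorted((perm[u], perm[v]))) for u, v in edges))
            for perm in perms
        )
        classes.add(key)
    return len(classes)


print([count_graphlets(n) for n in (3, 4, 5)])  # [2, 6, 21]
```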
¡ Next: Use graphlets to obtain a node-level subgraph metric
¡ Degree counts #(edges) that a node touches:
§ Can we generalize this notion for graphlets? Yes!
¡ Graphlet degree vector counts #(graphlets) that a node touches
¡ An automorphism orbit takes into account the symmetries of a subgraph
¡ Graphlet Degree Vector (GDV): a vector with the frequency of the node in each orbit position
¡ Example: Graphlet degree vector of node v
For a node $u$ of graph $G$, the automorphism orbit of $u$ is $\text{Orb}(u) = \{v \in V(G) : v = f(u) \text{ for some } f \in \text{Aut}(G)\}$, where $\text{Aut}(G)$ denotes the automorphism group of $G$, i.e., the group of isomorphisms from $G$ to itself.
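The orbit definition can be checked on tiny graphs by trying every permutation of the nodes and keeping those that map the edge set onto itself. This brute-force sketch (function name ours) is meant only to illustrate the definition for undirected graphs.

```python
from itertools import permutations


def automorphism_orbits(nodes, edges):
    """Automorphism orbits of an undirected graph, found by trying every permutation.

    A permutation f is an automorphism when it maps the edge set onto itself;
    Orb(u) collects f(u) over all such f.
    """
    nodes = list(nodes)
    edge_set = {frozenset(e) for e in edges}
    orbit = {u: set() for u in nodes}
    for perm in permutations(nodes):
        f = dict(zip(nodes, perm))
        if {frozenset((f[u], f[v])) for u, v in edges} == edge_set:
            for u in nodes:
                orbit[u].add(f[u])
    return {frozenset(o) for o in orbit.values()}


# Path 0 - 1 - 2: the two end nodes share an orbit, the middle node is alone.
print(automorphism_orbits([0, 1, 2], [(0, 1), (1, 2)]))
```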
¡ Graphlet degree vector counts #(graphlets) that a node touches at a particular orbit
¡ Considering graphlets on 2 to 5 nodes we get:
§ A vector of 73 coordinates that is a signature of a node and describes the topology of the node's neighborhood
§ Captures its interconnectivities out to a distance of 4 hops
¡ Graphlet degree vector provides a measure of a node’s local network topology:
§ Comparing vectors of two nodes provides a highly constraining measure of local topological similarity between them
Graphlet Degree Vector (GDV) of node A:
§ $i$-th element of GDV(A): #(graphlets) that touch A at orbit $i$
§ Highlighted are graphlets that touch node A at orbits 15, 19, 27, and 35, from left to right
Yaveroglu et al., Journal of Statistical Software 2015
[Figure: example graph on nodes A–F; four graphlets touching node A at orbits 15, 19, 27, and 35 are highlighted]
Orbit:  0  1  2–3  4  5  6  7–14  15  16–18  19  20–26  27  28–34  35  36–72
GDV(A): 1  2  0    3  0  1  0     1   0      1   0      1   0      1   0
(a value under a range of orbits applies to every orbit in that range)
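Computing the full 73-orbit GDV needs a dedicated tool, but the first four coordinates are easy to sketch. Assuming the usual orbit numbering for 2- and 3-node graphlets (orbit 0: endpoint of an edge; orbits 1 and 2: end and middle of an induced 3-node path; orbit 3: node of a triangle), the hypothetical helper below counts them for one node of an undirected graph.

```python
from itertools import combinations


def gdv_orbits_0_to_3(adj, v):
    """First four GDV coordinates of node v; adj maps node -> set of neighbors."""
    nbrs = adj[v]
    orbit0 = len(nbrs)  # edges touching v, i.e., its degree
    orbit2 = orbit3 = 0
    for a, b in combinations(nbrs, 2):
        if b in adj[a]:
            orbit3 += 1  # v, a, b form a triangle
        else:
            orbit2 += 1  # v is the middle of the induced path a - v - b
    orbit1 = 0
    for u in nbrs:
        for w in adj[u]:
            if w != v and w not in nbrs:
                orbit1 += 1  # v is an end of the induced path v - u - w
    return [orbit0, orbit1, orbit2, orbit3]


# Triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(gdv_orbits_0_to_3(adj, 2))  # [3, 0, 2, 1]
print(gdv_orbits_0_to_3(adj, 3))  # [1, 2, 0, 0]
```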
¡ Finding size-k motifs/graphlets requires solving two challenges:
§ 1) Enumerating all size-k connected subgraphs
§ 2) Counting #(occurrences of each subgraph type)
¡ Just knowing if a certain subgraph exists in a graph is a hard computational problem!
§ Subgraph isomorphism is NP-complete
¡ Computation time grows exponentially as the size of the motif/graphlet increases
§ Feasible motif size is usually small (3 to 8)
¡ Network-centric approaches:
§ 1) Enumerating all size-k connected subgraphs
§ 2) Counting #(occurrences of each subgraph type) via graph isomorphism tests
¡ Algorithms:
§ Exact subgraph enumeration (ESU) [Wernicke 2006]
§ Kavosh [Kashani et al. 2009]
§ Subgraph sampling [Kashtan et al. 2004]
¡ Today: ESU algorithm
¡ Two sets:
§ $V_{\text{subgraph}}$: currently constructed subgraph (motif)
§ $V_{\text{extension}}$: set of candidate nodes to extend the motif
¡ Idea: Starting with a node $v$, add to the $V_{\text{extension}}$ set those nodes $u$ that have two properties:
§ $u$’s node_id must be larger than that of $v$
§ $u$ may only be a neighbor of some newly added node $w$, but not of any node already in $V_{\text{subgraph}}$
¡ ESU is implemented as a recursive function:
§ The running of this function can be displayed as a tree-like structure of depth $k$, called the ESU-Tree
Wernicke, IEEE/ACM Transactions on Computational Biology and Bioinformatics 2006
$N_{\text{excl}}(w, V_{\text{subgraph}}) = N(w) \setminus (V_{\text{subgraph}} \cup N(V_{\text{subgraph}}))$ is the exclusive neighborhood: all nodes neighboring $w$ that are neither in $V_{\text{subgraph}}$ nor in $N(V_{\text{subgraph}})$
¡ Nodes in the ESU-tree include two adjoining sets:
§ $V_{\text{subgraph}}$: Current subgraph (a set of adjacent nodes)
§ $V_{\text{extension}}$: Nodes adjacent to $V_{\text{subgraph}}$ whose node_ids are larger than the starting node $v$
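Leaves represent all size-3 induced subgraphs
[Figure: example graph on nodes 1–5 and its ESU-tree for k = 3]
ESU-tree nodes, written as ($V_{\text{subgraph}}$, $V_{\text{extension}}$):
§ Root
§ Depth 1: ({1}, {3}), ({2}, {3}), ({3}, {4,5}), ({4}, {5}), ({5}, ∅)
§ Depth 2: ({1,3}, {2,4,5}), ({2,3}, {4,5}), ({3,4}, {5}), ({4,5}, ∅)
§ Leaves: {1,2,3}, {1,3,5}, {2,3,4}, {2,3,5}, {1,3,4}, {3,4,5}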
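A compact recursive sketch of ESU as summarized on these slides: each call either records a finished $k$-node set or repeatedly removes a node $w$ from $V_{\text{extension}}$ and recurses with $w$ added and the exclusive neighbors of $w$ (with ids larger than the start node $v$) appended. The edge list 1–3, 2–3, 3–4, 3–5, 4–5 is our reading of the example graph, inferred from the ESU-tree labels; it is not stated explicitly in the slides.

```python
def esu(adj, k):
    """Enumerate every connected induced k-node subgraph exactly once (ESU).

    adj: dict mapping each node (comparable ids) to the set of its neighbors.
    Returns the node sets found at the leaves of the ESU-tree.
    """
    leaves = []

    def extend(v_sub, v_ext, v):
        if len(v_sub) == k:
            leaves.append(frozenset(v_sub))
            return
        v_ext = set(v_ext)
        # N(V_subgraph): every node adjacent to some node already in the subgraph.
        nbhd = set().union(*(adj[x] for x in v_sub))
        while v_ext:
            w = min(v_ext)  # any removal order works; min() keeps runs reproducible
            v_ext.remove(w)
            excl = adj[w] - v_sub - nbhd  # exclusive neighborhood of w
            extend(v_sub | {w}, v_ext | {u for u in excl if u > v}, v)

    for v in adj:
        extend({v}, {u for u in adj[v] if u > v}, v)
    return leaves


# Example graph as read from the ESU-tree: edges 1-3, 2-3, 3-4, 3-5, 4-5.
adj = {1: {3}, 2: {3}, 3: {1, 2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(sorted(sorted(s) for s in esu(adj, 3)))
# [[1, 2, 3], [1, 3, 4], [1, 3, 5], [2, 3, 4], [2, 3, 5], [3, 4, 5]]
```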
¡ So far, we enumerated all size-k subgraphs in the input graph
¡ Next step: Count the graphs
[Figure: the six ESU-tree leaves grouped by subgraph type; counts: 5 and 1]
¡ Classify subgraphs placed in the ESU-Tree leaves into non-isomorphic size-k classes:
§ Determine which subgraphs in ESU-Tree leaves are topologically equivalent (isomorphic) and group them into subgraph classes accordingly
§ Use McKay’s nauty algorithm [McKay 1981]
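The grouping step can be sketched without nauty by computing a brute-force canonical key for each leaf: try every relabeling of the leaf's nodes and keep the smallest sorted edge list. This is a stand-in we swapped in for illustration only; nauty computes canonical labelings far more efficiently. The graph and leaves are the same assumed example as in the ESU-tree.

```python
from collections import Counter
from itertools import permutations


def canonical_key(nodes, edges):
    """Smallest relabeled edge list over all orderings of `nodes` (brute force)."""
    nodes = sorted(nodes)
    best = None
    for order in permutations(range(len(nodes))):
        relabel = dict(zip(nodes, order))
        key = tuple(sorted(tuple(sorted((relabel[u], relabel[v]))) for u, v in edges))
        if best is None or key < best:
            best = key
    return best


def classify(adj, node_sets):
    """Count how many node sets induce each isomorphism class of subgraph."""
    counts = Counter()
    for s in node_sets:
        edges = [(u, v) for u in s for v in adj[u] if v in s and u < v]
        counts[canonical_key(s, edges)] += 1
    return counts


adj = {1: {3}, 2: {3}, 3: {1, 2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
leaves = [{1, 2, 3}, {1, 3, 5}, {2, 3, 4}, {2, 3, 5}, {1, 3, 4}, {3, 4, 5}]
counts = classify(adj, leaves)
print(sorted(counts.values()))  # [1, 5]
```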
¡ Graphs $G$ and $H$ are isomorphic if there exists a bijection $f: V(G) \to V(H)$ such that:
§ Any two nodes $u$ and $v$ of $G$ are adjacent in $G$ iff $f(u)$ and $f(v)$ are adjacent in $H$
¡ Example: Are $G$ and $H$ topologically equivalent?
[Figure: graphs $G$ (nodes A–I) and $H$ (nodes 1–9)]
$f$: A→4, B→7, C→1, D→3, E→5, F→8, G→2, H→9, I→6
¡ Need to check 9! = 362,880 possible bijections between node sets: a hard computational problem!
¡ $G$ and $H$ are isomorphic!
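The naive check behind "9! possible bijections" looks like this: try every bijection and test whether it maps edges onto edges. The sketch assumes simple undirected graphs given as duplicate-free edge lists; the function name is ours. Because the edge counts match and the mapping is a bijection, mapping every edge of $G$ onto an edge of $H$ is enough to preserve non-adjacency as well.

```python
from itertools import permutations


def find_isomorphism(g_nodes, g_edges, h_nodes, h_edges):
    """Return a bijection f: V(G) -> V(H) preserving adjacency, or None.

    Tries all |V|! bijections, so it is only usable on very small graphs.
    Edge lists must be free of duplicates.
    """
    g_nodes, h_nodes = list(g_nodes), list(h_nodes)
    if len(g_nodes) != len(h_nodes) or len(g_edges) != len(h_edges):
        return None
    h_set = {frozenset(e) for e in h_edges}
    for perm in permutations(h_nodes):
        f = dict(zip(g_nodes, perm))
        if all(frozenset((f[u], f[v])) in h_set for u, v in g_edges):
            return f
    return None


# Path a - b - c versus path 1 - 3 - 2: b must map to the middle node 3.
print(find_isomorphism("abc", [("a", "b"), ("b", "c")], [1, 2, 3], [(1, 3), (3, 2)]))
```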
¡ Roles are “functions” of nodes in a network:
§ Roles of species in ecosystems
§ Roles of individuals in companies
¡ Roles are measured by structural behaviors:
§ Centers of stars
§ Members of cliques
§ Peripheral nodes, etc.
[Figure: centers of stars, members of cliques, peripheral nodes]
Network Science Co-authorship network [Newman 2006]
¡ Role: A collection of nodes which have similar positions in a network:
¡ Roles are based on the similarity of ties between subsets of nodes
§ Different from groups/communities
§ A group is formed based on adjacency, proximity, or reachability
§ This is typically adopted in current data mining
Nodes with the same role need not be in direct, or even indirect, interaction with each other
¡ Roles:
§ A group of nodes with similar structural properties
¡ Communities/Groups:
§ A group of nodes that are well-connected to each other
¡ Roles and communities are complementary
¡ Consider the social network of a CS Dept:
§ Roles: Faculty, Staff, Students
§ Communities: AI Lab, Info Lab, Theory Lab
[Figure: roles found by RolX (Henderson, et al., KDD 2012) vs. communities found by Fast Modularity (Clauset, et al., Phys. Rev. E 2004)]
¡ Structural equivalence: Nodes $u$ and $v$ are structurally equivalent if they have the same relationships to all other nodes [Lorrain & White 1971]
§ Structurally equivalent nodes are likely to be similar in other ways, e.g., friendships in social networks
[Figure: example graph with nodes u, v, a, b, c, d, e]
¡ Nodes $u$ and $v$ are structurally equivalent if:
§ For all the other nodes $k$, node $u$ has a tie to $k$ iff node $v$ has a tie to $k$
¡ Example: nodes 3 and 4 are structurally equivalent
[Figure: example graph on nodes 1–5 and its adjacency matrix]
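The definition translates into a one-line test on neighbor sets: compare the ties of $u$ and $v$ after excluding $u$ and $v$ themselves. The 5-node graph below is a hypothetical example, not the one in the figure.

```python
def structurally_equivalent(adj, u, v):
    """True if u and v are tied to exactly the same set of other nodes k.

    adj: dict mapping each node to the set of its neighbors (undirected).
    """
    others = set(adj) - {u, v}
    return (adj[u] & others) == (adj[v] & others)


# Hypothetical 5-node graph (not the one in the figure).
adj = {
    1: {3, 4},
    2: {3, 4},
    3: {1, 2, 5},
    4: {1, 2, 5},
    5: {3, 4},
}
print(structurally_equivalent(adj, 3, 4))  # True
print(structurally_equivalent(adj, 1, 3))  # False
```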
Task | Example Application
Role query | Identify individuals with similar behavior to a known target
Role outliers | Identify individuals with unusual behavior
Role dynamics | Identify unusual changes in behavior
Identity resolution | Identify and de-anonymize individuals in a new network
Role transfer | Use knowledge of one network to make predictions in another
Network comparison | Compute similarity of networks, determine compatibility for knowledge transfer
¡ RolX: Automatic discovery of nodes’ structural roles in networks [Henderson, et al. 2011b]
§ Unsupervised learning approach
§ No prior knowledge required
§ Assigns a mixed-membership of roles to each node
§ Scales linearly in #(edges)
Role Discovery:
✓ Automated discovery
✓ Behavioral roles
✓ Roles generalize
[Figure: RolX pipeline. Input: node × node adjacency matrix → recursive feature extraction → node × feature matrix → role extraction → output: node × role matrix and role × feature matrix]
Example features: degree, mean weight, # of edges in egonet, mean clustering coefficient of neighbors, etc.
¡ Recursive feature extraction [Henderson, et al. 2011a] turns network connectivity into structural features
¡ Neighborhood features: What is a node’s connectivity pattern?
¡ Recursive features: To what kinds of nodes is a node connected?
[Figure: ReFeX turns a graph into a node × feature matrix using local, egonet (neighborhood), and recursive (regional) features; the egonet of a highlighted red node is shown]
¡ Idea: Aggregate features of a node and use them to generate new recursive features
¡ Base set of a node’s neighborhood features:
§ Local features: All measures of the node degree:
§ If the network is directed, include in-degree, out-degree, and total degree
§ If the network is weighted, include weighted versions of the features
§ Egonet features: Computed on the node’s egonet:
§ The egonet includes the node, its neighbors, and any edges in the induced subgraph on these nodes
§ #(within-egonet edges), #(edges entering/leaving egonet)
¡ Start with the base set of node features
¡ Use the set of current node features to generate additional features:
§ Two types of aggregate functions: mean and sum
§ E.g., mean value of the “unweighted degree” feature over all neighbors of a node
§ Compute means and sums over all current features, including other recursive features
§ Repeat
¡ The number of possible recursive features grows exponentially with each recursive iteration:
§ Reduce the number of features using a pruning technique:
§ Look for pairs of features that are highly correlated
§ Eliminate one of the features whenever two features are correlated above a user-defined threshold
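A simplified sketch of the steps above for an unweighted, undirected graph: three base features per node (degree, edges inside the egonet, edges leaving the egonet), one recursive round that appends neighbor means and sums, and greedy pruning that keeps a column only if its absolute Pearson correlation with every kept column is at most a threshold. The 0.9 threshold, the single round, and the function names are our assumptions; this is not the reference ReFeX implementation.

```python
import math


def base_features(adj):
    """Per node: [degree, #edges inside egonet, #edges leaving egonet]."""
    feats = {}
    for v, nbrs in adj.items():
        ego = nbrs | {v}
        inside = sum(1 for a in ego for b in adj[a] if b in ego) // 2
        leaving = sum(1 for a in ego for b in adj[a] if b not in ego)
        feats[v] = [float(len(nbrs)), float(inside), float(leaving)]
    return feats


def recursive_step(adj, feats):
    """Append the mean and the sum of every current feature over each node's neighbors."""
    out = {}
    for v, nbrs in adj.items():
        width = len(feats[v])
        sums = [sum(feats[u][i] for u in nbrs) for i in range(width)]
        means = [s / len(nbrs) if nbrs else 0.0 for s in sums]
        out[v] = feats[v] + means + sums
    return out


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 1.0 if sx == sy else 0.0  # assumption: two constant columns are redundant
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)


def prune(feats, threshold=0.9):
    """Greedily keep a column only if |corr| with every kept column is <= threshold."""
    nodes = list(feats)
    cols = list(zip(*(feats[v] for v in nodes)))
    keep = []
    for j, col in enumerate(cols):
        if all(abs(pearson(col, cols[i])) <= threshold for i in keep):
            keep.append(j)
    return {v: [feats[v][j] for j in keep] for v in nodes}, keep


# Triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
feats = recursive_step(adj, base_features(adj))
pruned, kept_columns = prune(feats, threshold=0.9)
print(feats[3])  # [1.0, 1.0, 2.0, 3.0, 4.0, 0.0, 3.0, 4.0, 0.0]
```

In a full run, the pruned matrix would feed the next recursive round, and the loop stops once a round adds no new surviving features.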
[Figure: RolX: recursively extract features into a node × feature matrix, then cluster nodes based on the extracted features]
RolX uses non-negative matrix factorization (NMF) for clustering, minimum description length (MDL) for model selection, and KL divergence to measure likelihood
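A minimal NumPy sketch of the role-extraction step: factor a non-negative node × feature matrix $V \approx WH$, where $W$ acts as the node × role matrix and $H$ as the role × feature matrix, using Lee and Seung's multiplicative updates for the generalized KL divergence. The number of roles $r$ is fixed by hand here; RolX's MDL-based choice of $r$ is not implemented, and the toy matrix is random.

```python
import numpy as np


def nmf_kl(V, r, iters=300, seed=0, eps=1e-10):
    """Factor a non-negative node x feature matrix V (n x f) as W @ H.

    W (n x r) plays the role of node x role memberships and H (r x f) of
    role x feature definitions. Uses multiplicative updates for KL divergence.
    """
    rng = np.random.default_rng(seed)
    n, f = V.shape
    W = rng.random((n, r)) + eps
    H = rng.random((r, f)) + eps
    for _ in range(iters):
        H *= (W.T @ (V / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)
        W *= ((V / (W @ H + eps)) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H


def generalized_kl(V, W, H, eps=1e-10):
    """D(V || WH) = sum(V * log(V / WH) - V + WH)."""
    WH = W @ H + eps
    return float(np.sum(V * np.log((V + eps) / WH) - V + WH))


rng = np.random.default_rng(1)
V = rng.random((6, 2)) @ rng.random((2, 4))  # toy non-negative node x feature matrix
W, H = nmf_kl(V, r=2)
roles = W / W.sum(axis=1, keepdims=True)     # each row: a node's mixed role membership
print(np.round(roles, 2))
```

Normalizing each row of $W$ turns it into a distribution over roles, which is the mixed-membership view of RolX output used for the similarity comparisons that follow.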
¡ Task: Cluster nodes based on their structural similarity
¡ Two networks:
§ Network science co-authorship network:
§ Nodes: Network scientists
§ Edges: Number of co-authored papers
§ Political books co-purchasing network:
§ Nodes: Political books on Amazon
§ Edges: Frequent co-purchasing of books by the same buyers
¡ Setup: For each network:
§ Use RolX to assign each node a distribution over the set of discovered structural roles
§ Determine similarity between nodes by comparing their role distributions
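The slides do not say which measure compares role distributions; as one plausible choice, this sketch uses cosine similarity and ranks other nodes against a target (a simple "role query"). The role distributions and node names are hypothetical.

```python
import math


def role_similarity(p, q):
    """Cosine similarity between two role distributions (vectors of role weights)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm > 0 else 0.0


def most_similar(roles, target):
    """Rank the other nodes by how similar their role distribution is to target's."""
    others = (n for n in roles if n != target)
    return sorted(others, key=lambda n: role_similarity(roles[n], roles[target]), reverse=True)


# Hypothetical role distributions over three roles.
roles = {"a": [0.8, 0.1, 0.1], "b": [0.7, 0.2, 0.1], "c": [0.0, 0.1, 0.9]}
print(most_similar(roles, "a"))  # ['b', 'c']
```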
Making sense of roles:
¡ Blue circle: Tightly knit, nodes that participate in tightly coupled groups
¡ Red diamond: Bridge nodes, which connect groups of nodes
¡ Gray rectangle: Mainstream, the majority of nodes, neither a clique nor a chain
¡ Green triangle: Pathy, nodes that belong to elongated clusters
[Figure: role-colored graph, where each node is colored by the primary role that RolX finds; role affinity heat-map]
[Figure: political books network, with books labeled conservative, liberal, or neutral]
Bright blue nodes are peripheral nodes; bright red nodes are locally central nodes
Book labels (i.e., liberal, conservative, neutral) were not given to the role discovery algorithm