CS224W: Analysis of Networks
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
12/4/17

Machine Learning: Node classification
(Supervised) Machine Learning Lifecycle:
Raw Data → Structured Data → Learning Algorithm → Model → Downstream prediction task
Feature Engineering: "this feature, that feature", every single time!
Goal: Automatically learn the features.
Goal: Efficient task-independent feature learning for machine learning in networks!
node2vec: map each node u to a feature representation (embedding)

    f : V → ℝ^d

- We map each node in a network into a low-dimensional space
  - Distributed representation for nodes
  - Similarity between nodes indicates link strength
  - Encode network information and generate node representation
Example: Zachary's Karate Club network
(Figure: the network and its low-dimensional node embedding)
Graph representation learning is hard:
- Images are fixed size → convolutions (CNNs)
- Text is linear → sliding window (word2vec)
- Graphs are neither of these!
  - Node numbering is arbitrary (node isomorphism problem)
  - Much more complicated structure
node2vec: Random Walk Based (Unsupervised) Feature Learning

node2vec: Scalable Feature Learning for Networks. A. Grover, J. Leskovec. KDD 2016.
- Goal: Embed nodes with similar network neighborhoods close in the feature space.
- We frame this goal as a prediction-task-independent maximum likelihood optimization problem.
- Key observation: A flexible notion of the network neighborhood N_S(u) of node u leads to rich features.
- We develop a biased 2nd-order random walk procedure S to generate the network neighborhood N_S(u) of node u.
- Intuition: Find an embedding of nodes in d dimensions that preserves similarity.
- Idea: Learn node embeddings such that nearby nodes are close together.
- Given a node u, how do we define nearby nodes?
  - N_S(u): the neighborhood of u obtained by some strategy S
- Given G = (V, E), our goal is to learn a mapping f : V → ℝ^d.
- Log-likelihood objective:

    \max_f \sum_{u \in V} \log \Pr(N_S(u) \mid f(u))

  where N_S(u) is the neighborhood of node u.
- Given node u, we want to learn feature representations that are predictive of the nodes in its neighborhood N_S(u).
    \max_f \sum_{u \in V} \log \Pr(N_S(u) \mid f(u))

- Assumption: The conditional likelihood factorizes over the set of neighbors:

    \log \Pr(N_S(u) \mid f(u)) = \sum_{n_i \in N_S(u)} \log \Pr(f(n_i) \mid f(u))

- Softmax parametrization:

    \Pr(f(n_i) \mid f(u)) = \frac{\exp(f(n_i) \cdot f(u))}{\sum_{v \in V} \exp(f(v) \cdot f(u))}
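For a small graph, the softmax above can be computed directly. A minimal numpy sketch; the 4-node embedding matrix is a made-up toy, not from the lecture:

```python
import numpy as np

# Toy embedding matrix f for 4 nodes in d = 2 dimensions (made-up values).
f = np.array([[1.0, 0.0],
              [0.8, 0.2],
              [0.0, 1.0],
              [0.1, 0.9]])

def neighbor_prob(f, u, n):
    """Softmax probability Pr(f(n) | f(u)) over all nodes v in V."""
    scores = f @ f[u]            # dot products f(v) . f(u) for every v
    scores -= scores.max()       # subtract max for numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs[n]
```

Nodes with similar embeddings get higher probability; the denominator over all of V is exactly the expensive sum that negative sampling (next slide) avoids.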
    \max_f \sum_{u \in V} \sum_{n_i \in N_S(u)} \log \frac{\exp(f(n_i) \cdot f(u))}{\sum_{v \in V} \exp(f(v) \cdot f(u))}

- Maximize the objective using stochastic gradient descent with negative sampling.
  - Computing the normalizing summation over all nodes is expensive.
  - Idea: Just sample a couple of "negative" nodes.
  - This means at each iteration only the embeddings of a few nodes are updated at a time.
  - Much faster training of embeddings.
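A sketch of the SGD update with negative sampling, in numpy. The graph-free setup, sizes, and learning rate are illustrative assumptions; only the update rule (pull the observed neighbor closer, push a few sampled nodes away via the sigmoid gradient) reflects the technique described above:

```python
import numpy as np

rng = np.random.default_rng(0)

num_nodes, d = 6, 3                              # toy sizes (assumed)
f = rng.normal(scale=0.1, size=(num_nodes, d))   # node embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgd_step(f, u, pos, num_neg=2, lr=0.05):
    """One negative-sampling update for the pair (u, pos), where pos
    was observed in u's random-walk neighborhood N_S(u)."""
    # Positive pair: increase f(pos) . f(u).
    g = 1.0 - sigmoid(f[pos] @ f[u])
    f[pos] += lr * g * f[u]
    f[u]   += lr * g * f[pos]
    # Negative samples: a couple of random nodes, pushed away from u.
    for neg in rng.integers(0, num_nodes, size=num_neg):
        if neg in (u, pos):
            continue
        g = sigmoid(f[neg] @ f[u])
        f[neg] -= lr * g * f[u]
        f[u]   -= lr * g * f[neg]

for _ in range(200):
    sgd_step(f, u=0, pos=1)
```

Each step touches only the embeddings of u, pos, and the sampled negatives, which is what makes training fast.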
Two classic strategies to define a neighborhood N_S(u) of a given node u:

(Figure: walks from u over nodes s1–s9, comparing BFS and DFS)

    N_BFS(u) = {s1, s2, s3}   local microscopic view
    N_DFS(u) = {s4, s5, s6}   global macroscopic view
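The two strategies can be sketched as visit orders on a toy graph. The adjacency list below is invented for illustration; only the first k visited nodes are kept, mirroring the size-3 neighborhoods N_BFS(u) and N_DFS(u):

```python
from collections import deque

# Toy graph (assumed for illustration), as adjacency lists.
graph = {
    "u":  ["s1", "s2", "s3"],
    "s1": ["u", "s4"],
    "s2": ["u"],
    "s3": ["u", "s5"],
    "s4": ["s1", "s6"],
    "s5": ["s3"],
    "s6": ["s4"],
}

def bfs_neighborhood(graph, u, k):
    """First k nodes visited breadth-first from u (local, micro view)."""
    seen, order, q = {u}, [], deque([u])
    while q and len(order) < k:
        for w in graph[q.popleft()]:
            if w not in seen and len(order) < k:
                seen.add(w)
                order.append(w)
                q.append(w)
    return order

def dfs_neighborhood(graph, u, k):
    """First k nodes visited depth-first from u (global, macro view)."""
    seen, order, stack = {u}, [], list(reversed(graph[u]))
    while stack and len(order) < k:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        order.append(v)
        stack.extend(reversed(graph[v]))
    return order
```

On this graph, BFS from u collects the immediate neighbors {s1, s2, s3}, while DFS follows a single deep path away from u.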
BFS: Micro-view of neighbourhood
DFS: Macro-view of neighbourhood
Biased random walk S that, given a node u, generates its neighborhood N_S(u).
- Two parameters:
  - Return parameter p: controls returning back to the previous node
  - In-out parameter q: controls moving outwards (DFS) vs. inwards (BFS)
N_S(u): Biased 2nd-order random walks explore network neighborhoods:
- BFS-like walk: low value of p
- DFS-like walk: low value of q
- p and q can be learned in a semi-supervised way

(Figure: a walker at s1 arrived from u; unnormalized transition probabilities are 1/p back to u, 1 to a common neighbor of u and s1, and 1/q to a node farther from u)
1) Compute the random walk transition probabilities.
2) Simulate r random walks of length l starting from each node u.
3) Optimize the node2vec objective using stochastic gradient descent.

Linear-time complexity; all 3 steps are individually parallelizable.
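Steps 1 and 2 can be sketched in pure Python. The toy graph is an assumption; the transition weights (1/p back to the previous node, 1 to a neighbor of the previous node, 1/q otherwise) follow the biased 2nd-order walk described above, and the final loop simulates r = 3 walks of length l = 5 per node:

```python
import random

random.seed(42)

# Toy undirected graph (assumed), as adjacency lists.
graph = {
    0: [1, 2],
    1: [0, 2, 3],
    2: [0, 1, 3],
    3: [1, 2, 4],
    4: [3],
}

def biased_walk(graph, start, length, p=1.0, q=1.0):
    """node2vec 2nd-order random walk with return parameter p
    and in-out parameter q."""
    walk = [start]
    prev = None
    for _ in range(length - 1):
        cur = walk[-1]
        nbrs = graph[cur]
        if prev is None:
            nxt = random.choice(nbrs)                # first step: uniform
        else:
            weights = [1.0 / p if x == prev          # return to previous node
                       else 1.0 if x in graph[prev]  # stay near prev (BFS-like)
                       else 1.0 / q                  # move outward (DFS-like)
                       for x in nbrs]
            nxt = random.choices(nbrs, weights=weights)[0]
        prev = cur
        walk.append(nxt)
    return walk

# Step 2: simulate r = 3 walks of length l = 5 from each node.
walks = [biased_walk(graph, v, length=5, p=1.0, q=0.5)
         for v in graph for _ in range(3)]
```

The walks then serve as the "sentences" consumed by the SGD optimization in step 3.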
Interactions of characters in a novel:
- p = 1, q = 2: microscopic view of the network neighbourhood
- p = 1, q = 0.5: macroscopic view of the network neighbourhood
(Figure: robustness to network perturbation. Macro-F1 score vs. fraction of missing edges, and Macro-F1 score vs. fraction of additional edges.)
General-purpose feature learning in networks:
- An explicit locality-preserving objective for feature learning.
- Biased random walks capture the diversity of network patterns.
- A scalable and robust algorithm with excellent empirical performance.
- Future extensions: designing random walk strategies tailored to networks with specific structure, such as heterogeneous networks and signed networks.
OhmNet: Extension to Hierarchical Networks
Letβs generalize node2vec to multilayer networks!
- Each network is a layer G_i = (V_i, E_i)
- Similarities between layers are given by the hierarchy M; the map π encodes parent-child relationships
- A computational framework that learns features of every node, at every scale, based on:
  - Edges within each layer
  - Inter-layer relationships between nodes active in different layers
Input: a multi-layer network with four layers G_1, ..., G_4 and a two-level hierarchy M
Output: embeddings of nodes in the layers as well as in internal levels of the hierarchy

- OhmNet: Given layers G_i and hierarchy M, learn node features captured by functions f_i
- The functions f_i embed every node in a d-dimensional feature space
- Given: Layers G_i, hierarchy M
  - Layers G_i, i = 1..T, are the leaves of M
- Goal: Learn the functions f_i : V_i → ℝ^d
- The approach has two components:
  - Per-layer objectives: nodes with similar network neighborhoods in each layer are embedded close together
  - Hierarchical dependency objectives: nodes in nearby layers in the hierarchy are encouraged to share similar features
- Intuition: For each layer, find a mapping of nodes to d dimensions that preserves node similarity
- Approach: The similarity of nodes u and v is defined based on the similarity of their network neighborhoods
- Given node u in layer i, we define nearby nodes N_i(u) based on random walks starting at node u
- Given node u in layer i, learn u's representation such that it predicts the nearby nodes N_i(u):

    \omega_i(u) = \log \Pr(N_i(u) \mid f_i(u))

- Given T layers, maximize:

    \Omega_i = \sum_{u \in V_i} \omega_i(u), \quad \text{for } i = 1, 2, \ldots, T

- Notice: Nodes in different networks representing the same entity have different features
- So far, we did not consider the hierarchy M
- Node representations in different layers are learned independently of each other

How do we model dependencies between layers when learning node features?
- We use regularization to share information across the hierarchy
- We want to enforce similarity between the feature representations of networks located nearby in the hierarchy
- Given node u, learn u's representation in layer i to be close to u's representation in the parent π(i):

    c_i(u) = \frac{1}{2} \left\| f_i(u) - f_{\pi(i)}(u) \right\|_2^2

    C_i = \sum_{u \in L_i} c_i(u)

  where L_i contains all layers appearing in the sub-hierarchy rooted at i
- Multi-scale: Repeat at every level of M
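A tiny numpy sketch of the penalty c_i(u). The 4-dimensional vectors are made-up; the gradient step shows how minimizing c_i(u) pulls a node's layer representation toward its representation in the parent level π(i):

```python
import numpy as np

# Made-up embeddings of the same node u: f_i(u) in child layer i,
# f_pi(u) in the parent level pi(i) of the hierarchy M.
f_child  = np.array([1.0, 0.0, 2.0, -1.0])
f_parent = np.array([0.5, 0.5, 1.5, -0.5])

def c_i(f_child, f_parent):
    """Hierarchical penalty c_i(u) = 1/2 * ||f_i(u) - f_pi(i)(u)||^2."""
    diff = f_child - f_parent
    return 0.5 * diff @ diff

before = c_i(f_child, f_parent)
f_child = f_child - 0.1 * (f_child - f_parent)   # one gradient step on c_i
after = c_i(f_child, f_parent)
```

Repeating this at every level of M shares information across the whole hierarchy.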
- Nodes in different layers representing the same entity have the same features in hierarchy ancestors
- We learn feature representations at multiple scales:
  - features of nodes in the layers
  - features of nodes in the non-leaf (internal) levels of the hierarchy
- This model is more efficient than the fully pairwise model, where dependencies between layers are modeled by pairwise comparisons of nodes across all pairs of layers
Learning node features in multi-layer networks. Solve the maximum likelihood problem:

    \max_{f_1, f_2, \ldots, f_{|M|}} \; \sum_{i \in T} \Omega_i \;-\; \lambda \sum_{j \in M} C_j

where the \Omega_i are the per-layer network objectives and the C_j are the hierarchical dependency objectives.
- Proteins are worker molecules
  - Understanding protein function has great biomedical and pharmaceutical implications
- The function of proteins depends on their tissue context [Greene et al., Nat Genet '15]

(Figure: tissue-specific networks G1, G2, G3, G4)
- The precise function of proteins depends on their tissue context (Greene et al., Nat Genet 2015)
- Diseases result from the failure of tissue-specific processes (Hu et al., Nat Rev Genet 2016)
- Current models assume that protein functions are constant across tissues

(Figure: tissue-specific protein interaction networks G1, G2, G3, G4)
- A multi-layer tissue network has many network layers (tissues)
- Each layer corresponds to one tissue-specific protein interaction network
- The hierarchy M encodes biological similarities between the tissues at multiple scales
- 107 genome-wide tissue-specific protein interaction networks
- 584 tissue-specific cellular functions
- Examples (tissue, cellular function):
  - (renal cortex, cortex development)
  - (artery, pulmonary artery morphogenesis)
9 brain tissue PPI networks in a two-level hierarchy

(Figure: the brain tissues frontal lobe, parietal lobe, occipital lobe, temporal lobe, midbrain, medulla oblongata, pons, substantia nigra, and cerebellum, with brainstem and brain as internal hierarchy levels)
- Cellular function prediction is a multi-label node classification task
- Every node (protein) is assigned one or more labels (cellular functions)
- Setup:
  - We apply OhmNet, which for every node in every layer learns a separate feature vector in an unsupervised way
  - For every layer and every function, we then train a separate one-vs-all regularized linear classifier using the modified Huber loss
  - During the training phase, we observe only a certain fraction of proteins and all their cellular functions across the layers
  - The task is then to predict the tissue-specific functions of the remaining proteins
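The classifier stage can be sketched with numpy. Everything here (data sizes, labels, learning rate) is a made-up toy; only the modified Huber loss and the one-vs-all setup, one regularized linear classifier per function, follow the setup above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumed): 40 "proteins" with 5-dim features, 3 functions.
# Multi-labels in {-1, +1}, generated by a random linear rule.
X = rng.normal(size=(40, 5))
Y = np.sign(X @ rng.normal(size=(5, 3)))

def huber_grad(y, s):
    """Gradient of the modified Huber loss w.r.t. the score s."""
    ys = y * s
    if ys >= 1.0:
        return 0.0
    if ys >= -1.0:
        return -2.0 * y * (1.0 - ys)
    return -4.0 * y

# One-vs-all: one regularized linear classifier per function.
W = np.zeros((5, 3))
lr, reg = 0.05, 1e-3
for _ in range(1000):
    for k in range(3):                      # one SGD sample per classifier
        i = rng.integers(len(X))
        g = huber_grad(Y[i, k], X[i] @ W[:, k])
        W[:, k] -= lr * (g * X[i] + reg * W[:, k])

train_acc = ((X @ W > 0) == (Y > 0)).mean()
```

In the actual experiments a separate classifier is trained for every (layer, function) pair; here a single feature matrix stands in for one layer.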
(Figure: prediction performance across tissues)

- 42% improvement over the state-of-the-art on the same dataset
Transfer functions to unannotated tissues
- Task: Predict functions in a target tissue without access to any annotations/labels in that tissue

Target tissue   | OhmNet | Tissue non-specific | Improvement
Placenta        | 0.758  | 0.684               | 11%
Spleen          | 0.779  | 0.712               | 10%
Liver           | 0.741  | 0.553               | 34%
Forebrain       | 0.755  | 0.632               | 20%
Blood plasma    | 0.703  | 0.540               | 40%
Smooth muscle   | 0.729  | 0.583               | 25%
Average         | 0.746  | 0.617               | 21%

Reported are AUC values.