SLIDE 1

CS224W: Analysis of Networks Jure Leskovec, Stanford University

http://cs224w.stanford.edu

SLIDE 2

[Figure: a network whose nodes carry unknown labels, marked with question marks; machine learning predicts them]

Machine Learning

Node classification


SLIDE 3

• (Supervised) Machine Learning lifecycle: this feature, that feature. Every single time!

[Pipeline: Raw Data → Feature Engineering → Structured Data → Learning Algorithm → Model → Downstream prediction task]

• Goal: automatically learn the features instead of engineering them by hand.


SLIDE 4

Goal: Efficient task-independent feature learning for machine learning in networks!


node2vec: learn a mapping f : u → ℝ^d

f(u) is the feature representation (embedding) of node u
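For a shallow encoder like node2vec, the mapping f is just a lookup into a trainable matrix with one row per node. A minimal sketch, assuming integer node ids (the matrix Z, its dimensions, and the initialization are illustrative, not from the slides):

```python
import numpy as np

num_nodes, d = 34, 16  # e.g., Zachary's karate club with 16-dimensional embeddings
rng = np.random.default_rng(0)

# Z holds one d-dimensional embedding per node; f(u) is simply row u.
Z = rng.normal(scale=0.1, size=(num_nodes, d))

def f(u: int) -> np.ndarray:
    """Feature representation (embedding) of node u."""
    return Z[u]

print(f(0).shape)  # (16,)
```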


SLIDE 5

• We map each node in a network into a low-dimensional space
 – Distributed representation for nodes
 – Similarity between nodes indicates link strength
 – Encodes network information and generates a node representation


SLIDE 6

• Zachary’s Karate Club network:

[Figure: the karate club graph and a 2-dimensional embedding of its nodes]
SLIDE 7

Graph representation learning is hard:

• Images are fixed size
 – Convolutions (CNNs)
• Text is linear
 – Sliding window (word2vec)
• Graphs are neither of these!
 – Node numbering is arbitrary (node isomorphism problem)
 – Much more complicated structure
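To see why arbitrary node numbering matters: relabeling nodes permutes the adjacency matrix, so the same graph has many different matrix representations. A small illustration (the 4-node path graph and the permutation are made up for the example):

```python
import numpy as np

# Adjacency matrix of the path graph 0-1-2-3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])

# Relabel the nodes: new position i holds old node perm[i].
perm = [2, 0, 3, 1]
P = np.eye(4, dtype=int)[perm]  # permutation matrix
A_relabeled = P @ A @ P.T

print(np.array_equal(A, A_relabeled))  # False: same graph, different matrix
```

Any model that consumes the matrix directly would treat these as different inputs, a problem that convolutions on fixed-size images never face.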


SLIDE 8

node2vec: Random Walk Based (Unsupervised) Feature Learning

node2vec: Scalable Feature Learning for Networks. A. Grover, J. Leskovec. KDD 2016.

SLIDE 9

• Goal: Embed nodes with similar network neighborhoods close together in the feature space.
• We frame this goal as a prediction-task-independent maximum likelihood optimization problem.
• Key observation: A flexible notion of the network neighborhood N_S(u) of node u leads to rich features.
• We develop a biased 2nd-order random walk procedure S to generate the network neighborhood N_S(u) of node u.

SLIDE 10

• Intuition: Find an embedding of nodes into d dimensions that preserves similarity.
• Idea: Learn a node embedding such that nearby nodes are close together.
• Given a node u, how do we define nearby nodes?
 – N_S(u): the neighbourhood of u obtained by some strategy S

SLIDE 11

• Given G = (V, E),
• our goal is to learn a mapping f : V → ℝ^d.
• Log-likelihood objective:

$$\max_f \sum_{u \in V} \log \Pr\big(N_S(u) \mid f(u)\big)$$

 – where N_S(u) is the neighborhood of node u.
• Given node u, we want to learn feature representations that are predictive of the nodes in its neighborhood N_S(u).

SLIDE 12

$$\max_f \sum_{u \in V} \log \Pr\big(N_S(u) \mid f(u)\big)$$

• Assumption: The conditional likelihood factorizes over the set of neighbors:

$$\log \Pr\big(N_S(u) \mid f(u)\big) = \sum_{v \in N_S(u)} \log \Pr\big(f(v) \mid f(u)\big)$$

• Softmax parametrization:

$$\Pr\big(f(v) \mid f(u)\big) = \frac{\exp\big(f(v) \cdot f(u)\big)}{\sum_{w \in V} \exp\big(f(w) \cdot f(u)\big)}$$

SLIDE 13

$$\max_f \sum_{u \in V} \sum_{v \in N_S(u)} \log \frac{\exp\big(f(v) \cdot f(u)\big)}{\sum_{w \in V} \exp\big(f(w) \cdot f(u)\big)}$$

• Maximize the objective using stochastic gradient descent with negative sampling:
 – Computing the summation over all of V is expensive.
 – Idea: just sample a couple of “negative” nodes.
 – This means that at each iteration only the embeddings of a few nodes are updated.
 – Much faster training of embeddings (a sketch follows below).

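A minimal numpy sketch of one negative-sampling update for a single positive pair (u, v) drawn from a walk. The learning rate, the number of negatives k, and the uniform negative distribution are illustrative choices (the node2vec paper samples negatives from a degree-based distribution):

```python
import numpy as np

rng = np.random.default_rng(0)
num_nodes, d = 100, 16
Z = rng.normal(scale=0.1, size=(num_nodes, d))  # node embeddings, f(u) = Z[u]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgd_step(u, v, k=5, lr=0.025):
    """One negative-sampling update: pull f(v) toward f(u), push k sampled nodes away."""
    negatives = rng.integers(0, num_nodes, size=k)  # uniform sampling (illustrative)
    grad_u = np.zeros(d)
    # Positive pair: ascend log sigmoid(f(v) . f(u)).
    g = 1.0 - sigmoid(Z[v] @ Z[u])
    grad_u += g * Z[v]
    Z[v] += lr * g * Z[u]
    # Negative pairs: ascend log sigmoid(-f(w) . f(u)).
    for w in negatives:
        g = -sigmoid(Z[w] @ Z[u])
        grad_u += g * Z[w]
        Z[w] += lr * g * Z[u]
    Z[u] += lr * grad_u
```

Only the rows for u, v, and the k negatives are touched, which is what makes each iteration cheap.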

SLIDE 14

Two classic strategies to define a neighborhood N_S(u) of a given node u:

[Figure: walks from node u; BFS visits nearby nodes s1, s2, s3, while DFS reaches distant nodes such as s4 through s9]

N_BFS(u) = {s1, s2, s3}: local, microscopic view
N_DFS(u) = {s4, s5, s6}: global, macroscopic view
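A small sketch of the two strategies, assuming networkx and taking k nodes from each traversal order as the neighborhood (the cutoff k and the use of traversal order are illustrative simplifications):

```python
import networkx as nx

def bfs_neighborhood(G, u, k=3):
    """Local, microscopic view: the first k nodes discovered by BFS from u."""
    order = [v for _, v in nx.bfs_edges(G, u)]
    return order[:k]

def dfs_neighborhood(G, u, k=3):
    """Global, macroscopic view: the last k nodes visited by a DFS from u."""
    order = list(nx.dfs_preorder_nodes(G, u))[1:]  # skip u itself
    return order[-k:]

G = nx.karate_club_graph()
print(bfs_neighborhood(G, 0), dfs_neighborhood(G, 0))
```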

SLIDE 15

BFS: micro-view of the neighbourhood

DFS: macro-view of the neighbourhood

[Figure: BFS explores around u; DFS walks away from u]

SLIDE 16

A biased random walk S that, given a node u, generates its neighborhood N_S(u).

• Two parameters:
 – Return parameter p: return back to the previous node.
 – In-out parameter q: moving outwards (DFS) vs. inwards (BFS).

SLIDE 17

N_S(u): Biased 2nd-order random walks explore network neighborhoods:

 – BFS-like walk: low value of p
 – DFS-like walk: low value of q

p and q can be learned in a semi-supervised way.

[Figure: a walk just moved u → s4 and must pick the next step; the unnormalized transition probabilities from s4 are 1/p back to u, 1 to s1 (same distance from u), and 1/q to s5 (farther from u)]
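A minimal sketch of the biased walk, assuming an unweighted networkx graph (the paper additionally supports edge weights and precomputes alias tables for O(1) sampling; both are omitted here):

```python
import random
import networkx as nx

def biased_step(G, prev, cur, p, q):
    """Sample the next node of a walk that just moved prev -> cur."""
    neighbors = list(G.neighbors(cur))
    weights = []
    for nxt in neighbors:
        if nxt == prev:                # going back: distance 0 from prev
            weights.append(1.0 / p)
        elif G.has_edge(nxt, prev):    # distance 1 from prev: stay close (BFS-like)
            weights.append(1.0)
        else:                          # distance 2 from prev: move outward (DFS-like)
            weights.append(1.0 / q)
    return random.choices(neighbors, weights=weights, k=1)[0]

def node2vec_walk(G, start, length, p, q):
    """Generate one biased 2nd-order random walk of the given length."""
    walk = [start, random.choice(list(G.neighbors(start)))]
    while len(walk) < length:
        walk.append(biased_step(G, walk[-2], walk[-1], p, q))
    return walk
```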

SLIDE 18

• 1) Compute the random walk transition probabilities.
• 2) Simulate r random walks of length l starting from each node u.
• 3) Optimize the node2vec objective using stochastic gradient descent.

Linear-time complexity. All 3 steps are individually parallelizable.
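Since step 3 is exactly skip-gram training, a common shortcut is to feed the walks to an off-the-shelf word2vec implementation, treating each walk as a sentence. A sketch assuming gensim 4.x and the `node2vec_walk` helper sketched above (the values of r, l, p, and q are illustrative):

```python
import networkx as nx
from gensim.models import Word2Vec

G = nx.karate_club_graph()
r, l, p, q = 10, 80, 1.0, 2.0

# Step 2: simulate r walks of length l from every node.
walks = [[str(v) for v in node2vec_walk(G, u, l, p, q)]
         for _ in range(r) for u in G.nodes()]

# Step 3: skip-gram with negative sampling over the walks.
model = Word2Vec(walks, vector_size=16, window=10, min_count=0,
                 sg=1, negative=5, epochs=5)
embedding_of_node_0 = model.wv["0"]
```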


SLIDE 19

Interactions of characters in a novel:

• p = 1, q = 2: microscopic view of the network neighbourhood
• p = 1, q = 0.5: macroscopic view of the network neighbourhood

[Figure: the character network colored under each setting]

SLIDE 20


SLIDE 21

[Figure: two panels plotting Macro-F1 score (0.00 to 0.20) against the fraction of missing edges and against the fraction of additional edges (0.0 to 0.6)]

SLIDE 22

General-purpose feature learning in networks:

• An explicit locality-preserving objective for feature learning.
• Biased random walks capture diversity of network patterns.
• A scalable and robust algorithm with excellent empirical performance.
• Future extensions will involve designing random walk strategies tailored to networks with specific structure, such as heterogeneous networks and signed networks.

SLIDE 23

OhmNet: Extension to Hierarchical Networks

SLIDE 24

Let’s generalize node2vec to multilayer networks!


SLIDE 25

• Each network is a layer G_i = (V_i, E_i).
• Similarities between layers are given in a hierarchy M; the map π encodes its parent-child relationships.

SLIDE 26

• A computational framework that learns features of every node and at every scale, based on:
 – edges within each layer
 – inter-layer relationships between nodes active in different layers

SLIDE 27

Input: the layers and the hierarchy that relates them
Output: embeddings of nodes in the layers as well as at internal levels of the hierarchy

[Figure: four layers G_1 through G_4 arranged under a hierarchy]

SLIDE 28
• OhmNet: Given layers G_i and hierarchy M, learn node features captured by functions f_i.
• The functions f_i embed every node in a d-dimensional feature space.

[Figure: a multi-layer network with four layers and a two-level hierarchy M]

SLIDE 29

• Given: layers G_i and hierarchy M
 – Layers G_i, i = 1 ... T, are the leaves of M.
• Goal: Learn the functions f_i : V_i → ℝ^d.

SLIDE 30

• The approach has two components:
 – Per-layer objectives: nodes with similar network neighborhoods in each layer are embedded close together.
 – Hierarchical dependency objectives: nodes in nearby layers in the hierarchy are encouraged to share similar features.

SLIDE 31

• Intuition: For each layer, find a mapping of nodes to d dimensions that preserves node similarity.
• Approach: The similarity of nodes u and v is defined based on the similarity of their network neighborhoods.
• Given a node u in layer i, we define its nearby nodes N_i(u) based on random walks starting at node u.

SLIDE 32

• Given node u in layer i, learn u’s representation such that it predicts the nearby nodes N_i(u):

$$\omega_i(u) = \log \Pr\big(N_i(u) \mid f_i(u)\big)$$

• Given T layers, maximize:

$$\Omega_i = \sum_{u \in V_i} \omega_i(u), \quad \text{for } i = 1, 2, \ldots, T$$

• Notice: Nodes in different networks representing the same entity have different features.

SLIDE 33

• So far, we did not consider the hierarchy M.
• Node representations in different layers are learned independently of each other.

How to model dependencies between layers when learning node features?

SLIDE 34

• We use regularization to share information across the hierarchy.
• We want to enforce similarity between the feature representations of networks that are located nearby in the hierarchy.

SLIDE 35

• Given node u, learn u’s representation in layer i to be close to u’s representation in the parent π(i):

$$c_i(u) = \tfrac{1}{2} \big\lVert f_i(u) - f_{\pi(i)}(u) \big\rVert_2^2$$

$$C_i = \sum_{u \in L_i} c_i(u)$$

 – where L_i has all layers appearing in the sub-hierarchy rooted at i.
• Multi-scale: Repeat at every level of M.

SLIDE 36

• Nodes in different layers representing the same entity have the same features in hierarchy ancestors.
• We learn feature representations at multiple scales:
 – features of nodes in the layers
 – features of nodes at non-leaf levels of the hierarchy
• This model is more efficient than the fully pairwise model, in which dependencies between layers are modeled by pairwise comparisons of nodes across all pairs of layers.

SLIDE 37

Learning node features in multi-layer networks. Solve the maximum likelihood problem:

$$\max_{f_1, f_2, \ldots, f_{|M|}} \; \sum_{i \in T} \Omega_i - \lambda \sum_{j \in M} C_j$$

 – Ω_i: per-layer network objectives
 – C_j: hierarchical dependency objectives
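A minimal numpy sketch of the hierarchical dependency term and a gradient step on it, assuming a toy hierarchy stored as a parent map (the names, λ, the learning rate, and the simplification that every hierarchy element shares the same node set are all illustrative; the per-layer terms Ω_i are trained with negative sampling as in node2vec):

```python
import numpy as np

d, n = 16, 10
rng = np.random.default_rng(0)
parent = {"brain_l": "brain", "brain_r": "brain", "brain": "root"}  # pi(i), toy hierarchy
# f_i for each hierarchy element; here every element shares the same n nodes.
F = {i: rng.normal(scale=0.1, size=(n, d))
     for i in ["brain_l", "brain_r", "brain", "root"]}

def hierarchy_cost(lam=0.1):
    """lambda * sum_i C_i with c_i(u) = 1/2 ||f_i(u) - f_pi(i)(u)||^2."""
    return lam * sum(0.5 * np.sum((F[i] - F[parent[i]]) ** 2) for i in parent)

def hierarchy_sgd_step(lam=0.1, lr=0.01):
    """Descend on the penalty: pull each child toward its parent and vice versa."""
    for i, p in parent.items():
        diff = F[i] - F[p]
        F[i] -= lr * lam * diff
        F[p] += lr * lam * diff
```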

SLIDE 38

• Proteins are the worker molecules of the cell.
 – Understanding protein function has great biomedical and pharmaceutical implications.
• The function of proteins depends on their tissue context [Greene et al., Nat Genet ’15].

[Figure: tissue-specific protein interaction networks G1 through G4]

SLIDE 39

• The precise function of proteins depends on their tissue context (Greene et al., Nat Genet 2015).
• Diseases result from the failure of tissue-specific processes (Hu et al., Nat Rev Genet 2016).
• Current models assume that protein functions are constant across tissues.

[Figure: tissue-specific protein interaction networks G1 through G4]

SLIDE 40

• A multi-layer tissue network has many network layers (tissues).
• Each layer corresponds to one tissue-specific protein interaction network.
• The hierarchy M encodes biological similarities between the tissues at multiple scales.

SLIDE 41

107 genome-wide tissue-specific protein interaction networks

• 584 tissue-specific cellular functions
• Examples (tissue, cellular function):
 – (renal cortex, cortex development)
 – (artery, pulmonary artery morphogenesis)

SLIDE 42

9 brain tissue PPI networks in a two-level hierarchy

[Figure: hierarchy over the brain tissues frontal lobe, parietal lobe, occipital lobe, temporal lobe, midbrain, substantia nigra, pons, medulla oblongata, and cerebellum, with internal nodes brainstem and brain]

SLIDE 43


SLIDE 44

• Cellular function prediction is a multi-label node classification task.
• Every node (protein) is assigned one or more labels (cellular functions).
• Setup:
 – We apply OhmNet, which for every node in every layer learns a separate feature vector in an unsupervised way.
 – For every layer and every function, we then train a separate one-vs-all regularized linear classifier using the modified Huber loss (a sketch follows below).
 – During the training phase, we observe only a certain fraction of the proteins and all of their cellular functions across the layers.
 – The task is then to predict the tissue-specific functions of the remaining proteins.
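A sketch of one such per-layer, per-function classifier, assuming scikit-learn, whose `SGDClassifier(loss="modified_huber")` is a regularized linear model under the modified Huber loss (the feature matrix and labels below are random placeholders for OhmNet embeddings and function annotations):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 128))    # OhmNet embeddings of proteins in one layer
y = rng.integers(0, 2, size=500)   # 1 if a protein carries this cellular function

# One-vs-all regularized linear classifier with the modified Huber loss.
clf = SGDClassifier(loss="modified_huber", penalty="l2", alpha=1e-4)
clf.fit(X, y)
scores = clf.predict_proba(X)[:, 1]  # per-protein scores for this function
```

In the full setup this is repeated for every (layer, function) pair.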

SLIDE 45

[Figure: results broken down by tissue]

SLIDE 46

• 42% improvement over the state of the art on the same dataset

SLIDE 47

Transfer functions to unannotated tissues

• Task: Predict functions in a target tissue without access to any annotation/label in that tissue.

Target tissue  | OhmNet | Tissue non-specific | Improvement
Placenta       | 0.758  | 0.684               | 11%
Spleen         | 0.779  | 0.712               | 10%
Liver          | 0.741  | 0.553               | 34%
Forebrain      | 0.755  | 0.632               | 20%
Blood plasma   | 0.703  | 0.540               | 40%
Smooth muscle  | 0.729  | 0.583               | 25%
Average        | 0.746  | 0.617               | 21%

Reported are AUC values.
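The reported scores are areas under the ROC curve. For reference, a minimal sketch of computing one with scikit-learn (the labels and scores are toy placeholders):

```python
from sklearn.metrics import roc_auc_score

y_true = [1, 0, 1, 1, 0, 0]               # tissue-specific function annotations
y_score = [0.9, 0.3, 0.8, 0.6, 0.4, 0.2]  # classifier scores in the target tissue
print(roc_auc_score(y_true, y_score))     # 1.0 for this toy example
```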