

slide-1
SLIDE 1

Uncertainty and Robustness of Graph Embeddings

Aleksandar Bojchevski Technical University of Munich, Germany Graph Embedding Day 2018 - Lyon

slide-2
SLIDE 2

Neglected aspects of graph embeddings

Capturing uncertainty Robustness to noise Robustness to adversarial attacks

Uncertainty and Robustness of Graph Embeddings - Bojchevski 2

slide-4
SLIDE 4

Nodes are points in a low-dimensional space


slide-5
SLIDE 5

Nodes are distributions


slide-6
SLIDE 6

Graph2Gauss - 3 key modeling ideas

  • 1. Uncertainty
  • 2. Personalized ranking
  • 3. Inductiveness

[Figure: the hop neighborhoods $k = 1$ and $k = 2$ of a node, and a deep encoder $f_\theta(x_i)$ that maps the features $x_i$ of node $i$ to a Gaussian embedding $\mathcal{N}(\mu_i, \Sigma_i)$]

slide-7
SLIDE 7

Uncertainty

Embed nodes as (Gaussian) distributions

Sources of uncertainty:

  • Conflicting structure and attributes
  • Heterogeneous neighborhood
  • Noise, outliers, anomalies, …

slide-8
SLIDE 8

Personalized ranking

For each node $i$: nodes in its $k$-hop neighborhood should be closer to $i$ than nodes in its $(k + 1)$-hop neighborhood

[Figure: the $k = 1$ and $k = 2$ hop neighborhoods of a node]

slide-10
SLIDE 10

Personalized ranking

For each node $i$: nodes in its $k$-hop neighborhood should be closer to $i$ than nodes in its $(k + 1)$-hop neighborhood

Example: closer in terms of the KL divergence

KL is asymmetric ⇒ handles directed graphs

[Figure: the $k = 1$ and $k = 2$ hop neighborhoods of a node]

slide-11
SLIDE 11

Personalized ranking

Personalized ranking implies pairwise constraints for node $i$:

$$D_{KL}(\mathcal{N}_j \,\|\, \mathcal{N}_i) < D_{KL}(\mathcal{N}_{j'} \,\|\, \mathcal{N}_i) \quad \forall j \in N_i^{(k)},\ \forall j' \in N_i^{(k')},\ \forall k < k'$$

where $N_i^{(k)}$ is the set of nodes in the $k$-hop neighborhood of node $i$

[Figure: the $k = 1$ and $k = 2$ hop neighborhoods of a node]
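The pairwise ranking constraints can be checked numerically on a toy example. The following is a minimal sketch, assuming Gaussian embeddings with diagonal covariance so that the KL divergence has a simple closed form; the helper names `kl_diag_gaussians` and `energy` and the toy embeddings are hypothetical and not taken from the slides.

```python
import math

def kl_diag_gaussians(mu_j, var_j, mu_i, var_i):
    """KL( N(mu_j, diag(var_j)) || N(mu_i, diag(var_i)) ) in closed form."""
    total = 0.0
    for mj, vj, mi, vi in zip(mu_j, var_j, mu_i, var_i):
        total += vj / vi + (mi - mj) ** 2 / vi - 1.0 - math.log(vj / vi)
    return 0.5 * total

def energy(emb, i, j):
    """Energy of the pair (i, j): KL(N_j || N_i)."""
    mu_i, var_i = emb[i]
    mu_j, var_j = emb[j]
    return kl_diag_gaussians(mu_j, var_j, mu_i, var_i)

# Hypothetical embeddings: (mean, per-dimension variance) for each node.
emb = {
    "i": ([0.0, 0.0], [1.0, 1.0]),
    "j_1hop": ([0.5, 0.0], [1.0, 1.0]),
    "j_2hop": ([3.0, 0.0], [1.0, 1.0]),
}

e_near = energy(emb, "i", "j_1hop")
e_far = energy(emb, "i", "j_2hop")
constraint_holds = e_near < e_far  # the 1-hop neighbor should be closer

# KL is asymmetric, which is what allows it to model directed graphs.
kl_ab = kl_diag_gaussians([0.0], [1.0], [0.0], [4.0])
kl_ba = kl_diag_gaussians([0.0], [4.0], [0.0], [1.0])
```

With equal unit variances the divergence reduces to half the squared distance between the means, so the constraint is satisfied whenever the 1-hop neighbor has the closer mean.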

slide-12
SLIDE 12

Inductiveness

Generalize to unseen nodes by learning a mapping from features to embeddings

[Figure: a deep encoder $f_\theta(x_i)$ maps the features $x_i$ of node $i$ to the embedding $\mathcal{N}(\mu_i, \Sigma_i)$]

slide-14
SLIDE 14

Learning with energy-based loss

$$E_{ij} = D_{KL}(\mathcal{N}_j \,\|\, \mathcal{N}_i)$$

$$\mathcal{L} = \sum_{(i, j, j')} \left( E_{ij}^2 + \exp(-E_{ij'}) \right)$$

Closer nodes should have lower energy

Naively: $O(N^3)$ complexity

Node-anchored sampling strategy:

  • For each node, sample one other node from every neighborhood
  • Fewer than 4.2% of the triplets need to be seen to match performance
  • Lower gradient variance
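The loss and the node-anchored sampling strategy can be sketched as follows. This is an illustrative sketch only: the function names, the `hops` data layout (one list of nodes per hop distance) and the toy neighborhoods are assumptions, and the energies are passed in as plain numbers rather than computed by an encoder.

```python
import math
import random

def square_exp_loss(energy_pairs):
    """Sum of E_ij^2 + exp(-E_ij') over (closer, farther) energy pairs."""
    return sum(e_close ** 2 + math.exp(-e_far) for e_close, e_far in energy_pairs)

def sample_triplets(hops, rng):
    """Node-anchored sampling: one random node per hop neighborhood.

    hops[i][k - 1] lists the nodes exactly k hops away from node i.
    Every pair (closer pick, farther pick) yields one triplet (i, j, j').
    """
    triplets = []
    for i, levels in hops.items():
        picks = [rng.choice(level) for level in levels if level]
        for a in range(len(picks)):
            for b in range(a + 1, len(picks)):
                triplets.append((i, picks[a], picks[b]))
    return triplets

# Hypothetical neighborhoods of a single anchor node "a".
hops = {"a": [["b", "c"], ["d"], ["e", "f"]]}
triplets = sample_triplets(hops, random.Random(0))

# A pair with low energy for the closer node and high energy for the
# farther node is penalized less than the reverse situation.
good = square_exp_loss([(0.1, 5.0)])
bad = square_exp_loss([(5.0, 0.1)])
```

Sampling one node per neighborhood yields only a handful of triplets per anchor and per step, instead of enumerating all $O(N^3)$ of them.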

slide-15
SLIDE 15

Graph2Gauss is parameter/data efficient


slide-16
SLIDE 16

Graph2Gauss captures uncertainty

Uncertainty correlates with diversity

Diversity: number of distinct classes in a node's k-hop neighborhood

slide-17
SLIDE 17

Graph2Gauss captures uncertainty

Uncertainty reveals the intrinsic latent dimensionality of the graph

Detected latent dimensions ≈ number of ground-truth communities

slide-18
SLIDE 18

Uncertainty and link prediction

Prune dimensions with high uncertainty while maintaining link prediction performance

slide-19
SLIDE 19

Graph2Gauss is effective for visualization


slide-20
SLIDE 20

Neglected aspects of graph embeddings

Capturing uncertainty Robustness to noise Robustness to adversarial attacks


slide-21
SLIDE 21

Why spectral embedding


https://www.semanticscholar.org

slide-22
SLIDE 22

What is spectral clustering

Graph clustering

  • Maximize within-cluster edges
  • Minimize between-cluster edges

[Figure: a similarity graph ($N = 9$, $D = 5$) and its spectral embedding]

slide-23
SLIDE 23

The minimum cut

Partition $V$ into two sets $C_1$ and $C_2$ such that the sum of the inter-cluster edge weights

$$\mathrm{cut}(C_1, C_2) = \sum_{v_1 \in C_1,\, v_2 \in C_2} w(v_1, v_2)$$

is minimized

Drawbacks:

  • Tends to cut small vertex sets from the rest of the graph
  • Considers only inter-cluster edges, no intra-cluster edges

[Figure: example weighted graph]

slide-24
SLIDE 24

The normalized cut

Ratio Cut: Minimize

$$\frac{\mathrm{cut}(C_1, C_2)}{|C_1|} + \frac{\mathrm{cut}(C_2, C_1)}{|C_2|}$$

Normalized Cut: Minimize

$$\frac{\mathrm{cut}(C_1, C_2)}{\mathrm{vol}(C_1)} + \frac{\mathrm{cut}(C_1, C_2)}{\mathrm{vol}(C_2)}$$

[Figure: two copies of the example weighted graph]
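The three objectives can be compared on a small example. This is a minimal sketch on a hypothetical weighted graph (two triangles, a bridge, and one pendant node); it is not the graph from the slide figure, and the function names are illustrative.

```python
# Hypothetical undirected weighted graph: two triangles joined by a bridge,
# plus a pendant node 6 that is cheap to cut off.
edges = {
    (0, 1): 4.0, (0, 2): 4.0, (1, 2): 4.0,
    (3, 4): 4.0, (3, 5): 4.0, (4, 5): 4.0,
    (2, 3): 2.0,
    (5, 6): 1.5,
}
nodes = range(7)

def weight(u, v):
    return edges.get((u, v), edges.get((v, u), 0.0))

def cut(c1, c2):
    """Sum of the edge weights between the two node sets."""
    return sum(weight(u, v) for u in c1 for v in c2)

def vol(c):
    """Sum of the weighted degrees of the nodes in c."""
    return sum(weight(u, v) for u in c for v in nodes)

def ratio_cut(c1, c2):
    return cut(c1, c2) / len(c1) + cut(c2, c1) / len(c2)

def normalized_cut(c1, c2):
    return cut(c1, c2) / vol(c1) + cut(c1, c2) / vol(c2)

tiny, rest = [6], [0, 1, 2, 3, 4, 5]
left, right = [0, 1, 2], [3, 4, 5, 6]

scores = {
    "cut": (cut(tiny, rest), cut(left, right)),
    "ratio_cut": (ratio_cut(tiny, rest), ratio_cut(left, right)),
    "normalized_cut": (normalized_cut(tiny, rest), normalized_cut(left, right)),
}
```

On this graph the plain cut prefers splitting off the single pendant node, while both ratio cut and normalized cut prefer the balanced split between the two triangles, which illustrates the first drawback of the minimum cut.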

slide-25
SLIDE 25

Multi-way graph partitioning

Generalization to $k \ge 2$ clusters

Partition $V$ into disjoint clusters $C_1, \dots, C_k$ such that

  • Cut: $\min_{C_1, \dots, C_k} \sum_{i=1}^{k} \mathrm{cut}(C_i, V \setminus C_i)$
  • Ratio Cut: $\min_{C_1, \dots, C_k} \sum_{i=1}^{k} \frac{\mathrm{cut}(C_i, V \setminus C_i)}{|C_i|}$
  • Normalized Cut: $\min_{C_1, \dots, C_k} \sum_{i=1}^{k} \frac{\mathrm{cut}(C_i, V \setminus C_i)}{\mathrm{vol}(C_i)}$

Finding the optimal solution is NP-hard

How to compute an approximate solution efficiently?

[Figure: minimum cut for $k = 3$ on the example weighted graph]

slide-26
SLIDE 26

Graph Laplacian

Laplacian matrix $L = D - A$

  • $A$ = (weighted) adjacency matrix, $D$ = degree matrix

Observation: For any vector $f$ we have

$$f^T \cdot L \cdot f = \frac{1}{2} \cdot \sum_{(u, v) \in E} W_{uv} (f_u - f_v)^2$$

Normalized Laplacian

$$L_{sym} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} A D^{-1/2}$$
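Both identities are easy to verify numerically. The sketch below uses a small hypothetical weighted graph and plain Python lists; it assumes that the sum over $(u, v) \in E$ runs over ordered pairs, so that every undirected edge is counted twice and the factor $\frac{1}{2}$ compensates for it.

```python
# Hypothetical symmetric weighted adjacency matrix (no isolated nodes).
A = [
    [0, 2, 1, 0],
    [2, 0, 3, 0],
    [1, 3, 0, 4],
    [0, 0, 4, 0],
]
n = len(A)
deg = [sum(row) for row in A]

# Laplacian L = D - A.
L = [[(deg[u] if u == v else 0) - A[u][v] for v in range(n)] for u in range(n)]

f = [1.0, -2.0, 0.5, 3.0]

# Left-hand side: the quadratic form f^T L f.
quad = sum(f[u] * L[u][v] * f[v] for u in range(n) for v in range(n))

# Right-hand side: half the weighted squared differences over ordered pairs.
pairwise = 0.5 * sum(
    A[u][v] * (f[u] - f[v]) ** 2 for u in range(n) for v in range(n)
)

# Normalized Laplacian, computed in the two equivalent ways.
inv_sqrt = [d ** -0.5 for d in deg]
L_sym_a = [[inv_sqrt[u] * L[u][v] * inv_sqrt[v] for v in range(n)] for u in range(n)]
L_sym_b = [
    [(1.0 if u == v else 0.0) - inv_sqrt[u] * A[u][v] * inv_sqrt[v] for v in range(n)]
    for u in range(n)
]
max_diff = max(abs(L_sym_a[u][v] - L_sym_b[u][v]) for u in range(n) for v in range(n))
```

Because the right-hand side is a sum of non-negative terms, the check also makes it visible why $L$ is positive semi-definite.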

slide-27
SLIDE 27

Physical interpretation of the Laplacian (I)

Let $f$ be a heat distribution over a graph with $f_i$ = the heat at node $v_i$

The heat transferred between $v_i$ and $v_j$ is proportional to $(f_i - f_j)$ if $(i, j) \in E$

https://en.wikipedia.org/wiki/Laplacian_matrix#/media/File:Graph_Laplacian_Diffusion_Example.gif

slide-28
SLIDE 28

Physical interpretation of the Laplacian (II)

Graph is viewed as an electrical circuit with edges as wires (resistors)

Apply voltage at some nodes and measure the induced voltage at other nodes

The induced voltages minimize $\sum_{(u, v) \in E} (x_u - x_v)^2$

We can find the voltage by minimizing $x^T L x$

slide-29
SLIDE 29

Properties of the Graph Laplacian

$L$ is symmetric and positive semi-definite

The number of eigenvectors of $L$ with eigenvalue 0 corresponds to the number of connected components

Algebraic connectivity of a graph is $\lambda_2(L)$

  • The magnitude reflects how well connected the overall graph is

The spectrum of $L$ encodes useful information about the graph

  • Unfortunately, there exist co-spectral graphs
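These properties can be observed directly on a small example. A minimal sketch using NumPy; the two toy graphs and the numerical threshold `1e-9` for "eigenvalue equal to zero" are assumptions made for illustration.

```python
import numpy as np

def laplacian(A):
    """Unnormalized graph Laplacian L = D - A."""
    return np.diag(A.sum(axis=1)) - A

# Hypothetical graph with two connected components:
# a triangle {0, 1, 2} and a single edge {3, 4}.
A_two = np.zeros((5, 5))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4)]:
    A_two[u, v] = A_two[v, u] = 1.0

eig_two = np.linalg.eigvalsh(laplacian(A_two))   # ascending order
n_components = int(np.sum(eig_two < 1e-9))       # number of zero eigenvalues

# Adding the edge (2, 3) connects the two components.
A_one = A_two.copy()
A_one[2, 3] = A_one[3, 2] = 1.0

eig_one = np.linalg.eigvalsh(laplacian(A_one))
n_components_connected = int(np.sum(eig_one < 1e-9))
algebraic_connectivity = eig_one[1]              # lambda_2(L)
```

The disconnected graph has two (numerically) zero eigenvalues, the connected one has exactly one, and $\lambda_2$ becomes strictly positive once the components are joined.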

slide-30
SLIDE 30

Minimum cut and the graph Laplacian

Define the indicator vector

$$h_{C_k}(i) = \begin{cases} \frac{1}{\sqrt{|C_k|}} & \text{if } v_i \in C_k \\ 0 & \text{else} \end{cases}$$

Let $H = [h_{C_1}; h_{C_2}; \dots; h_{C_k}]$

Observations:

  • $H^T H = I_d$, i.e. $H$ is orthonormal (here $d = k$)
  • $h_{C_i}^T \cdot L \cdot h_{C_i} = \frac{\mathrm{cut}(C_i, V \setminus C_i)}{|C_i|}$ and $h_{C_i}^T \cdot L \cdot h_{C_i} = (H^T L H)_{ii}$

$$\mathrm{RatioCut}(C_1, \dots, C_k) = \sum_{i=1}^{k} \frac{\mathrm{cut}(C_i, V \setminus C_i)}{|C_i|} = \sum_{i=1}^{k} (H^T L H)_{ii} = \mathrm{trace}(H^T L H)$$

[Figure: example weighted graph]
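The identity between the ratio cut and the trace can be confirmed numerically. A minimal sketch using NumPy, on a hypothetical 5-node weighted graph with a hand-picked partition; the helper `ratio_cut` is illustrative.

```python
import numpy as np

# Hypothetical weighted graph and a partition into two clusters.
A = np.array([
    [0, 4, 2, 0, 0],
    [4, 0, 4, 0, 0],
    [2, 4, 0, 1, 0],
    [0, 0, 1, 0, 3],
    [0, 0, 0, 3, 0],
], dtype=float)
clusters = [[0, 1, 2], [3, 4]]
n = A.shape[0]

L = np.diag(A.sum(axis=1)) - A

# Indicator matrix H: column k is 1 / sqrt(|C_k|) on the nodes of C_k.
H = np.zeros((n, len(clusters)))
for k, c in enumerate(clusters):
    H[c, k] = 1.0 / np.sqrt(len(c))

def ratio_cut(A, clusters):
    """Ratio cut computed directly from its definition."""
    total = 0.0
    for c in clusters:
        rest = [v for v in range(A.shape[0]) if v not in c]
        total += A[np.ix_(c, rest)].sum() / len(c)
    return total

ratio_cut_direct = ratio_cut(A, clusters)
ratio_cut_trace = np.trace(H.T @ L @ H)
```

Since the clusters are disjoint, the columns of $H$ are orthonormal, which is exactly the constraint that is kept when the discrete problem is relaxed.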

slide-31
SLIDE 31

Minimum cut and the graph Laplacian

Minimizing ratio cut (normalized cut with $L_{sym}$) is equivalent to

$$\min_{C_1, \dots, C_k} \mathrm{trace}(H^T L H) \quad \text{subject to } H^T H = I_d$$

Constraint relaxation: allow arbitrary values for $H$

$$\min_{H \in \mathbb{R}^{|V| \times K}} \mathrm{trace}(H^T L H) \quad \text{subject to } H^T H = I_d$$

Standard trace minimization problem

Optimal $H$ = the eigenvectors of $L$ that belong to the $K$ smallest eigenvalues

slide-32
SLIDE 32

Spectral embedding: random walk view

$L_{rw} = D^{-1} L = I - D^{-1} A = I - P$ is the random walk Laplacian

  • $\lambda$ is an eigenvalue of $L_{rw}$ with eigenvector $u$ if and only if $\lambda$ is an eigenvalue of $L_{sym}$ with eigenvector $w = D^{1/2} u$

Let $P(B \mid A) = P(X_1 \in B \mid X_0 \in A)$ be the probability that a random walker currently at any node in $A$ transitions to any node in $B$, for $A \cap B = \emptyset$ and $A, B \subset V$ (here $A$ and $B$ denote node sets)

Sample $X_0 \sim \pi$ from the stationary distribution:

$$\mathrm{Ncut}(A, \bar{A}) = P(\bar{A} \mid A) + P(A \mid \bar{A})$$

slide-33
SLIDE 33

Spectral Embedding

Finding the spectral embedding = Solving an optimization task

$H^*$ = the first $k$ eigenvectors of $L(A)$

$$H^* = \arg\min_{H \in \mathbb{R}^{n \times d}} \mathrm{Trace}(H^T \cdot L(A) \cdot H) \quad \text{subject to } H^T \cdot D(A) \cdot H = I_d$$

$$L(A) = D(A) - A$$

Input graph $A$ ⇒ graph Laplacian $L(A)$ ⇒ trace minimization (ratio/normalized cut) ⇒ output embedding $H^*$
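The optimization task has a closed-form solution via an eigendecomposition. The following is a minimal sketch using NumPy, assuming a symmetric adjacency matrix without isolated nodes; the function name `spectral_embedding` and the toy graph are hypothetical. It solves the generalized problem $L h = \lambda D h$ through the symmetrically normalized Laplacian.

```python
import numpy as np

def spectral_embedding(A, d):
    """Return the d smallest generalized eigenpairs of L h = lambda D h.

    Assumes a symmetric adjacency matrix without isolated nodes.
    """
    deg = A.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(deg)
    L = np.diag(deg) - A
    # Symmetric normalization turns the generalized problem into a standard one.
    L_sym = inv_sqrt[:, None] * L * inv_sqrt[None, :]
    eigvals, W = np.linalg.eigh(L_sym)       # eigenvalues in ascending order
    H = inv_sqrt[:, None] * W[:, :d]         # rescale so that H^T D H = I_d
    return eigvals[:d], H

# Hypothetical toy graph: two triangles joined by the edge (2, 3).
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0

eigvals, H = spectral_embedding(A, d=2)
D = np.diag(A.sum(axis=1))
L = D - A

# The sign of the second embedding dimension separates the two triangles.
side = np.sign(H[:, 1])
```

The achieved objective value `np.trace(H.T @ L @ H)` equals the sum of the selected eigenvalues, and a clustering is typically obtained by running k-means on the rows of `H`.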

slide-34
SLIDE 34

Problem: sensitive to noisy data

[Figure: clean versus noisy data]

Noisy data ⇒ spurious edges ⇒ distorted embedding ⇒ wrong clustering

slide-35
SLIDE 35

Robustness via Latent Decomposition

Jointly learn decomposition & embedding

Decomposition steered by the underlying embedding / clustering

$$A = A^g + A^c$$

with $A^g$ the good graph and $A^c$ the sparse corruptions

Embedding of the good graph:

$$H^* = \arg\min_{H \in \mathbb{R}^{n \times d}} \mathrm{Trace}(H^T \cdot L(A^g) \cdot H) \quad \text{subject to } H^T \cdot D(A^g) \cdot H = I_d$$

Joint problem:

$$A^*, H^* = \arg\min_{H \in \mathbb{R}^{n \times d},\ A^g \in \mathbb{R}_{\ge 0}^{n \times n}} \mathrm{Trace}(H^T \cdot L(A^g) \cdot H)$$

subject to

  • $H^T \cdot D(A^g) \cdot H = I_d$
  • $A = A^g + A^c$
  • $\|A^c\|_0 \le 2\theta$ (global)
  • $\forall i: \|a_i^c\|_0 \le \omega_i$ (local)

slide-36
SLIDE 36

Solution: Alternating optimization

Update $H$, given $A^g / A^c$ → Easy

  • Trace minimization problem
  • The solution for $H$ is given by the first $k$ generalized eigenvectors of $L(A^g)$

[Figure: alternating between "update $A^g$" and "update $H$"]

$$A^*, H^* = \arg\min_{H \in \mathbb{R}^{n \times d},\ A^g \in \mathbb{R}_{\ge 0}^{n \times n}} \mathrm{Trace}(H^T \cdot L(A^g) \cdot H)$$

slide-37
SLIDE 37

Solution: Alternating optimization

Update $A^g / A^c$, given $H$ → NP-hard

  • Express eigenvalues of $A^g_{new}$ in closed form
  • Finding the $A^g_{new}$ that minimizes the trace is equivalent to maximizing $f$

$$f\big([a^c_{uv}]_{(u, v) \in E}\big) = \sum_{(u, v) \in E} a^c_{uv} \Big( \|h_u - h_v\|_2^2 - \|\sqrt{\lambda} \circ h_u\|_2^2 - \|\sqrt{\lambda} \circ h_v\|_2^2 \Big)$$

subject to the $\|\cdot\|_0$ constraints

  • $\|h_u - h_v\|_2^2$: nodes far away in the embedding space
  • $- \|\sqrt{\lambda} \circ h_u\|_2^2 - \|\sqrt{\lambda} \circ h_v\|_2^2$: prefers edges close to the origin

$$A^*, H^* = \arg\min_{H \in \mathbb{R}^{n \times d},\ A^g \in \mathbb{R}_{\ge 0}^{n \times n}} \mathrm{Trace}(H^T \cdot L(A^g) \cdot H)$$

slide-38
SLIDE 38

Solution: Alternating optimization

Equivalent to the Multidimensional Knapsack problem

  • Greedy approximation
  • Best possible approximation ratio of $\frac{1}{N+1}$

Efficient solution in O(#edges)

slide-40
SLIDE 40

Conclusion

Spectral embedding is sensitive to noisy data

Robustness via latent decomposition

$$\underbrace{A}_{\text{original graph}} = \underbrace{A^g}_{\text{good graph}} + \underbrace{A^c}_{\text{sparse corruptions}}$$

Removed corrupted edges ⇒ increased discrimination

slide-41
SLIDE 41

Neglected aspects of graph embeddings

Capturing uncertainty Robustness to noise Robustness to adversarial attacks


slide-42
SLIDE 42

Adversarial attacks on graph embeddings

(Spectral) embeddings are not robust to noise, but we can remedy that

Are graph embeddings robust to adversarial attacks?

In domains where graph embeddings are used (e.g. the Web) adversaries are common and false data is easy to inject

slide-43
SLIDE 43

Adversarial attacks in the image domain

Image of a tabby cat correctly classified

  • Add imperceptible perturbation
  • Model classifies the cat as guacamole

[Figure: training data ⇒ training ⇒ model; prediction for the original image: 88% tabby cat]

slide-44
SLIDE 44

Adversarial attacks in the image domain

Image of a tabby cat correctly classified

  • Add imperceptible perturbation
  • Model classifies the cat as guacamole

[Figure: training data ⇒ training ⇒ model; prediction for the perturbed image (original image plus perturbation): 99% guacamole]

slide-45
SLIDE 45

The relational nature of the data might

  • Improve robustness: embeddings are computed jointly rather than in isolation
  • Cause cascading failures: perturbations in one part of the graph can propagate to the rest

slide-46
SLIDE 46

Attack possibilities

General attack

Goal: decrease the overall quality of the embeddings

Actions:

  • Add/remove (flip) an edge
  • Add/remove a node
  • …

Targeted attack

Goal: attack a specific node or a specific downstream task

Examples:

  • Misclassify a target node $t$
  • Increase/decrease the similarity of a set of node pairs $\mathcal{T} \subset V \times V$

slide-47
SLIDE 47

Attack model formally

$$\hat{A}^* = \arg\max_{\hat{A} \in \{0, 1\}^{N \times N}} \mathcal{L}(\hat{A}, Z^*), \qquad Z^* = \arg\min_{Z} \mathcal{L}(\hat{A}, Z), \qquad \text{subj. to } \|\hat{A} - A\|_0 = 2f$$

  • $\hat{A}$: adjacency matrix of the graph after the attacker modified some entries
  • $Z^*$: optimal embedding from the to-be-optimized graph $\hat{A}$
  • $f$: the attacker's budget

slide-48
SLIDE 48

Attack model formally

General attack:

$$\hat{A}^* = \arg\max_{\hat{A} \in \{0, 1\}^{N \times N}} \mathcal{L}(\hat{A}, Z^*), \qquad Z^* = \arg\min_{Z} \mathcal{L}(\hat{A}, Z), \qquad \text{subj. to } \|\hat{A} - A\|_0 = 2f$$

  • $\hat{A}$: adjacency matrix of the graph after the attacker modified some entries
  • $Z^*$: optimal embedding from the to-be-optimized graph $\hat{A}$
  • $f$: the attacker's budget

slide-49
SLIDE 49

Attack model formally

Targeted attack:

$$\hat{A}^* = \arg\max_{\hat{A} \in \{0, 1\}^{N \times N}} \mathcal{L}_{atck}(\hat{A}, Z^*), \qquad Z^* = \arg\min_{Z} \mathcal{L}(\hat{A}, Z), \qquad \text{subj. to } \|\hat{A} - A\|_0 = 2f$$

  • $\hat{A}$: adjacency matrix of the graph after the attacker modified some entries
  • $Z^*$: optimal embedding from the to-be-optimized graph $\hat{A}$
  • $f$: the attacker's budget

slide-50
SLIDE 50

Challenges

Discrete and combinatorial

Bi-level optimization problem

Transductive learning ⇒ network poisoning setting

[Figure: evasion (train, frozen model, target) versus poisoning (train, target, model, embedding)]

slide-51
SLIDE 51

Random-walk based embeddings

RW-based embeddings solve:

$$Z^* = \arg\min_{Z} \mathcal{L}(\{r_1, r_2, \dots\}, Z) \quad \text{with } r_i = RW_l(A)$$

  • $Z^* \in \mathbb{R}^{N \times K}$: learned embedding
  • $RW_l$: a stochastic procedure that generates RWs of length $l$
  • $\mathcal{L}$: model-specific loss, e.g. skip-gram with negative sampling (SGNS)

Challenge: RW sampling precludes gradient-based optimization

slide-52
SLIDE 52

Example: DeepWalk

DeepWalk is equivalent* to factorizing

$$\tilde{M} = \log(\max(M, 1)), \qquad M = \frac{\mathrm{vol}(A)}{T \cdot b} S, \qquad S = \Big( \sum_{r=1}^{T} P^r \Big) D^{-1}, \qquad P = D^{-1} A$$

with $Z^*$ obtained by the SVD of $\tilde{M} = U \Sigma V^T$ using the top $K$ largest singular values/vectors, i.e. $Z^* = U_K \Sigma_K^{1/2}$

  • $P$: transition matrix
  • $T$: window size
  • $b$: negative samples
  • $\tilde{M}$: shifted positive pointwise mutual information (PPMI) matrix
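The factorization view can be written down in a few lines. A minimal sketch using NumPy, assuming a small dense, symmetric adjacency matrix without isolated nodes; the function name `deepwalk_svd_embedding`, the toy graph, and the parameter values are hypothetical, and the code follows the matrix form above rather than any particular DeepWalk implementation.

```python
import numpy as np

def deepwalk_svd_embedding(A, T, b, K):
    """Embedding from the SVD of the shifted PPMI-like matrix M_tilde."""
    deg = A.sum(axis=1)
    vol = deg.sum()
    P = A / deg[:, None]                                   # P = D^{-1} A
    powers = sum(np.linalg.matrix_power(P, r) for r in range(1, T + 1))
    S = powers / deg[None, :]                              # (sum_r P^r) D^{-1}
    M = vol / (T * b) * S
    M_tilde = np.log(np.maximum(M, 1.0))
    U, sigma, Vt = np.linalg.svd(M_tilde)                  # sigma is descending
    Z = U[:, :K] * np.sqrt(sigma[:K])                      # Z = U_K Sigma_K^{1/2}
    loss = float(np.sum(sigma[K:] ** 2))                   # discarded spectrum
    return Z, M_tilde, sigma, loss

# Hypothetical toy graph: two triangles joined by the edge (2, 3).
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0

Z, M_tilde, sigma, loss = deepwalk_svd_embedding(A, T=3, b=1, K=2)
```

The returned `loss` is the sum of the squared singular values that the rank-$K$ truncation discards, which equals the squared Frobenius error of the best rank-$K$ approximation of $\tilde{M}$.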

slide-53
SLIDE 53

Example: DeepWalk

Equivalent to

$$\min_{\tilde{M}_K} \|\tilde{M} - \tilde{M}_K\|_F^2$$

The loss using the optimal embedding is

$$\mathcal{L}_{DW_1}(A, Z^*) = \sum_{p=K+1}^{|V|} \sigma_p^2$$

where $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_{|V|}$ are the singular values of $\tilde{M}(A)$ ordered decreasingly

Idea: Given a perturbation $\Delta A$, find the change in the singular values of $\tilde{M}(A + \Delta A)$

slide-54
SLIDE 54

Example: DeepWalk

$$\tilde{M} = \log(\max(M, 1)), \qquad M = \frac{\mathrm{vol}(A)}{T \cdot b} S, \qquad S = \Big( \sum_{r=1}^{T} P^r \Big) D^{-1}$$

Linearization: ignore the $\log(\cdot)$ and $\max(\cdot, 1)$

The scalars $\mathrm{vol}(A)$, $T$, $b$ can also be ignored

Rewrite

$$\mathcal{L}_{DW_1}(A, Z^*) = \sum_{p=K+1}^{|V|} |\lambda_p|^2$$

Thus, find the change in the spectrum of $S$ after the attacker perturbed the graph by $\Delta A$

slide-55
SLIDE 55

Spectrum of S

Compute the generalized spectrum (generalized eigenvalues/vectors) of $A$, i.e. compute $U, \Lambda$ that solve $A u = \lambda D u$

Rewrite $S = \big( \sum_{r=1}^{T} P^r \big) D^{-1}$ as

$$S = U \Big( \sum_{r=1}^{T} \Lambda^r \Big) U^T$$

where $\sum_{r=1}^{T} \Lambda^r$ is a simple function of the generalized eigenvalues $\lambda_i$ of the graph

The task is now to find the change in the generalized eigenvalues $\lambda_p$ of the adjacency matrix $A$ given a perturbation $\Delta A$
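The rewrite of $S$ can be verified numerically. A minimal sketch using NumPy, assuming a symmetric adjacency matrix without isolated nodes and generalized eigenvectors normalized such that $U^T D U = I$ (this normalization is what makes the identity hold); the helper `generalized_spectrum` and the toy graph are hypothetical.

```python
import numpy as np

def generalized_spectrum(A):
    """Solve A u = lambda D u with the normalization U^T D U = I."""
    deg = A.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(deg)
    # Eigenpairs of D^{-1/2} A D^{-1/2} give the generalized eigenpairs.
    lam, W = np.linalg.eigh(inv_sqrt[:, None] * A * inv_sqrt[None, :])
    return lam, inv_sqrt[:, None] * W

# Hypothetical toy graph.
A = np.zeros((7, 7))
for u, v in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5), (5, 6)]:
    A[u, v] = A[v, u] = 1.0

T = 3
deg = A.sum(axis=1)
D = np.diag(deg)
P = A / deg[:, None]

# Direct definition: S = (sum_r P^r) D^{-1}.
S_direct = sum(np.linalg.matrix_power(P, r) for r in range(1, T + 1)) / deg[None, :]

# Spectral form: S = U (sum_r Lambda^r) U^T.
lam, U = generalized_spectrum(A)
S_spectral = U @ np.diag(sum(lam ** r for r in range(1, T + 1))) @ U.T
```

Only the diagonal factor depends on the window size $T$, so once the generalized spectrum is known, changing $T$ or perturbing the eigenvalues is cheap.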

slide-56
SLIDE 56

Eigenvalue perturbation theory

Given $U, \Lambda$ that solve $A u = \lambda D u$ and a small perturbation $\Delta A$, $\Delta D$

Find $U', \Lambda'$ that solve $(A + \Delta A) u' = \lambda' (D + \Delta D) u'$

First order approximation:

$$\lambda_p' = \lambda_p + u_p^T (\Delta A - \lambda_p \Delta D) u_p$$

For small $\Delta A$ and $\Delta D$ higher order terms become negligible

slide-57
SLIDE 57

For a single edge flip

$\Delta A$ is a matrix with only 2 non-zero elements for a single edge flip $(i, j)$, namely $\Delta A_{ij} = \Delta A_{ji} = 1 - 2 A_{ij} =: \Delta w_{ij}$

Similarly, $\Delta D$ has only two non-zero elements on the diagonal

Then we can approximate the generalized eigenvalues of $A + \Delta A$ in closed form, computable in $O(1)$ time:

$$\lambda_p' = \lambda_p + \Delta w_{ij} \big( 2 u_{pi} \cdot u_{pj} - \lambda_p (u_{pi}^2 + u_{pj}^2) \big)$$
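The closed-form update can be compared against an exact recomputation. A minimal sketch using NumPy, assuming generalized eigenvectors normalized such that $U^T D U = I$; the helper names and the toy graph are hypothetical. The function takes a general weight change `dw`, and an edge flip is the special case `dw = 1 - 2 * A[i, j]`.

```python
import numpy as np

def generalized_spectrum(A):
    """Solve A u = lambda D u with the normalization U^T D U = I."""
    deg = A.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(deg)
    lam, W = np.linalg.eigh(inv_sqrt[:, None] * A * inv_sqrt[None, :])
    return lam, inv_sqrt[:, None] * W

def eigenvalues_after_change(lam, U, i, j, dw):
    """First-order estimate of all generalized eigenvalues after changing
    the weight of edge (i, j) by dw; constant work per eigenvalue."""
    # U[i] holds the i-th entry of every eigenvector u_p.
    return lam + dw * (2.0 * U[i] * U[j] - lam * (U[i] ** 2 + U[j] ** 2))

# Hypothetical toy graph.
A = np.zeros((7, 7))
for u, v in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5), (5, 6)]:
    A[u, v] = A[v, u] = 1.0

lam, U = generalized_spectrum(A)

i, j = 0, 3
dw = 1.0 - 2.0 * A[i, j]          # +1 adds the edge, -1 removes it
lam_approx = eigenvalues_after_change(lam, U, i, j, dw)

# Exact recomputation for comparison (costs a full eigendecomposition).
A_flipped = A.copy()
A_flipped[i, j] = A_flipped[j, i] = A[i, j] + dw
lam_exact, _ = generalized_spectrum(A_flipped)
```

A full edge flip is not an infinitesimal perturbation, so `lam_approx` and `lam_exact` agree only approximately; the approximation becomes tight as the weight change shrinks.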

slide-58
SLIDE 58

Connecting it all together

  • 1. DeepWalk is equivalent to an SVD of $\tilde{M} = \log\big(\max\big(\frac{\mathrm{vol}(A)}{T \cdot b} S, 1\big)\big)$
  • 2. The loss can be computed from the singular values / the spectrum of $S$
  • 3. The spectrum of $S$ can be easily computed from the generalized spectrum of $A$
  • 4. For any given edge flip $(i, j)$ we can compute in $O(1)$ the spectrum of $A + \Delta A$

slide-59
SLIDE 59

Overall algorithm

However,

$$\hat{A}^* = \arg\max_{\hat{A} \in \{0, 1\}^{N \times N}} \mathcal{L}_{closed\text{-}form} \quad \text{subj. to } \|\hat{A} - A\|_0 = 2f$$

is still hard to optimize: there are $\binom{N^2}{f}$ ways to choose the flips

Greedy solution:

  • 1. For each edge $(i, j)$ calculate its impact on the loss if flipped
  • 2. Pick the top $f$ edges
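The greedy selection can be sketched as follows. This is a rough illustration under stated assumptions, not the exact procedure from the talk: `proxy_loss` is a simplified stand-in for the closed-form loss (the sum of squares of all but the $K$ largest values of $|\sum_r \lambda_p^r|$), every node pair is treated as a candidate flip, and all names and the toy graph are hypothetical.

```python
from itertools import combinations

import numpy as np

def generalized_spectrum(A):
    """Solve A u = lambda D u with the normalization U^T D U = I."""
    deg = A.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(deg)
    lam, W = np.linalg.eigh(inv_sqrt[:, None] * A * inv_sqrt[None, :])
    return lam, inv_sqrt[:, None] * W

def proxy_loss(lam, T, K):
    """Assumed stand-in for the closed-form loss (see the lead-in)."""
    s = np.sort(np.abs(sum(lam ** r for r in range(1, T + 1))))  # ascending
    return float(np.sum(s[: len(s) - K] ** 2))

def greedy_flips(A, f, T=3, K=2):
    """Score every candidate flip once and return the f highest-scoring ones."""
    lam, U = generalized_spectrum(A)
    base = proxy_loss(lam, T, K)
    scored = []
    for i, j in combinations(range(A.shape[0]), 2):
        dw = 1.0 - 2.0 * A[i, j]
        # First-order eigenvalue update for the flip (i, j).
        lam_new = lam + dw * (2.0 * U[i] * U[j] - lam * (U[i] ** 2 + U[j] ** 2))
        scored.append((proxy_loss(lam_new, T, K) - base, (i, j)))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:f]

# Hypothetical toy graph.
A = np.zeros((7, 7))
for u, v in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5), (5, 6)]:
    A[u, v] = A[v, u] = 1.0

top_flips = greedy_flips(A, f=2)
```

Each candidate is scored independently against the unperturbed spectrum, so interactions between the chosen flips are ignored; that is the price of avoiding the combinatorial search.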

slide-60
SLIDE 60

General attack (node classification proxy)


slide-61
SLIDE 61

Targeted attack

To target node $v$ we need the change in its embedding $Z_v^*$

That is, we need the change in the eigenvectors

Apply eigenvalue perturbation again to approximate the top $K$ eigenvectors

For a given edge flip $(i, j)$ we get:

$$u_p' = u_p - \Delta w_{ij} (A - \lambda_p D)^{+} \big( -\Delta\lambda_p\, u_p \circ d + E_i (u_{pj} - \lambda_p u_{pi}) + E_j (u_{pi} - \lambda_p u_{pj}) \big)$$

(here $(\cdot)^{+}$ denotes the pseudo-inverse, $d$ the vector of node degrees, and $E_i(x)$ a vector that is zero everywhere except for the value $x$ at position $i$)

slide-62
SLIDE 62

Targeted attack: Link prediction


slide-63
SLIDE 63

Targeted attack: Node classification

[Figure panels: before attack, degree attack, random attack, our attack]

slide-64
SLIDE 64

Analysis of adversarial edges


slide-65
SLIDE 65

Transferability

Budget            DW (SVD)  DW (SGNS)  n2v    Spect. Embd.  Label Prop.  GCN
$f$ = 250 (0.8%)  3.59      2.37       2.04   2.11          5.78         3.34
$f$ = 500 (1.6%)  4.62      3.97       3.48   4.57          8.95         2.33
$f$ = 250 (1.7%)  7.59      5.73       6.45   3.58          4.99         2.21
$f$ = 500 (3.4%)  9.68      11.47      10.24  4.57          6.27         8.61

slide-66
SLIDE 66

Conclusion

Node embeddings are vulnerable to adversarial attacks

Poisoning has a negative effect on the quality of the embeddings and on the downstream tasks

Attacks are transferable: they generalize to many models

slide-67
SLIDE 67

Important aspects of graph embeddings

Capturing uncertainty Robustness to noise Robustness to adversarial attacks
