SLIDE 1

Constant Curvature Graph Convolutional Networks

Gregor Bachmann (ETH Zürich), Gary Bécigneul (MIT), Octavian Ganea (MIT)

SLIDES 2–6

Overview

  • Embeddings of graphs into hyperbolic and spherical space and their products
  • Extend the gyrovector framework to spherical geometry and provide a unifying formalism
  • Introduce graph neural networks producing embeddings in product spaces
  • Differentiable transitions in geometry during training in each component

SLIDES 7–10

Graphs

  • Lots of data available in the form of graphs (social networks, railway tracks, phylogenetic trees, etc.)
  • Node set V = {1, . . . , n} and adjacency matrix A ∈ R^{n×n}

SLIDES 11–15

Where to Embed Graphs?

  • Euclidean geometry is not suitable for many graphs
  • The graph distance d_G(i, j) = "shortest path from i to j" is not respected by a Euclidean embedding (see the shortest-path sketch below)
  • Arbitrarily low distortion is achievable in spherical and hyperbolic space
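
For concreteness, a minimal sketch of the graph distance d_G used above, computed by breadth-first search on an unweighted graph; the function name graph_distances and the toy adjacency matrix are illustrative, not taken from the paper's code.

```python
from collections import deque
import numpy as np

def graph_distances(A):
    """All-pairs shortest-path lengths d_G(i, j) of an unweighted graph
    with adjacency matrix A (n x n), computed by BFS from every node."""
    n = A.shape[0]
    dG = np.full((n, n), np.inf)
    for src in range(n):
        dG[src, src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in np.nonzero(A[u])[0]:
                if dG[src, v] == np.inf:
                    dG[src, v] = dG[src, u] + 1
                    queue.append(v)
    return dG

# Example: the path graph 0 - 1 - 2 has d_G(0, 2) = 2.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
print(graph_distances(A))
```
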
SLIDES 16–19

Non-Euclidean Geometry

  • Focus on constant sectional curvature manifolds
  • Well studied in the field of differential geometry
  • Computationally attractive expressions for the distance, exponential map, etc.

SLIDES 20–23

Hyperbolic Space as Poincaré Ball

  • H^n = {x ∈ R^n : ||x||₂ < 1/√c} with curvature −c, equipped with the Riemannian metric tensor g^c_x = 4 / (1 − c||x||²)² · I
  • Obtained as a projection of the hyperboloid model [4]
  • Distance: d^c_H(x, y) = (1/√c) cosh⁻¹( 1 + (2/c) ||x − y||² / ((1/c − ||x||²)(1/c − ||y||²)) ) (see the sketch below)
  • (Slide figures: heatmap of the distance d^c_H and the projection of the hyperboloid)
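
A minimal NumPy sketch of the closed-form Poincaré-ball distance above (written as an equivalent rearrangement of the slide's formula); the function name poincare_distance, the default curvature c = 1 and the sample points are illustrative assumptions.

```python
import numpy as np

def poincare_distance(x, y, c=1.0):
    """Distance in the Poincare ball of curvature -c, using the closed form
    (1/sqrt(c)) * arccosh(1 + 2c||x - y||^2 / ((1 - c||x||^2)(1 - c||y||^2)))."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    num = 2.0 * c * np.sum((x - y) ** 2)
    den = (1.0 - c * np.sum(x ** 2)) * (1.0 - c * np.sum(y ** 2))
    return np.arccosh(1.0 + num / den) / np.sqrt(c)

# Points near the boundary ||x|| -> 1/sqrt(c) are pushed far apart, which is
# what lets tree-like graphs embed with low distortion.
print(poincare_distance([0.0, 0.0], [0.5, 0.0]))
print(poincare_distance([0.9, 0.0], [0.0, 0.9]))
```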

SLIDES 24–28

Gyrospace Structure

  • The next best thing to a vector space
  • Vector addition x + y → x ⊕_c y
  • Scalar multiplication rx → r ⊗_c x
  • Geodesic γ_{x→y}(t) = x ⊕_c (t ⊗_c ((−x) ⊕_c y))
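
A sketch of these gyrovector operations for the hyperbolic case, assuming the standard Möbius addition and Möbius scalar multiplication on the Poincaré ball of curvature −c; the function names are mine, and the check at the end only verifies that the geodesic starts at x and ends at y.

```python
import numpy as np

def mobius_add(x, y, c=1.0):
    """Mobius addition x (+)_c y on the Poincare ball of curvature -c."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    return num / (1 + 2 * c * xy + c ** 2 * x2 * y2)

def mobius_scalar(r, x, c=1.0):
    """Mobius scalar multiplication r (x)_c x."""
    x = np.asarray(x, float)
    nx = np.linalg.norm(x)
    if nx == 0:
        return x
    return np.tanh(r * np.arctanh(np.sqrt(c) * nx)) * x / (np.sqrt(c) * nx)

def geodesic(x, y, t, c=1.0):
    """Point at time t on gamma_{x -> y}(t) = x (+)_c (t (x)_c ((-x) (+)_c y))."""
    x = np.asarray(x, float)
    return mobius_add(x, mobius_scalar(t, mobius_add(-x, y, c), c), c)

x, y = np.array([0.1, 0.2]), np.array([-0.3, 0.4])
print(np.allclose(geodesic(x, y, 0.0), x), np.allclose(geodesic(x, y, 1.0), y))
```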

SLIDES 29–32

Spherical Space as Stereographic Projection

  • Stereographic projection of the sphere (sitting in R^{d+1}) onto R^d equipped with the metric g^c_x = 4 / (1 + c||x||²)² · I
  • Distance: d^c_S(x, y) = (1/√c) cos⁻¹( 1 − (2/c) ||x − y||² / ((1/c + ||x||²)(1/c + ||y||²)) )
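
The same kind of sketch for the spherical distance above (the sign of the second term is the only change relative to the hyperbolic case); spherical_distance and the sample points are illustrative.

```python
import numpy as np

def spherical_distance(x, y, c=1.0):
    """Distance in the stereographic model of the sphere of curvature +c:
    (1/sqrt(c)) * arccos(1 - 2c||x - y||^2 / ((1 + c||x||^2)(1 + c||y||^2)))."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    num = 2.0 * c * np.sum((x - y) ** 2)
    den = (1.0 + c * np.sum(x ** 2)) * (1.0 + c * np.sum(y ** 2))
    arg = np.clip(1.0 - num / den, -1.0, 1.0)   # guard against round-off
    return np.arccos(arg) / np.sqrt(c)

# The origin and a point at ||x|| = 1/sqrt(c) are a quarter turn apart.
print(spherical_distance([0.0, 0.0], [1.0, 0.0]))   # ~ pi/2 for c = 1
```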

SLIDES 33–36

Our Contributions: 1) Unified Formalism

  • κ-stereographic model for any κ ∈ R:  st^d_κ = {x ∈ R^d | −κ||x||²₂ < 1}

    Operation    | R^d          | st^d_κ
    x ⊕_κ y      | x + y        | ((1 − 2κ x^T y − κ||y||²) x + (1 + κ||x||²) y) / (1 − 2κ x^T y + κ²||x||²||y||²)
    r ⊗_κ x      | rx           | tan_κ(r · tan_κ⁻¹(||x||)) · x/||x||
    γ_{x→y}(t)   | x + t(y − x) | x ⊕_κ (t ⊗_κ ((−x) ⊕_κ y))

  • More unifying expressions for distance, exponential map, etc. in our paper! (A numerical sketch of ⊕_κ and ⊗_κ follows below.)
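
A minimal sketch of ⊕_κ and ⊗_κ from the table, with tan_κ taken as tan / identity / tanh for positive / zero / negative κ; this convention is an assumption of the sketch (the paper states the precise definition). Printing values for κ close to 0 illustrates how the operations pass through the Euclidean case.

```python
import numpy as np

def tan_k(u, k):
    """kappa-tangent: tan for k > 0, identity for k = 0, tanh for k < 0."""
    if k > 0:
        return np.tan(np.sqrt(k) * u) / np.sqrt(k)
    if k < 0:
        return np.tanh(np.sqrt(-k) * u) / np.sqrt(-k)
    return u

def arctan_k(u, k):
    """Inverse of tan_k."""
    if k > 0:
        return np.arctan(np.sqrt(k) * u) / np.sqrt(k)
    if k < 0:
        return np.arctanh(np.sqrt(-k) * u) / np.sqrt(-k)
    return u

def kappa_add(x, y, k):
    """x (+)_kappa y from the table; reduces to x + y at kappa = 0."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    num = (1 - 2 * k * xy - k * y2) * x + (1 + k * x2) * y
    return num / (1 - 2 * k * xy + k ** 2 * x2 * y2)

def kappa_scale(r, x, k):
    """r (x)_kappa x from the table; reduces to r * x at kappa = 0."""
    x = np.asarray(x, float)
    nx = np.linalg.norm(x)
    if nx == 0:
        return x
    return tan_k(r * arctan_k(nx, k), k) * x / nx

x, y = np.array([0.1, -0.2]), np.array([0.3, 0.1])
for k in (-1.0, -1e-6, 0.0, 1e-6, 1.0):
    print(k, kappa_add(x, y, k), kappa_scale(2.0, x, k))
```
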
SLIDES 37–40

Our Contributions: 2) Matrix Multiplications

  • Embeddings X with rows X_{i•} ∈ st^d_κ, weights W ∈ R^{d×k} and adjacency A ∈ R^{n×n}
  • Right matrix multiplication XW acts on the columns X_{•i}, so lift to the tangent space at zero:
    (X ⊗_κ W)_{i•} = exp^κ_0((log^κ_0(X) W)_{i•})
  • Introduced in [2]; we extend it to spherical spaces
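
A sketch of this right multiplication, writing the exponential and logarithmic maps at the origin with the tan_κ convention from the previous sketch; exactly how the conformal factor is absorbed into these maps is an assumption here, so treat the normalization as illustrative rather than as the paper's definition.

```python
import numpy as np

def tan_k(u, k):      # as in the previous sketch
    return np.tan(np.sqrt(k) * u) / np.sqrt(k) if k > 0 else \
           np.tanh(np.sqrt(-k) * u) / np.sqrt(-k) if k < 0 else u

def arctan_k(u, k):   # inverse of tan_k
    return np.arctan(np.sqrt(k) * u) / np.sqrt(k) if k > 0 else \
           np.arctanh(np.sqrt(-k) * u) / np.sqrt(-k) if k < 0 else u

def exp0(v, k):
    """Exponential map at the origin of st^d_kappa (tangent vector -> point)."""
    n = np.linalg.norm(v)
    return v if n == 0 else tan_k(n, k) * v / n

def log0(x, k):
    """Logarithmic map at the origin (inverse of exp0)."""
    n = np.linalg.norm(x)
    return x if n == 0 else arctan_k(n, k) * x / n

def right_matmul(X, W, k):
    """(X (x)_kappa W)_{i.} = exp0((log0(X) W)_{i.}): lift each row to the
    tangent space at 0, apply W there, map the result back."""
    tangent = np.stack([log0(row, k) for row in X]) @ W
    return np.stack([exp0(row, k) for row in tangent])

X = 0.1 * np.random.randn(4, 3)           # rows in st^3_kappa (small norms)
W = np.random.randn(3, 2)
print(right_matmul(X, W, k=-1.0).shape)   # (4, 2)
```
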
SLIDES 41–44

Our Contributions: 2) Matrix Multiplications

  • Left matrix multiplication AX acts on the rows X_{i•}:  (AX)_{i•} = A_{i1} X_{1•} + · · · + A_{in} X_{n•}
  • Idea: reduce the problem of taking linear combinations to the definition of a non-Euclidean midpoint

SLIDES 45–48

Our Contributions: 2) Matrix Multiplications

  • Leverage the gyromidpoint of hyperbolic space and extend it to st^d_κ:
    m_κ(x_1, · · · , x_n; α) = (1/2) ⊗_κ ( Σ_{i=1}^n [ α_i λ^κ_{x_i} / (Σ_{j=1}^n α_j (λ^κ_{x_j} − 1)) ] x_i ),
    where λ^κ_x = 2 / (1 + κ||x||²) is the conformal factor of the metric
  • Define left matrix multiplication row-wise:
    (A ⊠_κ X)_{i•} := (Σ_j A_{ij}) ⊗_κ m_κ(X_{1•}, · · · , X_{n•}; A_{i•})
  • Same scaling behaviour as in the Euclidean case: d_κ(0, r ⊗_κ x) = r · d_κ(0, x)
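
A sketch of the gyromidpoint and of A ⊠_κ X built on it, reusing the tan_κ helpers from the earlier sketches (repeated here so the snippet runs on its own); the function names and toy inputs are illustrative.

```python
import numpy as np

def tan_k(u, k):      # as in the earlier sketches
    return np.tan(np.sqrt(k) * u) / np.sqrt(k) if k > 0 else \
           np.tanh(np.sqrt(-k) * u) / np.sqrt(-k) if k < 0 else u

def arctan_k(u, k):
    return np.arctan(np.sqrt(k) * u) / np.sqrt(k) if k > 0 else \
           np.arctanh(np.sqrt(-k) * u) / np.sqrt(-k) if k < 0 else u

def kappa_scale(r, x, k):
    nx = np.linalg.norm(x)
    return x if nx == 0 else tan_k(r * arctan_k(nx, k), k) * x / nx

def lam(x, k):
    """Conformal factor lambda^kappa_x = 2 / (1 + kappa ||x||^2)."""
    return 2.0 / (1.0 + k * np.dot(x, x))

def gyromidpoint(X, alpha, k):
    """m_kappa(x_1, ..., x_n; alpha) from the slide."""
    lams = np.array([lam(x, k) for x in X])
    weights = alpha * lams / np.sum(alpha * (lams - 1.0))
    return kappa_scale(0.5, np.sum(weights[:, None] * X, axis=0), k)

def left_matmul(A, X, k):
    """(A [x]_kappa X)_{i.} = (sum_j A_ij) (x)_kappa m_kappa(X; A_{i.})."""
    return np.stack([kappa_scale(A[i].sum(), gyromidpoint(X, A[i], k), k)
                     for i in range(A.shape[0])])

X = 0.1 * np.random.randn(5, 3)           # rows in st^3_kappa
A = np.random.rand(5, 5)                  # toy non-negative "adjacency" weights
print(left_matmul(A, X, k=-1.0).shape)    # (5, 3)
```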
slide-49
SLIDE 49

Gyromidpoint for Varying Curvature


SLIDES 50–54

Our Contributions: 3) Differentiable Interpolation

  • All quantities recover their Euclidean counterparts for κ → 0±
  • We proved an even stronger result:

    Differentiability of st^d_κ w.r.t. κ around 0
    The first-order derivatives at 0− and 0+ w.r.t. κ of all the mentioned quantities exist and are equal.

  • This enables learning the curvature κ with gradient descent, including a differentiable change of sign
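
A small numerical check of this statement, assuming the tan_κ convention from the earlier sketches: the one-sided finite-difference slopes of ||r ⊗_κ x|| with respect to κ, taken from the left and from the right of 0, should agree. The helper name and the chosen values of r, ||x|| and the step size are arbitrary.

```python
import numpy as np

def scale_norm(r, n, k):
    """||r (x)_kappa x|| for a point with ||x|| = n, via tan_kappa."""
    if k > 0:
        return np.tan(r * np.arctan(np.sqrt(k) * n)) / np.sqrt(k)
    if k < 0:
        return np.tanh(r * np.arctanh(np.sqrt(-k) * n)) / np.sqrt(-k)
    return r * n

eps, r, n = 1e-6, 2.0, 0.3
right = (scale_norm(r, n, +eps) - scale_norm(r, n, 0.0)) / eps
left = (scale_norm(r, n, 0.0) - scale_norm(r, n, -eps)) / eps
print(right, left)   # the two one-sided slopes agree (both ~ 0.054 here)
```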

SLIDES 55–58

Our Contributions: 4) Constant Curvature GCN

  • Given a graph G = (V, A, X) with V = {1, . . . , n}, adjacency A ∈ R^{n×n} and node-level features X ∈ R^{n×d}
  • Graph neural networks are a very popular class of models for inference on graphs
  • We extend the vanilla GCN [3]:
    H^{(t+1)} = σ(Â H^{(t)} W^{(t)})
    for some non-linearity σ, where Â = D̃^{−1/2}(A + I)D̃^{−1/2}, D̃_{ii} = Σ_k Ã_{ik} with Ã = A + I, and trainable parameters W^{(t)}
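
A minimal NumPy sketch of this propagation rule, with ReLU standing in for σ; normalized_adjacency and the toy graph are illustrative.

```python
import numpy as np

def normalized_adjacency(A):
    """A_hat = D~^{-1/2} (A + I) D~^{-1/2}, with D~_ii = sum_k (A + I)_ik."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return np.diag(d_inv_sqrt) @ A_tilde @ np.diag(d_inv_sqrt)

def gcn_layer(A_hat, H, W):
    """One vanilla GCN layer: H^{(t+1)} = ReLU(A_hat H^{(t)} W^{(t)})."""
    return np.maximum(A_hat @ H @ W, 0.0)

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.random.randn(3, 4)
W = np.random.randn(4, 2)
print(gcn_layer(normalized_adjacency(A), H, W).shape)   # (3, 2)
```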

SLIDES 59–62

Our Contributions: 4) Constant Curvature GCN

  • Turn it non-Euclidean:
    H^{(l+1)} = σ^{⊗κ}( Â ⊠_κ (H^{(l)} ⊗_κ W^{(l)}) )
    where σ^{⊗κ} is the κ-stereographic version of σ (see paper)
  • Learn the curvature to adapt to the geometry of the data
  • Allows for differentiable transitions in the geometry during training
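
Putting the previous sketches together into one κ-GCN layer, specialized to κ = −1 to keep it short (the general κ only swaps tanh/arctanh for tan/arctan). Taking σ^{⊗κ} = exp_0 ∘ ReLU ∘ log_0 is my assumption for a stereographic version of ReLU; the paper defines its own σ^{⊗κ}, so treat this as a sketch of the layer's structure, not the exact model.

```python
import numpy as np

def exp0(v):                         # exp map at the origin, kappa = -1
    n = np.linalg.norm(v)
    return v if n == 0 else np.tanh(n) * v / n

def log0(x):                         # log map at the origin, kappa = -1
    n = np.linalg.norm(x)
    return x if n == 0 else np.arctanh(n) * x / n

def scale(r, x):                     # r (x)_kappa x, kappa = -1
    n = np.linalg.norm(x)
    return x if n == 0 else np.tanh(r * np.arctanh(n)) * x / n

def midpoint(X, a):                  # gyromidpoint m(X; a), kappa = -1
    lams = 2.0 / (1.0 - np.sum(X ** 2, axis=1))
    weights = a * lams / np.sum(a * (lams - 1.0))
    return scale(0.5, np.sum(weights[:, None] * X, axis=0))

def right_matmul(H, W):              # (H (x) W)_i. = exp0((log0(H) W)_i.)
    return np.stack([exp0(r) for r in np.stack([log0(r) for r in H]) @ W])

def left_matmul(A, X):               # (A [x] X)_i. = (sum_j A_ij) (x) m(X; A_i.)
    return np.stack([scale(A[i].sum(), midpoint(X, A[i])) for i in range(len(A))])

def kappa_gcn_layer(A_hat, H, W):
    """H^{(l+1)} = sigma^{(x)kappa}(A_hat [x]_kappa (H (x)_kappa W))."""
    Z = left_matmul(A_hat, right_matmul(H, W))
    return np.stack([exp0(np.maximum(log0(z), 0.0)) for z in Z])

A_hat = np.full((4, 4), 0.25)            # toy normalized adjacency
H = 0.1 * np.random.randn(4, 3)          # node embeddings in the Poincare ball
W = 0.1 * np.random.randn(3, 3)
print(kappa_gcn_layer(A_hat, H, W).shape)   # (4, 3)
```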

SLIDES 63–67

Our Contributions: 5) Product GCN

  • We can take it one step further: embed in the product space st^d_{κ_1} × · · · × st^d_{κ_m}
  • Again we find a gyrovector space structure
  • The operations extend component-wise while still preserving the desired properties
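
As an illustration of the component-wise extension, a sketch of the product-space distance: the coordinates are split per component, each component uses its own constant-curvature distance from the earlier slides, and the results combine as in any Riemannian product. The component split, function names and example points are assumptions for illustration.

```python
import numpy as np

def dist_component(x, y, k):
    """Constant-curvature distance in one factor st^d_kappa, using the
    closed forms from the earlier slides (assumes kappa != 0)."""
    sq = np.sum((x - y) ** 2)
    if k < 0:
        c = -k
        return np.arccosh(1 + 2 * c * sq / ((1 - c * np.sum(x ** 2)) *
                                            (1 - c * np.sum(y ** 2)))) / np.sqrt(c)
    arg = 1 - 2 * k * sq / ((1 + k * np.sum(x ** 2)) * (1 + k * np.sum(y ** 2)))
    return np.arccos(np.clip(arg, -1.0, 1.0)) / np.sqrt(k)

def product_distance(x, y, dims, kappas):
    """Distance in st^{d1}_{k1} x ... x st^{dm}_{km}: component distances
    combined with the product metric, d = sqrt(sum_i d_i^2)."""
    total, start = 0.0, 0
    for d, k in zip(dims, kappas):
        total += dist_component(x[start:start + d], y[start:start + d], k) ** 2
        start += d
    return np.sqrt(total)

# Example: a product of a hyperbolic and a spherical plane, H^2 x S^2.
x = np.array([0.1, 0.2, 0.5, -0.3])
y = np.array([-0.2, 0.0, 1.0, 0.4])
print(product_distance(x, y, dims=[2, 2], kappas=[-1.0, 1.0]))
```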

SLIDES 68–71

Experiments: Distortion Task

  • Minimize the discrepancy between embedding distances and graph distances:
    L(x_1, . . . , x_n) = (1/n²) Σ_{i,j} ( (d_κ(x_i, x_j) / d_G(i, j))² − 1 )²
  • Train κ-GCN on three synthetic datasets: a tree (negative curvature), a spherical graph (positive curvature) and a toroidal graph (product of positive curvatures)

    Model            | Tree   | Toroidal | Spherical
    E10 (GCN)        | 0.0502 | 0.0603   | 0.0409
    H10 (κ-GCN)      | 0.0029 | 0.272    | 0.267
    S10 (κ-GCN)      | 0.473  | 0.0485   | 0.0337
    H5 × H5 (κ-GCN)  | 0.0048 | 0.112    | 0.152
    S5 × S5 (κ-GCN)  | 0.51   | 0.0464   | 0.0359

    (lower is better)
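
A minimal sketch of the distortion objective for a hyperbolic embedding; in the paper the x_i are produced by the κ-GCN and the loss is minimized jointly with the curvature, whereas here the embedding, the graph distances and the helper names are toy values for illustration.

```python
import numpy as np

def hyper_dist(x, y, c=1.0):
    """Poincare-ball distance of curvature -c, as in the earlier sketch."""
    sq = np.sum((x - y) ** 2)
    return np.arccosh(1 + 2 * c * sq / ((1 - c * np.sum(x ** 2)) *
                                        (1 - c * np.sum(y ** 2)))) / np.sqrt(c)

def distortion_loss(X, dG, c=1.0):
    """L = (1/n^2) * sum_{i != j} ((d(x_i, x_j) / d_G(i, j))^2 - 1)^2."""
    n = len(X)
    loss = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                loss += (hyper_dist(X[i], X[j], c) ** 2 / dG[i, j] ** 2 - 1.0) ** 2
    return loss / n ** 2

# Toy check on a 3-node path graph with d_G = [[0,1,2],[1,0,1],[2,1,0]].
dG = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]], dtype=float)
X = np.array([[-0.4, 0.0], [0.0, 0.0], [0.4, 0.0]])
print(distortion_loss(X, dG))
```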

SLIDES 72–74

Experiments: Node Classification

  • Evaluate on four real-world datasets
  • Report mean accuracy across 5 splits and 5 runs each

    Model        | Citeseer    | Cora        | Pubmed      | Airport
    E16 [3]      | 72.9 ± 0.54 | 81.4 ± 0.4  | 79.2 ± 0.39 | 81.4 ± 0.29
    H16 [1]      | 71 ± 0.49   | 80.3 ± 0.46 | 79.8 ± 0.43 | 84.4 ± 0.41
    H16 (κ-GCN)  | 73.2 ± 0.51 | 81.2 ± 0.5  | 78.5 ± 0.36 | 81.9 ± 0.33
    S16 (κ-GCN)  | 72.1 ± 0.45 | 81.9 ± 0.45 | 78.8 ± 0.49 | 80.9 ± 0.58
    Prod-GCN     | 71.1 ± 0.59 | 80.8 ± 0.41 | 78.1 ± 0.6  | 81.7 ± 0.44

SLIDE 75

THANK YOU!

Check out our website hyperbolicdeeplearning.com

HYPERBOLIC DEEP LEARNING

SLIDE 76

References

[1] Chami, I., Ying, R., Ré, C., and Leskovec, J. (2019). Hyperbolic graph convolutional neural networks. In Advances in Neural Information Processing Systems.
[2] Ganea, O., Bécigneul, G., and Hofmann, T. (2018). Hyperbolic neural networks. In Advances in Neural Information Processing Systems, pages 5345–5355.
[3] Kipf, T. N. and Welling, M. (2017). Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations.
[4] Nickel, M. and Kiela, D. (2018). Learning continuous hierarchies in the Lorentz model of hyperbolic geometry. In International Conference on Machine Learning.