Convolutional Kernel Networks for Graph-Structured Data


  1. Convolutional Kernel Networks for Graph-Structured Data
     Dexiong Chen¹, Laurent Jacob², Julien Mairal¹
     ¹Inria Grenoble, ²CNRS/LBBE Lyon
     ICML 2020

  2. Graph-structured data are ubiquitous: (a) molecules, (b) protein regulation, (c) social networks, (d) chemical pathways.

  3. Learning graph representations
     State-of-the-art models for representing graphs:
     - Deep learning for graphs: graph neural networks (GNNs).
     - Graph kernels: Weisfeiler-Lehman (WL) graph kernels.
     - Hybrid models attempt to bridge both worlds: graph neural tangent kernels.
     Our model:
     - A new type of multilayer graph kernel, more expressive than WL kernels.
     - Easy-to-regularize and scalable unsupervised graph representations.
     - Supervised graph representations, learned like GNNs.

  4. Graphs with node attributes
     A graph is defined as a triplet $G = (\mathcal{V}, \mathcal{E}, a)$: $\mathcal{V}$ and $\mathcal{E}$ are the sets of vertices and edges, and $a : \mathcal{V} \to \mathbb{R}^d$ is a function assigning attributes to each node, e.g. $a(u) = [0.3, 0.8, 0.5]$ for $d = 3$.
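To make the triplet concrete, below is a minimal Python sketch of an attributed graph; the `Graph` class and its field names are illustrative, not taken from the paper's code.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Graph:
    """An attributed graph G = (V, E, a) with nodes V = {0, ..., n_nodes - 1}."""
    n_nodes: int                    # |V|
    edges: list[tuple[int, int]]    # E, as undirected pairs (u, v)
    attributes: np.ndarray          # a, of shape (n_nodes, d); row u is a(u)

    def neighbors(self, u: int) -> list[int]:
        """All nodes adjacent to u."""
        return [w for v, w in self.edges if v == u] + \
               [v for v, w in self.edges if w == u]


# A toy graph with d = 3 attributes per node, as on the slide.
g = Graph(
    n_nodes=3,
    edges=[(0, 1), (1, 2)],
    attributes=np.array([[0.3, 0.8, 0.5],
                         [0.1, 0.2, 0.9],
                         [0.7, 0.4, 0.0]]),
)
```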

  5. Graph kernel mappings
     Map each graph $G$ in $\mathcal{X}$ to a vector $\varphi(G)$ in a Hilbert space $\mathcal{H}$, which lends itself to learning tasks. A large class of graph kernel mappings can be written in the form
     $$\varphi(G) := \sum_{u \in \mathcal{V}} \varphi_{\text{base}}(\ell_G(u)),$$
     where $\varphi_{\text{base}}$ embeds some local pattern $\ell_G(u)$ into $\mathcal{H}$. [Lei et al., 2017, Kriege et al., 2019]

  6. Basic kernels: walk and path kernel mappings
     Let $\mathcal{P}_k(G, u)$ denote the paths of length $k$ from node $u$ in $G$. The $k$-path mapping is
     $$\varphi_{\text{path}}(u) := \sum_{p \in \mathcal{P}_k(G, u)} \delta_{a(p)}(\cdot),$$
     where $a(p)$ is the concatenation of the attributes along $p$ and $\delta$ is the Dirac function. $\varphi_{\text{path}}(u)$ can be interpreted as a histogram of path occurrences. Path kernels are more expressive than walk kernels, but are used less often for computational reasons.
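A minimal sketch of this mapping for discrete labels, reusing the illustrative `Graph` class above: paths are enumerated by depth-first search, and the Dirac sum becomes a histogram over the label sequences of the paths. Here a path "of length k" has k edges, one common convention; `paths_from` and `phi_path_discrete` are hypothetical names.

```python
from collections import Counter


def paths_from(g: Graph, u: int, k: int) -> list[tuple[int, ...]]:
    """Enumerate P_k(G, u): paths with k edges starting at u, as tuples of
    node indices, with no repeated node."""
    if k == 0:
        return [(u,)]
    paths = []
    for v in g.neighbors(u):
        for tail in paths_from(g, v, k - 1):
            if u not in tail:       # keep the nodes along the path distinct
                paths.append((u,) + tail)
    return paths


def phi_path_discrete(g: Graph, u: int, k: int, labels: list) -> Counter:
    """k-path mapping with Dirac comparison: a histogram counting, for each
    sequence of discrete labels a(p), the paths p in P_k(G, u) that carry it."""
    hist = Counter()
    for p in paths_from(g, u, k):
        hist[tuple(labels[v] for v in p)] += 1
    return hist
```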

  7. A relaxed path kernel
     Issues with the path kernel mapping: $\delta$ performs a hard comparison between paths, so it only works for discrete attributes, and it is not differentiable, so it cannot be optimized with back-propagation. We therefore relax it into a "soft", differentiable mapping,
     $$\varphi_{\text{path}}(u) = \sum_{p \in \mathcal{P}_k(G, u)} \delta_{a(p)}(\cdot) \;\Longrightarrow\; \sum_{p \in \mathcal{P}_k(G, u)} e^{-\alpha \| a(p) - \cdot \|^2},$$
     which can be interpreted as a sum of Gaussians centered at the features of each path from $u$.
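The relaxation makes the comparison smooth: by the reproducing property of the Gaussian kernel, the inner product between the relaxed mappings of two nodes is a sum of Gaussian similarities over all pairs of path features. A minimal sketch, reusing the illustrative `Graph` and `paths_from` from the sketches above:

```python
import numpy as np


def path_features(g: Graph, u: int, k: int) -> np.ndarray:
    """Stack the concatenated attributes a(p) of each path p in P_k(G, u);
    the result has shape (number of paths, (k + 1) * d)."""
    return np.array([np.concatenate([g.attributes[v] for v in p])
                     for p in paths_from(g, u, k)])


def relaxed_path_kernel(g1: Graph, u1: int, g2: Graph, u2: int,
                        k: int, alpha: float) -> float:
    """Inner product of two relaxed mappings: a sum of
    exp(-alpha * ||a(p) - a(p')||^2) over all pairs of paths,
    a smooth function of the (possibly continuous) attributes."""
    P1, P2 = path_features(g1, u1, k), path_features(g2, u2, k)
    if len(P1) == 0 or len(P2) == 0:
        return 0.0
    sq_dists = ((P1[:, None, :] - P2[None, :, :]) ** 2).sum(axis=-1)
    return float(np.exp(-alpha * sq_dists).sum())
```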

  8. One-layer GCKN: a closer look at the relaxed path kernel
     We define the one-layer GCKN as the relaxed path kernel mapping
     $$\varphi_1(u) := \sum_{p \in \mathcal{P}_k(G, u)} e^{-\alpha_1 \| a(p) - \cdot \|^2} = \sum_{p \in \mathcal{P}_k(G, u)} \Phi_1(a(p)) \in \mathcal{H}_1.$$
     This formula decomposes into three steps:
     - path extraction: enumerating all paths in $\mathcal{P}_k(G, u)$;
     - kernel mapping: evaluating the Gaussian embedding $\Phi_1$ of the path features;
     - path aggregation: summing the path embeddings.
     We obtain a new graph with the same topology but different features: $(\mathcal{V}, \mathcal{E}, a) \xrightarrow{\varphi_{\text{path}}} (\mathcal{V}, \mathcal{E}, \varphi_1)$.

  9. Construction of one-layer GCKN
     [Figure: starting from $(\mathcal{V}, \mathcal{E}, a : \mathcal{V} \to \mathbb{R}^d)$, the paths $p_1, p_2, p_3$ from node $u$ are extracted, each path feature is mapped into $\mathcal{H}_1$ by the kernel mapping $\Phi_1$, and the embeddings are aggregated as $\varphi_1(u) := \Phi_1(a(p_1)) + \Phi_1(a(p_2)) + \Phi_1(a(p_3))$, yielding the new graph $(\mathcal{V}, \mathcal{E}, \varphi_1 : \mathcal{V} \to \mathcal{H}_1)$.]

  10. From one-layer to multilayer GCKN
      We can apply $\varphi_{\text{path}}$ repeatedly to the new graph:
      $$(\mathcal{V}, \mathcal{E}, a) \xrightarrow{\varphi_{\text{path}}} (\mathcal{V}, \mathcal{E}, \varphi_1) \xrightarrow{\varphi_{\text{path}}} (\mathcal{V}, \mathcal{E}, \varphi_2) \xrightarrow{\varphi_{\text{path}}} \cdots \xrightarrow{\varphi_{\text{path}}} (\mathcal{V}, \mathcal{E}, \varphi_j).$$
      $\varphi_j(u)$ represents information about a neighborhood of $u$; the final graph representation at layer $j$ is $\varphi_j(G) = \sum_{u \in \mathcal{V}} \varphi_j(u)$.
      Why is the multilayer model interesting?
      - Applying $\varphi_{\text{path}}$ once captures paths: GCKN-path.
      - Applying it twice captures subtrees: GCKN-subtree.
      - Applying it more times may capture higher-order structures.
      Long paths cannot be enumerated due to computational complexity, yet the multilayer model can capture long-range substructures (a sketch of the layer stacking follows below).
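A minimal sketch of the layer stacking referenced above, with `Layer` standing in for a concrete finite-dimensional GCKN layer (made practical on the next slides); the names are illustrative and `Graph` is the toy class from earlier.

```python
from typing import Callable

import numpy as np

# A layer maps (graph, node features of shape (n, d_in)) to new node
# features of shape (n, d_out); the topology (V, E) never changes.
Layer = Callable[[Graph, np.ndarray], np.ndarray]


def multilayer_gckn(g: Graph, layers: list[Layer]) -> np.ndarray:
    """(V, E, a) -> (V, E, phi_1) -> ... -> (V, E, phi_j), then global pooling."""
    feats = g.attributes
    for layer in layers:
        feats = layer(g, feats)     # phi_{j-1}(u) -> phi_j(u) for every node u
    return feats.sum(axis=0)        # phi_j(G) = sum over u in V of phi_j(u)
```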

  11. Scalable approximation of the Gaussian kernel mapping
      In $\varphi_{\text{path}}(u) = \sum_{p \in \mathcal{P}_k(G, u)} \Phi(a(p))$, the Gaussian embedding $\Phi(x) = e^{-\alpha \| x - \cdot \|^2} \in \mathcal{H}$ is infinite-dimensional and can be expensive to compute. The Nyström method provides a finite-dimensional approximation $\Psi(x) \in \mathbb{R}^q$ by orthogonally projecting $\Phi(x)$ onto the finite-dimensional subspace $\text{span}(\Phi(z_1), \dots, \Phi(z_q))$ parametrized by $Z = \{z_1, \dots, z_q\}$, where each $z_j \in \mathbb{R}^{dk}$ can be interpreted as a path feature. The parameters $Z$ can be learned by (a sketch follows below):
      - (unsupervised) K-means on the set of path features;
      - (supervised) end-to-end learning with back-propagation.
      [Chen et al., 2019a,b]
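A minimal sketch of this approximation, assuming the Gaussian kernel and anchor points fitted with scikit-learn's K-means; the projection formula $\Psi(x) = K_{ZZ}^{-1/2} k_Z(x)$ is the standard Nyström expression, while the function names are ours.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans


def gaussian_kernel(X: np.ndarray, Y: np.ndarray, alpha: float) -> np.ndarray:
    """K[i, j] = exp(-alpha * ||X[i] - Y[j]||^2)."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-alpha * sq_dists)


def fit_anchors(path_feats: np.ndarray, q: int) -> np.ndarray:
    """Unsupervised variant: learn Z = {z_1, ..., z_q} by K-means on the
    set of path features (each row has dimension d * k)."""
    return KMeans(n_clusters=q, n_init=10).fit(path_feats).cluster_centers_


def nystrom_map(X: np.ndarray, Z: np.ndarray, alpha: float) -> np.ndarray:
    """Psi(x) = K_ZZ^{-1/2} k_Z(x): coordinates of the orthogonal projection
    of Phi(x) onto span(Phi(z_1), ..., Phi(z_q)), so that
    <Psi(x), Psi(x')> approximates the Gaussian kernel K(x, x')."""
    K_ZZ = gaussian_kernel(Z, Z, alpha)
    w, V = eigh(K_ZZ)                         # K_ZZ = V diag(w) V^T
    w = np.maximum(w, 1e-12)                  # guard against tiny eigenvalues
    K_ZZ_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T
    return gaussian_kernel(X, Z, alpha) @ K_ZZ_inv_sqrt
```

With $\Psi$ in place, the one-layer node features become $\psi_1(u) = \sum_{p \in \mathcal{P}_k(G,u)} \Psi(a(p))$; every step is differentiable in $Z$, which is what enables the supervised, back-propagation variant.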

  12. Experiments on graphs with discrete attributes
      [Figure: accuracy improvement over the WL subtree kernel on MUTAG, PTC, NCI1, PROTEINS, IMDB-B, IMDB-M, and COLLAB, for WL subtree, GNTK, GCN, GIN, GCKN-path-unsup, GCKN-subtree-unsup, and GCKN-subtree-sup.]
      - GCKN-path already outperforms the baselines.
      - Increasing the number of layers brings a larger improvement.
      - Supervised learning does not improve performance, but leads to more compact representations.
      [Shervashidze et al., 2011, Du et al., 2019, Xu et al., 2019, Kipf and Welling, 2017]

  13. Experiments on graphs with continuous attributes
      [Figure: accuracy improvement over the WWL kernel on ENZYMES, COX2, PROTEINS, and BZR, for WWL, GNTK, GCKN-path-unsup, GCKN-subtree-unsup, and GCKN-subtree-sup.]
      - Results are similar to the discrete case; path features are presumably already predictive enough.
      [Du et al., 2019, Togninalli et al., 2019]

  14. Model interpretation for mutagenicity prediction
      Idea: find the minimal connected component of the input graph that preserves the prediction. [Figure: original molecule vs. the subgraph selected by GCKN.] [Ying et al., 2019]
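One way this search could be instantiated, as a hedged sketch only (the paper's actual selection procedure may differ): greedily drop nodes as long as the remaining graph stays connected and a scalar-valued `predict` function (hypothetical here) keeps a score close to the original.

```python
import networkx as nx


def minimal_preserving_component(G: nx.Graph, predict,
                                 tol: float = 0.05) -> nx.Graph:
    """Greedily remove nodes while the graph stays connected and the
    model's score stays within `tol` of the original prediction."""
    base = predict(G)               # `predict` returns a scalar score
    H = G.copy()
    shrunk = True
    while shrunk:
        shrunk = False
        for u in list(H.nodes):
            candidate = H.copy()
            candidate.remove_node(u)
            if (candidate.number_of_nodes() > 0
                    and nx.is_connected(candidate)
                    and abs(predict(candidate) - base) <= tol):
                H = candidate       # keep the smaller, prediction-preserving graph
                shrunk = True
                break
    return H
```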
