  1. Graph Convolutional Network
     Heting Gao
     University of Illinois at Urbana-Champaign
     hgao17@illinois.edu
     November 10, 2018

  2. Overview
     1. Graph Convolution
        - Preliminary
        - Graph Fourier Transform
        - Graph Spectral Filtering
        - Fast Localized Spectral Filtering
        - Convolutional Graph Network

  3. Preliminary
     A connected undirected graph is represented as $G = \{V, E, W\}$ [i].
     - $V$ is the set of vertices, with $|V| = N$.
     - $E$ is the set of edges.
     - $W$ is the weighted adjacency matrix: $W_{i,j}$ is the weight of the edge $e = (i, j)$ connecting vertices $i$ and $j$, and $W_{i,j} = 0$ if the edge does not exist.
     If the weights are not naturally defined, a common choice is the thresholded Gaussian kernel
     $$W_{i,j} = \begin{cases} \exp\left(-\frac{[\mathrm{dist}(i,j)]^2}{2\theta^2}\right) & \text{if } \mathrm{dist}(i,j) \le \kappa \\ 0 & \text{otherwise} \end{cases}$$
     for some parameters $\kappa$ and $\theta$. Here $\mathrm{dist}(i,j)$ can be the actual distance on the graph between vertices $i$ and $j$, or the distance between the features of vertices $i$ and $j$. A sketch of this weighting appears below.
     [i] (IEEE-2012) David I. Shuman, The Emerging Field of Signal Processing on Graphs
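Below is a minimal Python sketch of this thresholded Gaussian weighting. The function name `gaussian_weights`, the 2-D node coordinates, and the values chosen for $\kappa$ and $\theta$ are illustrative assumptions, not from the slides.

```python
import numpy as np

# A sketch of the thresholded Gaussian kernel weighting above.
# `gaussian_weights`, `coords`, `kappa`, and `theta` are illustrative
# names and values, not taken from the slides.
def gaussian_weights(coords, kappa=1.0, theta=0.5):
    N = coords.shape[0]
    W = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            if i == j:
                continue
            dist = np.linalg.norm(coords[i] - coords[j])
            if dist <= kappa:                      # threshold at kappa
                W[i, j] = np.exp(-dist**2 / (2 * theta**2))
    return W

coords = np.random.rand(5, 2)    # 5 random nodes with 2-D features
W = gaussian_weights(coords)
assert np.allclose(W, W.T)       # undirected graph: W is symmetric
```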

  4. Preliminary
     A signal or function on the graph, $f : V \to \mathbb{R}$, can be represented as a vector $f \in \mathbb{R}^N$, where $f_i = f(v_i)$ is the function value on vertex $v_i \in V$.
     The non-normalized graph Laplacian is $L = D - W$, where $W$ is the weight matrix and $D$ is the degree matrix: a diagonal matrix whose diagonal elements are the sums of the incident edge weights, $D_{i,i} = \sum_{j=1}^N W_{i,j}$.
     The graph Laplacian $L$ is a difference operator: for all $f \in \mathbb{R}^N$,
     $$(Lf)_i = \sum_{j \in \mathcal{N}_i} W_{i,j} (f_i - f_j)$$
     where $\mathcal{N}_i$ denotes the set of neighbor nodes of vertex $i$. The sketch below checks this identity numerically.
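A quick numerical check, not from the slides, that the matrix form $L = D - W$ and the difference-operator form agree; the 3-node weighted graph is arbitrary.

```python
import numpy as np

# Check that (Lf)_i = sum_j W_ij (f_i - f_j) for L = D - W
# on an arbitrary symmetric example graph.
W = np.array([[0., 1., 2.],
              [1., 0., 0.],
              [2., 0., 0.]])
D = np.diag(W.sum(axis=1))        # degree matrix D_ii = sum_j W_ij
L = D - W                         # non-normalized graph Laplacian

f = np.array([3., -1., 2.])       # a signal on the 3 vertices
Lf_matrix = L @ f
Lf_diff = np.array([sum(W[i, j] * (f[i] - f[j]) for j in range(3))
                    for i in range(3)])
assert np.allclose(Lf_matrix, Lf_diff)
```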

  5. Laplacian
     The 1-D Laplacian operator $\Delta$:
     $$f'(t) = \lim_{h \to 0} \frac{f(t+h) - f(t)}{h}$$
     $$\Delta f(t) = \frac{\partial^2}{\partial t^2} f(t) = \frac{\partial}{\partial t} f'(t) = \lim_{h \to 0} \frac{f'(t+h) - f'(t)}{h}$$

  6. Laplacian
     The 1-D discrete Laplacian operator $\Delta$:
     $$f'[n] = f[n+1] - f[n]$$
     $$\Delta f[n] = f'[n] - f'[n-1] = (f[n+1] - f[n]) - (f[n] - f[n-1]) = f[n+1] + f[n-1] - 2f[n]$$
     The 2-D discrete Laplacian operator $\Delta$:
     $$\Delta f[n,m] = f[n+1,m] + f[n-1,m] + f[n,m+1] + f[n,m-1] - 4f[n,m]$$
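A small sketch of these stencils in Python. The periodic boundary handling via `np.roll` is an arbitrary choice to keep the example short, not part of the slides.

```python
import numpy as np

# The 1-D and 2-D discrete Laplacian stencils above, applied with
# periodic boundaries (np.roll) purely for brevity.
def laplacian_1d(f):
    return np.roll(f, -1) + np.roll(f, 1) - 2 * f

def laplacian_2d(f):
    return (np.roll(f, -1, axis=0) + np.roll(f, 1, axis=0) +
            np.roll(f, -1, axis=1) + np.roll(f, 1, axis=1) - 4 * f)

f = np.array([0., 1., 4., 9., 16.])   # f[n] = n^2 has constant Laplacian 2
print(laplacian_1d(f))                # interior entries equal 2
```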

  7. Laplacian
     The graph Laplacian $L$ is a discrete Laplacian operator on graph signals:
     $$(Lf)_i = \sum_{j \in \mathcal{N}_i} W_{i,j} (f_i - f_j)$$
     $$-\Delta f = Lf$$

  8. Fourier Transform
     For a given function $f(t)$, its Fourier transform $F$ at a given frequency $\Omega$ is
     $$F(\Omega) = \langle f(t), e^{j\Omega t} \rangle = \int_{\mathbb{R}} f(t) e^{-j\Omega t} \, dt$$
     The Laplacian of the basis $e^{j\Omega t}$ has the form of the basis itself:
     $$-\Delta e^{j\Omega t} = -\frac{\partial^2}{\partial t^2} e^{j\Omega t} = \Omega^2 e^{j\Omega t}$$
     For the graph Fourier transform, we also want to find an analogous set of basis vectors. Let $u \in \mathbb{R}^N$ be a basis vector for the graph transform; we want
     $$-\Delta u = Lu = \lambda u$$
     This is an eigenvalue decomposition.

  9. Graph Fourier Transform
     Let $U = [u_l]_{l=1,\dots,N}$ denote the matrix of eigenvectors of $L$, and let $\Lambda = \mathrm{diag}([\lambda_l]_{l=1,\dots,N})$ denote the diagonal matrix of eigenvalues of $L$.
     For a given signal $f$, its Fourier transform $F(\lambda_l)$ at the frequency $\lambda_l$ is
     $$F(\lambda_l) = \langle f, u_l \rangle = \sum_{i=1}^N f_i u^*_{l,i}$$
     The inverse Fourier transform is then
     $$f_i = \sum_{l=1}^N F(\lambda_l) u_{l,i}$$
     Let $F \in \mathbb{R}^N$ denote the Fourier transform vector of the given graph signal $f \in \mathbb{R}^N$. In matrix form,
     $$F = U^T f, \qquad f = UF$$
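A minimal sketch of this transform pair: diagonalize $L$, project onto the eigenvectors, then invert. The 3-node path graph is an arbitrary example.

```python
import numpy as np

# Graph Fourier transform via eigendecomposition of L, as defined above.
W = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
L = np.diag(W.sum(1)) - W
lam, U = np.linalg.eigh(L)        # L = U diag(lam) U^T, U orthonormal

f = np.array([1., 0., -2.])
F = U.T @ f                       # forward transform: F(lambda_l) = <f, u_l>
f_rec = U @ F                     # inverse transform
assert np.allclose(f, f_rec)
```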

  10. Graph Spectral Filtering
      Let $\mathcal{F} : \mathbb{R}^N \to \mathbb{R}^N$ denote the graph Fourier transform and $\mathcal{F}^{-1} : \mathbb{R}^N \to \mathbb{R}^N$ the inverse graph Fourier transform. Let $h \in \mathbb{R}^N$ denote the filter function on the graph, $H \in \mathbb{R}^N$ the Fourier transform of the filter function, and $y \in \mathbb{R}^N$ the function after filtering on the graph. Then
      $$y = h * f = \mathcal{F}^{-1}[\mathcal{F}(h) \odot \mathcal{F}(f)] = U[U^T h \odot U^T f] = U[H \odot U^T f] = U \begin{bmatrix} H(\lambda_1) & & \\ & \ddots & \\ & & H(\lambda_N) \end{bmatrix} U^T f$$

  11. Graph Spectral Filtering
      Define the spectral filter $H(L)$ as
      $$H(L) = U \begin{bmatrix} H(\lambda_1) & & \\ & \ddots & \\ & & H(\lambda_N) \end{bmatrix} U^T$$
      The adjustable parameters are $[H(\lambda_l)]_{l=1,2,\dots,N}$. Let $\theta = [H(\lambda_l)]_{l=1,2,\dots,N}$ and $g_\theta(\Lambda) = \mathrm{diag}(\theta)$. We can then define the convolutional layer as
      $$y = \sigma(U g_\theta(\Lambda) U^T f)$$
      A sketch of this layer appears below.
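As a sketch, one such layer with $\sigma = \mathrm{ReLU}$; the graph and the random initialization of $\theta$ are illustrative assumptions, not from the slides.

```python
import numpy as np

# One spectral convolutional layer y = sigma(U g_theta(Lambda) U^T f).
# theta holds one free parameter per eigenvalue, theta_l = H(lambda_l).
rng = np.random.default_rng(0)

W = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
L = np.diag(W.sum(1)) - W
lam, U = np.linalg.eigh(L)

theta = rng.normal(size=L.shape[0])          # random init, illustrative only
f = np.array([1., 0., -2.])

# diag(theta) applied in the spectral domain, then ReLU
y = np.maximum(0, U @ (theta * (U.T @ f)))
```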

  12. Fast Localized Spectral Filtering [ii]
      If we define the convolutional layer as $y = \sigma(U g_\theta(\Lambda) U^T f)$, there are however three limitations:
      - The convolution is not localized. With arbitrary $\theta$, the signal $f$ can be propagated to any other node.
      - $\theta \in \mathbb{R}^N$ means that we need $N$ parameters.
      - The eigendecomposition has a computational complexity of $O(N^3)$, and every forward propagation has complexity $O(N^2)$.
      [ii] (NIPS-2016) Michaël Defferrard, Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering

  13. Fast Localized Spectral Filtering
      We can instead define
      $$g_\theta(\Lambda) = \sum_{k=1}^K \theta_k \Lambda^k$$
      We get the new convolutional layer
      $$y = \sigma(U g_\theta(\Lambda) U^T f) = \sigma\left(U \left(\sum_{k=1}^K \theta_k \Lambda^k\right) U^T f\right) = \sigma\left(\sum_{k=1}^K \theta_k U \Lambda^k U^T f\right) = \sigma\left(\sum_{k=1}^K \theta_k L^k f\right)$$
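A sketch of this polynomial filter. Note that no eigendecomposition is needed: each term $L^k f$ is built from the previous one with a single matrix-vector product. The graph, the coefficients, and $K = 3$ are illustrative assumptions.

```python
import numpy as np

# Polynomial filtering y = sigma(sum_{k=1}^K theta_k L^k f), computed
# without eigendecomposition by reusing L @ (previous term).
W = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
L = np.diag(W.sum(1)) - W
f = np.array([1., 0., -2.])
theta = [0.5, -0.1, 0.02]          # K = 3 coefficients, illustrative

out = np.zeros_like(f)
Lk_f = f
for theta_k in theta:
    Lk_f = L @ Lk_f                # L^k f, built up recursively
    out += theta_k * Lk_f
y = np.maximum(0, out)             # sigma = ReLU
```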

  14. Fast Localized Spectral Filtering
      The new definition of a convolutional layer is
      $$y = \sigma\left(\sum_{k=1}^K \theta_k L^k f\right)$$
      It has three advantages:
      - The convolution is localized: it is exactly $K$-hop localized, since we use at most the $K$-th power of $L$.
      - We need only $K$ parameters.
      - We do not need to decompose $L$, and the forward propagation can be approximated using Chebyshev polynomials. (I do not fully understand this part, but I will still try to describe the steps from the paper.)

  15. Chebyshev Polynomials
      Chebyshev polynomial expansion:
      $$T_0(y) = 1, \quad T_1(y) = y, \quad T_k(y) = 2y T_{k-1}(y) - T_{k-2}(y)$$
      These polynomials form an orthogonal basis for $L^2([-1,1], \frac{dy}{\sqrt{1-y^2}})$, the Hilbert space of square-integrable functions with respect to the measure $\frac{dy}{\sqrt{1-y^2}}$:
      $$\int_{-1}^1 \frac{T_l(y) T_m(y)}{\sqrt{1-y^2}} \, dy = \begin{cases} \delta_{l,m} \, \pi/2 & m, l > 0 \\ \pi & m = l = 0 \end{cases}$$
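A quick check, not from the slides, of the recurrence against the closed form $T_k(\cos t) = \cos(kt)$:

```python
import numpy as np

# Build T_0..T_5 via the recurrence T_k(y) = 2 y T_{k-1}(y) - T_{k-2}(y)
# and compare against the closed form T_k(cos t) = cos(k t).
y = np.linspace(-1, 1, 101)
T = [np.ones_like(y), y.copy()]            # T_0, T_1
for k in range(2, 6):
    T.append(2 * y * T[-1] - T[-2])

t = np.arccos(y)
for k in range(6):
    assert np.allclose(T[k], np.cos(k * t), atol=1e-10)
```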

  16. Chebyshev Polynomials
      In particular, every $h \in L^2([-1,1], \frac{dy}{\sqrt{1-y^2}})$ has the following Chebyshev polynomial expansion:
      $$h(y) = \frac{1}{2} c_0 + \sum_{k=1}^{\infty} c_k T_k(y)$$
      Since $\lambda \in [0, \lambda_{max}]$, plug in $\tilde{y} = U\left(\frac{2\Lambda}{\lambda_{max}} - I\right)U^T = \frac{2L}{\lambda_{max}} - I$, whose spectrum lies in $[-1, 1]$. Truncating the expansion at order $K$,
      $$g_\theta(L) = \sum_{k=1}^K \theta_k L^k \approx \frac{1}{2} c_0 + \sum_{k=1}^K c_k T_k(\tilde{y})$$
      where $T_k(\tilde{y})$ can be computed recursively as $T_k(\tilde{y}) = 2\tilde{y}\, T_{k-1}(\tilde{y}) - T_{k-2}(\tilde{y})$.

  17. Chebyshev Polynomials
      Let $\bar{f}_k = T_k(\tilde{y}) f$. It can be computed recursively:
      $$\bar{f}_k = T_k(\tilde{y}) f = 2\tilde{y}\, T_{k-1}(\tilde{y}) f - T_{k-2}(\tilde{y}) f = 2\tilde{y}\, \bar{f}_{k-1} - \bar{f}_{k-2} = 2\left(\frac{2L}{\lambda_{max}} - I\right)\bar{f}_{k-1} - \bar{f}_{k-2}$$
      The approximated convolutional layer is
      $$y = \sigma(g_\theta(L) f) = \sigma\left(\sum_{k=0}^K \theta_k \bar{f}_k\right)$$
      with $\bar{f}_0 = f$ and $\bar{f}_1 = \tilde{y} f = \left(\frac{2L}{\lambda_{max}} - I\right) f$.
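A sketch of the resulting layer. The graph, the coefficients $\theta_k$, and $K = 2$ are illustrative; $\lambda_{max}$ is computed exactly here, though in practice it is often just upper-bounded.

```python
import numpy as np

# Chebyshev-filtered layer from the recursion above: f_bar_k is computed
# from f_bar_{k-1} and f_bar_{k-2} using only matrix-vector products.
W = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
L = np.diag(W.sum(1)) - W
lam_max = np.linalg.eigvalsh(L).max()
L_tilde = 2 * L / lam_max - np.eye(3)      # rescaled so spectrum is in [-1, 1]

f = np.array([1., 0., -2.])
theta = [0.3, -0.2, 0.1]                   # theta_0 .. theta_K with K = 2

f_bars = [f, L_tilde @ f]                  # f_bar_0, f_bar_1
for k in range(2, len(theta)):
    f_bars.append(2 * L_tilde @ f_bars[-1] - f_bars[-2])
y = np.maximum(0, sum(t * fb for t, fb in zip(theta, f_bars)))
```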

  18. Convolutional Graph Network [iv]
      Instead of using a $K$-hop localized filter, set $K = 1$ but stack multiple layers. Use the symmetric normalized Laplacian $L^{sym} = D^{-\frac{1}{2}} L D^{-\frac{1}{2}} = I - D^{-\frac{1}{2}} W D^{-\frac{1}{2}}$ [iii]. The entries of $L^{sym}$ are
      $$L^{sym}_{i,j} = \begin{cases} 1 & i = j \\ -\frac{W_{i,j}}{\sqrt{d_i d_j}} & i \ne j \text{ and vertices } i \text{ and } j \text{ are connected} \\ 0 & \text{otherwise} \end{cases}$$
      This is equivalent to
      $$(L^{sym} f)_i = \frac{1}{\sqrt{d_i}} \sum_{j \in \mathcal{N}_i} W_{i,j} \left(\frac{f_i}{\sqrt{d_i}} - \frac{f_j}{\sqrt{d_j}}\right)$$
      The eigenvalues $[\lambda_l]_{l=1,2,\dots,N}$ of $L^{sym}$ are in the range $[0, 2]$, as the sketch below checks numerically.
      [iii] Kipf's paper uses $A$ to represent the weight matrix. I will stick to $W$ to be consistent in this presentation.
      [iv] (ICLR-2017) Thomas N. Kipf, Semi-Supervised Classification with Graph Convolutional Networks
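A numerical check on an arbitrary connected example graph that the eigenvalues of $L^{sym}$ fall in $[0, 2]$:

```python
import numpy as np

# L_sym = I - D^{-1/2} W D^{-1/2}; its eigenvalues should lie in [0, 2].
W = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])
d = W.sum(1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L_sym = np.eye(3) - D_inv_sqrt @ W @ D_inv_sqrt

lam = np.linalg.eigvalsh(L_sym)
assert lam.min() >= -1e-10 and lam.max() <= 2 + 1e-10
```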

  19. Convolutional Graph Network
      The convolutional layer can then be approximated as
      $$y = \sigma(g_\theta(L) f) \approx \sigma(\theta_0 f + \theta_1 \tilde{y} f) = \sigma\left(\theta_0 f + \theta_1 \left(\frac{2 L^{sym}}{\lambda_{max}} - I\right) f\right)$$
      $$= \sigma[(\theta_0 - \theta_1) f + \theta_1 L^{sym} f] \quad \text{(assuming } \lambda_{max} = 2\text{)}$$
      $$= \sigma[(\theta_0 - \theta_1) f + \theta_1 (I - D^{-\frac{1}{2}} W D^{-\frac{1}{2}}) f] = \sigma(\theta_0 f - \theta_1 D^{-\frac{1}{2}} W D^{-\frac{1}{2}} f)$$

  20. Convolutional Graph Network
      The approximated output layer is
      $$y = \sigma(\theta_0 f - \theta_1 D^{-\frac{1}{2}} W D^{-\frac{1}{2}} f)$$
      The number of parameters is further reduced to one in the paper by assuming $\theta = \theta_0 = -\theta_1$:
      $$y = \sigma(\theta (I + D^{-\frac{1}{2}} W D^{-\frac{1}{2}}) f)$$
      The matrix $I + D^{-\frac{1}{2}} W D^{-\frac{1}{2}}$ has eigenvalues $\lambda \in [0, 2]$, so repeatedly applying it can cause numerical instability. The renormalization trick replaces $I + D^{-\frac{1}{2}} W D^{-\frac{1}{2}}$ with $\tilde{D}^{-\frac{1}{2}} \tilde{W} \tilde{D}^{-\frac{1}{2}}$, where
      $$\tilde{W} = W + I, \qquad \tilde{D}_{i,i} = \sum_j \tilde{W}_{i,j}$$
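A sketch of the renormalization trick and the resulting one-parameter layer. The graph, the scalar $\theta$, and the single feature channel are illustrative simplifications; Kipf's actual layer applies this propagation to a feature matrix with a weight matrix per layer.

```python
import numpy as np

# Renormalization trick: W_tilde = W + I, D_tilde from W_tilde, then one
# GCN layer y = sigma(theta * D_tilde^{-1/2} W_tilde D_tilde^{-1/2} f).
W = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])
W_tilde = W + np.eye(3)                    # add self-loops
d_tilde = W_tilde.sum(1)
S = np.diag(d_tilde**-0.5) @ W_tilde @ np.diag(d_tilde**-0.5)

f = np.array([1., 0., -2.])
theta = 0.7                                # single illustrative parameter
y = np.maximum(0, theta * (S @ f))         # one renormalized GCN layer

# The renormalized operator has a smaller spectral range than
# I + D^{-1/2} W D^{-1/2}, which mitigates the instability noted above.
print(np.linalg.eigvalsh(S))
```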
