

  1. Qualifying Oral Exam: Representation Learning on Graphs. Pengyu Cheng, Duke University. April 5, 2020.

  2. Overview
Representation learning is an important task in machine learning. Learning embeddings for images, videos, and other data with regular grid shapes has been well studied. However, there is a tremendous amount of real-world data with non-regular shapes, e.g., social networks, 3D point clouds, and knowledge graphs. Graphs are an effective mathematical tool for describing such non-regular data. The three reviewed papers are fundamental works in deep graph representation learning.

  3. Overview
1. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering [Defferrard et al., 2016]: Convolutional Networks on Graphs; Pooling on Graph Signals; Numerical Experiments; Discussion and Future Work.
2. Semi-supervised Classification with Graph Convolutional Networks [Kipf and Welling, 2016]: Introduction; Approximation of Convolutions on Graphs; Experiments; Discussion and Future Work.
3. Inductive Representation Learning on Large Graphs [Hamilton et al., 2017]: Introduction; Proposed Method: GraphSAGE; Experiments; Discussion and Future Work.

  4. Problem Description
Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering [Defferrard et al., 2016]. The convolutional neural network (CNN) is an important technique for learning meaningful local patterns. CNNs are widely used on images, audio, videos, and other data with regular grid shapes. However, CNNs are inapplicable to non-Euclidean data. This paper gives a solution for generalizing the convolution and pooling operations of CNNs to graphs.

  5. Preliminary
Let $G = (V, E, W)$ be an undirected graph with node set $V$, $n = |V|$, and edge set $E$. $W \in \mathbb{R}^{n \times n}$ is a weighted adjacency matrix. The graph Laplacian is $L = D - W$, with normalized version $L = I_n - D^{-1/2} W D^{-1/2}$, where $D$ is the diagonal degree matrix and $I_n$ is the identity matrix. $L$ is symmetric positive semi-definite, so it admits the eigendecomposition $L = U \Lambda U^T$, where $\Lambda = \mathrm{diag}([\lambda_0, \dots, \lambda_{n-1}])$ collects the eigenvalues and $U = [u_0, \dots, u_{n-1}]$ the eigenvectors.
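To make the construction concrete, here is a minimal NumPy sketch that builds the normalized Laplacian and its eigendecomposition; the 4-node graph is a made-up toy example, not from the talk:

```python
import numpy as np

def normalized_laplacian(W):
    """L = I_n - D^{-1/2} W D^{-1/2} for a weighted adjacency matrix W."""
    d = W.sum(axis=1)                          # node degrees
    d_inv_sqrt = np.zeros_like(d)
    d_inv_sqrt[d > 0] = d[d > 0] ** -0.5       # guard against isolated nodes
    D_inv_sqrt = np.diag(d_inv_sqrt)
    return np.eye(W.shape[0]) - D_inv_sqrt @ W @ D_inv_sqrt

# Toy undirected graph: a triangle plus one pendant node.
W = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
L = normalized_laplacian(W)
lam, U = np.linalg.eigh(L)      # L = U diag(lam) U^T, lam >= 0
print(lam)                      # eigenvalues of the normalized Laplacian lie in [0, 2]
```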

  6. Preliminary
Suppose $x = [x_1, \dots, x_n]^T \in \mathbb{R}^n$ is a graph signal, where $x_i$ corresponds to node $v_i \in V$. The graph Fourier transform of $x$ is $\hat{x} = U^T x$; since $U U^T = I_n$, the inverse graph Fourier transform is $x = U \hat{x}$. For the classic Fourier transform, convolution in the signal domain equals point-wise multiplication in the spectral domain followed by transforming back. This motivates the definition of the graph convolution $*_G$:
$$x *_G y = U\big((U^T x) \odot (U^T y)\big) = U\,[\mathrm{diag}(U^T x)]\, U^T y, \qquad (1)$$
with $\odot$ denoting point-wise multiplication.
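A small sketch of the graph Fourier transform and the convolution of Eq. (1), assuming an eigenvector basis $U$ as on the previous slide; the triangle graph and the signals are toy stand-ins:

```python
import numpy as np

# U comes from the eigendecomposition L = U Λ U^T; a 3-node triangle
# keeps the sketch self-contained.
W = np.array([[0., 1., 1.], [1., 0., 1.], [1., 1., 0.]])
d_inv_sqrt = np.diag(W.sum(axis=1) ** -0.5)
L = np.eye(3) - d_inv_sqrt @ W @ d_inv_sqrt
_, U = np.linalg.eigh(L)

def gft(x):
    """Graph Fourier transform: x_hat = U^T x."""
    return U.T @ x

def igft(x_hat):
    """Inverse transform x = U x_hat, valid since U U^T = I_n."""
    return U @ x_hat

def graph_conv(x, y):
    """Eq. (1): x *_G y = U ((U^T x) ⊙ (U^T y))."""
    return igft(gft(x) * gft(y))

print(graph_conv(np.array([1., 0., -1.]), np.array([0.5, 0.5, 0.])))
```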

  7. Non-parametric Convolution Filters
Consider a graph convolutional filter in the spectral domain with parameters $\theta \in \mathbb{R}^n$, $g_\theta(\Lambda) = \mathrm{diag}(\theta)$. The convolution between a signal $x$ and the filter $g_\theta$ is then written as
$$g_\theta *_G x = U g_\theta(\Lambda) U^T x = U\,\mathrm{diag}(\theta)\, U^T x. \qquad (2)$$
This non-parametric filter has two disadvantages: (1) it cannot guarantee that information is extracted locally (i.e., from a node together with its close neighbors); (2) its parameter size is $O(n)$, growing with the number of nodes.
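A minimal sketch of the non-parametric filter of Eq. (2); the 3-node path graph (here with the unnormalized Laplacian for brevity) and the $\theta$ values are illustrative only:

```python
import numpy as np

# Any orthonormal eigenvector basis U of a graph Laplacian will do;
# here: the unnormalized Laplacian of a 3-node path graph.
L = np.array([[ 1., -1.,  0.],
              [-1.,  2., -1.],
              [ 0., -1.,  1.]])
_, U = np.linalg.eigh(L)

def nonparametric_filter(x, theta):
    """Eq. (2): g_θ *_G x = U diag(θ) U^T x."""
    return U @ (theta * (U.T @ x))   # diag(θ) applied as element-wise scaling

x = np.array([1., 2., 3.])
theta = np.array([0., 1., 1.])       # one free parameter per node: the O(n) drawback
print(nonparametric_filter(x, theta))
```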

  8. Polynomial Parametrization
To solve these problems, the authors propose the parameterized filters
$$g_\theta(\Lambda) = \sum_{k=0}^{K-1} \theta_k \Lambda^k. \qquad (3)$$
The convolution with a signal $x$ is
$$g_\theta *_G x = U\Big[\sum_{k=0}^{K-1} \theta_k \Lambda^k\Big] U^T x = \sum_{k=0}^{K-1} \theta_k L^k x. \qquad (4)$$
Hammond et al. [2011] show that if $d_G(i, j) > K$ then $[L^K]_{ij} = 0$, where $d_G(i, j)$ is the length of the shortest path from $v_i$ to $v_j$ on graph $G$. Therefore, each node only interacts with neighbors whose distance to it is at most $K$. Moreover, the learning complexity of $g_\theta$ becomes $O(K)$, a constant with respect to the node count $n$.
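Eq. (4) can be evaluated without any eigendecomposition, using only repeated multiplications by $L$; a sketch with a toy $L$ and $\theta$ of my own choosing:

```python
import numpy as np

def polynomial_filter(x, L, theta):
    """Eq. (4): g_θ *_G x = Σ_{k=0}^{K-1} θ_k L^k x, computed with K-1
    matrix-vector products (sparse-friendly, no eigendecomposition)."""
    out = np.zeros_like(x)
    Lk_x = x                        # L^0 x
    for theta_k in theta:
        out = out + theta_k * Lk_x
        Lk_x = L @ Lk_x             # advance to L^{k+1} x
    return out

L = np.array([[ 1., -1.,  0.],
              [-1.,  2., -1.],
              [ 0., -1.,  1.]])
x = np.array([1., 0., 0.])
theta = np.array([0.5, 0.3, 0.2])   # K = 3: a strictly K-hop-local filter
print(polynomial_filter(x, L, theta))
```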

  9. Pooling on Graph Signals
The pooling is based on the idea that similar vertices should be clustered together. Graclus multi-level clustering proceeds at each level $G_h$ as follows: (1) randomly select an unmarked node; (2) match it with the unmarked neighbor that maximizes the normalized edge cut $W_{ij}(1/d_i + 1/d_j)$; (3) mark the matched pair. This operation is repeated until all nodes are marked.
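A sketch of one Graclus coarsening level as described above; the greedy matching follows the three steps on this slide, while tie-breaking and other details of the full algorithm are simplified:

```python
import numpy as np

def graclus_matching(W):
    """One coarsening level: greedily match each unmarked node with the
    unmarked neighbor maximizing W_ij * (1/d_i + 1/d_j)."""
    n = W.shape[0]
    d = W.sum(axis=1)
    marked = np.zeros(n, dtype=bool)
    pairs = []
    for i in np.random.permutation(n):   # (1) pick an unmarked node at random
        if marked[i]:
            continue
        marked[i] = True
        neighbors = [j for j in range(n) if W[i, j] > 0 and not marked[j]]
        if not neighbors:                # no partner left: node stays a singleton
            continue
        # (2) best unmarked neighbor by normalized edge cut
        j = max(neighbors, key=lambda j: W[i, j] * (1.0 / d[i] + 1.0 / d[j]))
        marked[j] = True                 # (3) mark the matched pair
        pairs.append((i, j))
    return pairs

W = np.array([[0., 3., 1., 0.],
              [3., 0., 0., 1.],
              [1., 0., 0., 2.],
              [0., 1., 2., 0.]])
print(graclus_matching(W))               # e.g. [(0, 1), (2, 3)], order is random
```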

  10. Pooling on Graph Signals
The pooling operation is applied frequently during training, incurring a large computational cost. An efficient solution is to record the pooling assignments before training, by building a binary tree of node-matching assignments: (1) if $v_i^{(h)}, v_j^{(h)} \in G_h$ are pooled to $v_l^{(h+1)} \in G_{h+1}$, store $v_i^{(h)}$ and $v_j^{(h)}$ as the children of $v_l^{(h+1)}$ in the binary tree; (2) assign fake nodes as partners of unmatched nodes.
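Once the assignments are recorded, pooling a graph signal reduces to a fixed gather-and-max over child pairs; a minimal sketch in which fake nodes carry $-\infty$ so they never win the max (the index convention is mine, not the paper's):

```python
import numpy as np

FAKE = -1  # a fake node indexes the padded -inf slot below

def pool_signal(x, children):
    """Pool a signal via precomputed assignments: children[l] = (i, j) are the
    two child indices of coarse node l (j may be FAKE for unmatched nodes)."""
    padded = np.append(x, -np.inf)       # FAKE (= -1) points at this -inf entry
    idx = np.array(children)             # shape (n_coarse, 2)
    return np.max(padded[idx], axis=1)   # max over each child pair

x = np.array([3., 1., 4., 1., 5.])
children = [(0, 1), (2, 4), (3, FAKE)]   # 5 fine nodes -> 3 coarse nodes
print(pool_signal(x, children))          # [3. 5. 1.]
```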

  11. Experiments
Comparison with original CNNs on MNIST. To convert images to graphs, represent each pixel by a node and connect it to its 8 nearest neighbors. The weighted adjacency matrix $W$ is defined as
$$[W]_{ij} = \exp\Big(-\frac{\| z_i - z_j \|_2^2}{\sigma^2}\Big), \qquad (5)$$
where $z_i$ is the 2-D coordinate of the $i$-th pixel.
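A sketch of this graph construction, assuming $z_i$ holds 2-D pixel coordinates; the helper name and the random toy coordinates are illustrative:

```python
import numpy as np

def knn_gaussian_graph(z, k=8, sigma=1.0):
    """Eq. (5): connect each node to its k nearest neighbors, weighting edges
    by exp(-||z_i - z_j||^2 / sigma^2). z has one coordinate row per node."""
    n = len(z)
    diff = z[:, None, :] - z[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)             # squared pairwise distances
    W = np.zeros((n, n))
    for i in range(n):
        nn = np.argsort(dist2[i])[1:k + 1]       # k nearest, skipping i itself
        W[i, nn] = np.exp(-dist2[i, nn] / sigma ** 2)
    return np.maximum(W, W.T)                    # symmetrize: undirected graph

z = np.random.default_rng(0).random((20, 2))     # toy stand-in for pixel coords
W = knn_gaussian_graph(z, k=8, sigma=0.5)
```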

  12. Experiments
Besides, the graph CNN has a rotational invariance that CNNs on regular grids do not have. Application to text classification: the 20News dataset contains 18,846 documents with 20 class labels. Each document $x$ is represented as a signal on a word graph: each word is a node, and nodes are connected to their 16 nearest neighbors based on the similarity of their Word2Vec embeddings.
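The word graph can be sketched the same way as the pixel graph above; cosine similarity between embeddings is my assumption (the paper's exact similarity measure may differ), and the random embeddings are placeholders for real Word2Vec vectors:

```python
import numpy as np

def word_knn_graph(emb, k=16):
    """Connect each word node to its k most similar words by cosine
    similarity of their embeddings (emb: one row per vocabulary word)."""
    normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = normed @ normed.T                      # pairwise cosine similarities
    n = len(emb)
    W = np.zeros((n, n))
    for i in range(n):
        nn = np.argsort(-sim[i])[1:k + 1]        # skip the self-similarity
        W[i, nn] = sim[i, nn]
    return np.maximum(W, W.T)

emb = np.random.default_rng(0).normal(size=(100, 50))   # placeholder embeddings
W = word_knn_graph(emb, k=16)
```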

  13. Discussion and Future Work
Some directions to improve the proposed model: The pooling requires a weighted adjacency matrix as the measurement for pairing nodes, $W_{ij}(1/d_i + 1/d_j)$; however, a large number of graphs do not carry this additional information. To record the pooling assignments, the model builds a binary tree; when new graphs arrive or graph structures change, the model needs to rebuild the binary tree, which leads to high computational complexity. Therefore, how to efficiently perform pooling on graphs remains an interesting open problem.

  14. Introduction
Semi-supervised Classification with Graph Convolutional Networks [Kipf and Welling, 2016]. In this paper, the authors simplify the graph convolution with a first-order approximation, yielding the Graph Convolutional Network (GCN). The new method shows effective experimental results on semi-supervised node classification tasks. Recall the convolution filter $g_\theta(\Lambda) = \sum_{k=0}^{K-1} \theta_k \Lambda^k$ and the convolution layer
$$g_\theta *_G x = U\Big[\sum_{k=0}^{K-1} \theta_k \Lambda^k\Big] U^T x = \sum_{k=0}^{K-1} \theta_k L^k x.$$
The convolution is simplified by taking the polynomial order $K = 1$.

  15. Convolution Approximation
Three explanations for the approximation: (1) stacking multiple convolutional layers with $K = 1$ can reach performance similar to higher-order convolutions; (2) the low-order convolution reduces over-fitting when applied to graphs with wide-ranging node degree distributions; (3) under a limited computational budget, the $K = 1$ approximation allows deeper models, improving modeling capacity. Replacing the weighted adjacency matrix $W$ with the adjacency matrix $A$, the first-order convolution is
$$g_\theta *_G x = \theta_0 x + \theta_1 L x = \theta'_0 x - \theta'_1 D^{-1/2} A D^{-1/2} x. \qquad (6)$$
The second approximation sets $\theta = \theta'_0 = -\theta'_1$:
$$g_\theta *_G x = \theta \big( I_n + D^{-1/2} A D^{-1/2} \big) x. \qquad (7)$$
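A sketch of the tied first-order filter of Eq. (7); the toy adjacency matrix and the value of $\theta$ are illustrative:

```python
import numpy as np

def first_order_conv(x, A, theta):
    """Eq. (7): g_θ *_G x = θ (I_n + D^{-1/2} A D^{-1/2}) x,
    after tying θ = θ'_0 = -θ'_1."""
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(d ** -0.5)
    S = np.eye(len(A)) + D_inv_sqrt @ A @ D_inv_sqrt
    return theta * (S @ x)

A = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 0., 0.]])
x = np.array([1., 2., 3.])
print(first_order_conv(x, A, theta=0.5))
```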

  16. Convolution Approximation
To increase numerical stability, the authors introduce the re-normalization trick
$$I_n + D^{-1/2} A D^{-1/2} \;\to\; \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}, \quad \text{where } \tilde{A} = A + I_n \text{ and } \tilde{D} = D + I_n.$$
For signals with multiple channels, $X \in \mathbb{R}^{n \times c}$ with outputs $Z \in \mathbb{R}^{n \times f}$, Eq. (7) generalizes with the parameter matrix $\Theta$:
$$Z = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} X \Theta. \qquad (8)$$
The authors use Eq. (8) to solve the semi-supervised node classification problem. The model is a two-layer GCN:
$$Z = f(X, A) = \mathrm{softmax}\big( \hat{A}\, \mathrm{ReLU}( \hat{A} X W^{(0)} )\, W^{(1)} \big), \qquad (9)$$
where $\hat{A} = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$. The loss function is
$$\mathcal{L} = - \sum_{l \in \mathcal{Y}_L} \sum_{f=1}^{F} Y_{lf} \log Z_{lf},$$
where $\mathcal{Y}_L$ is the set of labeled nodes and $Y_l$ is the one-hot label of the $l$-th node.
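Putting the pieces together, a minimal NumPy forward pass of the two-layer GCN of Eq. (9) with the re-normalization trick and the masked cross-entropy loss; all sizes, weights, and labels are toy values, not the paper's setup:

```python
import numpy as np

def renormalized_adjacency(A):
    """Â = D̃^{-1/2} Ã D̃^{-1/2} with Ã = A + I_n (re-normalization trick)."""
    A_tilde = A + np.eye(len(A))
    d_tilde = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(d_tilde ** -0.5)
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

def softmax(Z):
    e = np.exp(Z - Z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def gcn_forward(X, A, W0, W1):
    """Eq. (9): Z = softmax(Â ReLU(Â X W^(0)) W^(1))."""
    A_hat = renormalized_adjacency(A)
    H = np.maximum(A_hat @ X @ W0, 0.0)   # first layer + ReLU
    return softmax(A_hat @ H @ W1)        # second layer + row-wise softmax

def semi_supervised_loss(Z, Y, labeled):
    """Cross-entropy over labeled nodes only: -Σ_{l∈Y_L} Σ_f Y_lf log Z_lf."""
    return -np.sum(Y[labeled] * np.log(Z[labeled] + 1e-12))

# Toy sizes: n=4 nodes, c=3 input channels, 8 hidden units, F=2 classes.
rng = np.random.default_rng(0)
A = np.array([[0., 1., 0., 0.], [1., 0., 1., 0.],
              [0., 1., 0., 1.], [0., 0., 1., 0.]])
X = rng.normal(size=(4, 3))
Z = gcn_forward(X, A, rng.normal(size=(3, 8)), rng.normal(size=(8, 2)))
Y = np.array([[1., 0.], [0., 1.], [0., 0.], [0., 0.]])   # one-hot labels
print(semi_supervised_loss(Z, Y, labeled=[0, 1]))        # only nodes 0, 1 labeled
```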

  17. Experiments
The model is trained with full-batch gradient descent. The authors conduct semi-supervised node classification on citation networks and a knowledge graph. Instead of only fixing the labeled node set, the authors also report results with randomly selected labeled node sets (rand. splits).

  18. Experiments
Besides, the authors study the performance of different convolution approximations and report the mean classification accuracy on the citation networks. In this comparison, the original GCN (with the re-normalization trick) shows the best performance.
[Figure: Comparison of different propagation models]
