
Graph Neural Networks - Prof. Srijan Kumar

  1. CSE 6240: Web Search and Text Mining, Spring 2020. Graph Neural Networks. Prof. Srijan Kumar, http://cc.gatech.edu/~srijan

  2. Today’s Lecture • Introduction to deep graph embeddings • Graph convolution networks • GraphSAGE

  3. Goal: Node Embeddings • Goal: map nodes of the input network into a d-dimensional embedding space such that similarity(u, v) ≈ z_v^T z_u • The similarity function still needs to be defined (a small numeric sketch follows)
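To make the dot-product decoder concrete, here is a minimal sketch with made-up 4-dimensional embeddings (the values and the dimension are assumptions, not from the slides):

```python
import numpy as np

# Hypothetical 4-dimensional embeddings for two nodes u and v.
z_u = np.array([0.1, 0.7, -0.2, 0.4])
z_v = np.array([0.2, 0.6, -0.1, 0.5])

# The decoder is a simple inner product: similarity(u, v) is approximated by z_v^T z_u.
score = z_v @ z_u
print(score)  # 0.66
```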

  4. Deep Graph Encoders • Encoder: map a node to a low-dimensional vector: enc(v) = z_v • Deep encoder methods are based on graph neural networks: enc(v) = multiple layers of non-linear transformations of the graph structure • The graph-encoder idea is inspired by CNNs on images (animation by Vincent Dumoulin)

  5. Idea from Convolutional Networks • In a CNN, a pixel’s representation is created by transforming the representations of its neighboring pixels – In a GNN, a node’s representation is created by transforming the representations of its neighboring nodes • But graphs are irregular, unlike images – So, generalize convolutions beyond simple lattices, and leverage node features/attributes • Solution: deep graph encoders

  6. Deep Graph Encoders • Once an encoder is defined, multiple layers of encoders can be stacked • Output: node embeddings, which can also be used to embed larger network structures, subgraphs, and whole graphs

  7. Graph Encoder: A Naïve Approach • Join the adjacency matrix and the features • Feed them into a deep neural network • Done? For the 5-node example graph:
         A B C D E | Feat
      A  0 1 1 1 0 | 1 0
      B  1 0 0 1 1 | 0 0
      C  1 0 0 1 0 | 0 1
      D  1 1 1 0 1 | 1 1
      E  0 1 0 1 0 | 1 0
  • Issues with this idea (made concrete in the sketch below): – O(|V|) parameters – Not applicable to graphs of different sizes – Not invariant to node ordering
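A rough NumPy sketch of this naïve encoder, using the example graph above (the layer width and random weights are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Adjacency matrix of the 5-node example graph (A..E) and its 2-dim feature columns.
A = np.array([[0, 1, 1, 1, 0],
              [1, 0, 0, 1, 1],
              [1, 0, 0, 1, 0],
              [1, 1, 1, 0, 1],
              [0, 1, 0, 1, 0]], dtype=float)
feat = np.array([[1, 0], [0, 0], [0, 1], [1, 1], [1, 0]], dtype=float)

X = np.concatenate([A, feat], axis=1)   # join adjacency rows with features: shape (5, 7)
W = rng.normal(size=(X.shape[1], 8))    # weight count scales with |V|: O(|V|) parameters
H = np.maximum(X @ W, 0)                # one dense ReLU layer over each joined row

# Problems: W is sized by |V|, so the network cannot accept a 6-node graph,
# and relabeling the nodes permutes both rows and columns of A, changing the
# output for what is actually the same graph.
```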

  8. Graph Encoders: Two Instantiations • 1. Graph convolution networks (GCN): one of the first frameworks to learn node embeddings in an end-to-end manner – Different from random-walk methods, which are not end-to-end • 2. GraphSAGE: generalizes GCNs to various neighborhood aggregations

  9. Today’s Lecture • Introduction to deep graph embeddings • Graph convolution networks (GCN) • GraphSAGE • Main paper: “Semi-Supervised Classification with Graph Convolutional Networks”, Kipf and Welling, ICLR 2017

  10. Content • Local network neighborhoods: – Describe aggregation strategies – Define computation graphs • Stacking multiple layers: – Describe the model, parameters, and training – How to fit the model? – Simple example for unsupervised and supervised training

  11. Setup • Assume we have a graph G: – V is the vertex set – A is the adjacency matrix (assume binary) – X ∈ R^(m×|V|) is a matrix of node features – Social networks: user profiles, user images – Biological networks: gene expression profiles – If there are no features, use: » Indicator vectors (one-hot encoding of each node) » A vector of constant 1s: [1, 1, …, 1] (a short sketch of both fallbacks follows)
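A minimal sketch of the two featureless fallbacks (the graph size is an assumption):

```python
import numpy as np

n = 5  # number of nodes in the graph

# If the graph has no node attributes, two common stand-in feature matrices:
X_onehot = np.eye(n)        # indicator (one-hot) vector per node
X_const = np.ones((n, 1))   # constant-1 feature per node
```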

  12. Graph Convolutional Networks • Idea: generate node embeddings based on local network neighborhoods – A node’s neighborhood defines its computation graph • Learn how to aggregate information from the neighborhood to compute node embeddings – Transform the “messages” h_i from the neighbors with a weight matrix (W_i h_i) and combine them

  13. Idea: Aggregate Neighbors • Intuition: generate node embeddings based on local network neighborhoods • Nodes aggregate information from their neighbors using neural networks • (Figure: computation graph of target node A over the input graph, with a neural network at each aggregation step)

  14. Idea: Aggregate Neighbors • Intuition: the network neighborhood defines a computation graph • Every node defines its own computation graph based on its neighborhood

  15. Deep Model: Many Layers • The model can be of arbitrary depth: – Nodes have embeddings at each layer – The layer-0 embedding of node v is its input feature vector x_v – The layer-K embedding gets information from nodes that are at most K hops away • (Figure: layer-0, layer-1, and layer-2 of the computation graph for target node A over the input graph)

  16. Neighborhood Aggregation • Neighborhood aggregation: the key distinction between approaches is how they aggregate information across the layers • (Figure: the same computation graph for target node A, with each aggregation box marked “?”: what is in the box?)

  17. Neighborhood Aggregation • Basic approach: average information from the neighbors and apply a neural network – (1) Average the messages from the neighbors – (2) Apply a neural network

  18. The Math: Deep Encoder • Basic approach: average neighbor messages and apply a neural network – Note: apply L2 normalization to each node embedding at every layer
      h_v^0 = x_v                                   (initial layer-0 embedding equals the node’s input features)
      h_v^k = σ( W_k · (1/|N(v)|) Σ_{u ∈ N(v)} h_u^(k-1) + B_k · h_v^(k-1) ),   for k ∈ {1, …, K}
      z_v  = h_v^K                                  (embedding after K layers of neighborhood aggregation)
  • The first term averages the neighbors’ previous-layer embeddings, the second term transforms the node’s own previous-layer embedding, and σ is a non-linearity (e.g., ReLU)
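A minimal NumPy sketch of this per-node update rule (the toy graph, the feature and hidden dimensions, and the random weight initialization are assumptions for illustration, not part of the lecture):

```python
import numpy as np

def gcn_layer(H_prev, A, W_k, B_k):
    """One mean-aggregation layer:
    h_v^k = sigma(W_k * mean_{u in N(v)} h_u^{k-1} + B_k * h_v^{k-1})."""
    deg = A.sum(axis=1, keepdims=True)                       # |N(v)| for each node
    neigh_mean = (A @ H_prev) / np.maximum(deg, 1)           # average neighbor messages
    H = np.maximum(neigh_mean @ W_k.T + H_prev @ B_k.T, 0)   # ReLU non-linearity
    # L2-normalize each node embedding at every layer, as the slide notes.
    H /= np.maximum(np.linalg.norm(H, axis=1, keepdims=True), 1e-12)
    return H

# Minimal usage on a toy 5-node graph with 3-dimensional features.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 1, 0],
              [1, 0, 0, 1, 1],
              [1, 0, 0, 1, 0],
              [1, 1, 1, 0, 1],
              [0, 1, 0, 1, 0]], dtype=float)
X = rng.normal(size=(5, 3))          # h_v^0 = x_v
dims = [3, 8, 8]                     # input dim, then two hidden layers (assumed)
H = X
for k in range(1, len(dims)):
    W_k = 0.1 * rng.normal(size=(dims[k], dims[k - 1]))
    B_k = 0.1 * rng.normal(size=(dims[k], dims[k - 1]))
    H = gcn_layer(H, A, W_k, B_k)
Z = H                                # z_v = h_v^K, one row per node
```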

  19. GCN: Matrix Form • H^(l) is the representation in the l-th layer • W_0^(l) and W_1^(l) are matrices to be learned for each layer • A = adjacency matrix, D = diagonal degree matrix • Rewriting the per-node update of slide 18 in matrix form (rows of H^(l) are node embeddings; one weight matrix transforms the averaged neighbor messages and the other transforms the node’s own previous-layer embedding): H^(l+1) = σ( D^(-1) A H^(l) W_1^(l) + H^(l) W_0^(l) )
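A sketch of the same K-layer computation in matrix form; the function and variable names (gcn_forward, W_self, W_neigh) and the shapes are assumptions for illustration:

```python
import numpy as np

def gcn_forward(X, A, weights):
    """Matrix-form forward pass: rows of H hold node embeddings;
    `weights` is a list of (W_self, W_neigh) pairs, one per layer."""
    D_inv = np.diag(1.0 / np.maximum(A.sum(axis=1), 1))   # D^-1 (inverse diagonal degree matrix)
    H = X                                                 # H^(0) = X
    for W_self, W_neigh in weights:
        # H^(l+1) = ReLU(D^-1 A H^(l) W_neigh + H^(l) W_self)
        H = np.maximum(D_inv @ A @ H @ W_neigh + H @ W_self, 0)
    return H                                              # Z = H^(K)

# Toy example: 5 nodes, 3 input features, two layers of width 8.
rng = np.random.default_rng(1)
A = np.array([[0, 1, 1, 1, 0],
              [1, 0, 0, 1, 1],
              [1, 0, 0, 1, 0],
              [1, 1, 1, 0, 1],
              [0, 1, 0, 1, 0]], dtype=float)
X = rng.normal(size=(5, 3))
weights = [(0.1 * rng.normal(size=(3, 8)), 0.1 * rng.normal(size=(3, 8))),
           (0.1 * rng.normal(size=(8, 8)), 0.1 * rng.normal(size=(8, 8)))]
Z = gcn_forward(X, A, weights)
```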

  20. Training the Model • How do we train the model? – Need to define a loss function on the embeddings z_v

  21. Model Parameters • We can feed these embeddings into any loss function and run stochastic gradient descent to train the weight parameters – Once we have the weight matrices, we can calculate the node embeddings • The trainable weight matrices (i.e., what we learn) are W_k and B_k in:
      h_v^0 = x_v
      h_v^k = σ( W_k · (1/|N(v)|) Σ_{u ∈ N(v)} h_u^(k-1) + B_k · h_v^(k-1) ),   for k ∈ {1, …, K}
      z_v  = h_v^K

  22. Unsupervised Training • Training can be unsupervised or supervised • Unsupervised training: – Use only the graph structure: “similar” nodes should have similar embeddings – A common unsupervised loss is based on edge existence (sketched below) • The unsupervised loss function can be anything from the last section, e.g., a loss based on – Node proximity in the graph – Random walks
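A rough sketch of such an edge-existence loss with random negative sampling, assuming Z holds the node embeddings produced by a forward pass like the one above (the function name and the sampling scheme are assumptions):

```python
import numpy as np

def unsupervised_edge_loss(Z, A, rng):
    """Simple negative-sampling loss: inner products of embeddings should be
    high for connected node pairs and low for randomly sampled pairs."""
    sigmoid = lambda s: 1.0 / (1.0 + np.exp(-s))
    edges = np.argwhere(A > 0)                            # positive (connected) pairs
    neg = rng.integers(0, Z.shape[0], size=edges.shape)   # random node pairs as negatives
    pos_score = (Z[edges[:, 0]] * Z[edges[:, 1]]).sum(axis=1)
    neg_score = (Z[neg[:, 0]] * Z[neg[:, 1]]).sum(axis=1)
    eps = 1e-9
    return -(np.log(sigmoid(pos_score) + eps).mean() +
             np.log(1.0 - sigmoid(neg_score) + eps).mean())

# Usage: loss = unsupervised_edge_loss(Z, A, np.random.default_rng(2))
```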

  23. Supervised Training • Train the model for a supervised task, e.g., node classification: is a node normal or anomalous? • Two ways to define the objective (a cross-entropy sketch follows): – Total loss = supervised loss – Total loss = supervised loss + unsupervised loss
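A sketch of the supervised option, assuming integer node labels y and a hypothetical linear classifier W_out on top of the embeddings; the combined objective simply adds the two losses:

```python
import numpy as np

def supervised_node_loss(Z, y, W_out):
    """Cross-entropy node-classification loss on top of the embeddings Z;
    y holds integer class labels (e.g., 0 = normal, 1 = anomalous)."""
    logits = Z @ W_out                                    # (num_nodes, num_classes)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return -np.log(probs[np.arange(len(y)), y] + 1e-9).mean()

# The second option on the slide adds the unsupervised term from the earlier sketch, e.g.:
# total_loss = supervised_node_loss(Z, y, W_out) + lam * unsupervised_edge_loss(Z, A, rng)
```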

  24. Model Design: Overview • (1) Define a neighborhood aggregation function • (2) Define a loss function on the embeddings z_v

  25. Model Design: Overview • (3) Train on a set of nodes, i.e., a batch of compute graphs

  26. Model Design: Overview • (4) Generate embeddings for nodes as needed – even for nodes we never trained on!

  27. GCN: Inductive Capability • The same aggregation parameters W_k and B_k are shared across all nodes: – The number of model parameters is sublinear in |V|, and we can generalize to unseen nodes • (Figure: the shared parameters W_k and B_k appear in both the compute graph for node A and the compute graph for node B of the input graph)
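Because the same weight matrices are applied to every node, the matrix-form sketch above can be run unchanged on a graph never seen during training; this usage snippet reuses the hypothetical gcn_forward and weights from that sketch (the feature width must match the training graph):

```python
import numpy as np

# A 3-node graph that was never seen during training; same 3-dim feature width.
A_new = np.array([[0, 1, 0],
                  [1, 0, 1],
                  [0, 1, 0]], dtype=float)
X_new = np.ones((3, 3))                      # e.g., constant features if none are available
Z_new = gcn_forward(X_new, A_new, weights)   # same shared parameters, new nodes
```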
