  1. Project Proposal deadline: tonight, 11:59pm. Course Notes: https://snap-stanford.github.io/cs224w-notes/ – help us write the course notes; we will give generous bonuses! CS224W: Machine Learning with Graphs, Jure Leskovec, Stanford University, http://cs224w.stanford.edu

  2. • Intuition: Map nodes to d-dimensional embeddings such that similar nodes in the graph are embedded close together. [Figure: f( ) maps a node of the input graph to a 2D node embedding.] How do we learn the mapping function f?

  3. • Goal: Map nodes so that similarity in the embedding space (e.g., dot product) approximates similarity (e.g., proximity) in the network. [Figure: input network mapped to a d-dimensional embedding space.]

  4. Goal: similarity(u, v) ≈ z_v^T z_u – the similarity function still needs to be defined! [Figure: input network mapped to a d-dimensional embedding space.]

  5. • Encoder: Maps a node to a low-dimensional vector: ENC(v) = z_v, where v is a node in the input graph and z_v is its d-dimensional embedding. • Similarity function: Defines how relationships in the input network map to relationships in the embedding space: similarity(u, v) ≈ z_v^T z_u, i.e., the similarity of u and v in the network is approximated by the dot product between their node embeddings.
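
To make the decoder side concrete, here is a minimal sketch of scoring two nodes by the dot product of their embeddings (the 2-D embedding values are hypothetical, not from the slides):

```python
import numpy as np

# Hypothetical 2-D embeddings for two nodes u and v (illustrative values only).
z_u = np.array([0.9, 0.1])
z_v = np.array([0.8, 0.3])

# Similarity in the embedding space is the dot product z_v^T z_u.
score = z_v @ z_u
print(score)  # ≈ 0.75 -> a high score means the encoder places u and v close together
```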

  6. • So far we have focused on "shallow" encoders, i.e., embedding lookups: the encoder simply looks up the embedding vector for a specific node in an embedding matrix Z, which has one column per node and one row per embedding dimension (the dimension/size of the embeddings).
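
A sketch of this lookup (toy sizes are assumed; in practice Z is learned):

```python
import numpy as np

num_nodes, d = 5, 2                  # |V| nodes, d-dimensional embeddings
Z = np.random.randn(d, num_nodes)    # embedding matrix: one column per node (learned)

def enc(v):
    """Shallow encoder: simply look up column v of Z."""
    return Z[:, v]

z_3 = enc(3)                         # embedding vector for node 3
```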

  7. Shallow encoders: § One layer of data transformation. § A single hidden layer maps node u to its embedding z_u via a function f(), e.g., z_u = f(z_v, v ∈ N(u)).

  8. • Limitations of shallow embedding methods: § O(|V|) parameters are needed: no sharing of parameters between nodes; every node has its own unique embedding. § Inherently "transductive": cannot generate embeddings for nodes that are not seen during training. § Do not incorporate node features: many graphs have features that we can and should leverage.

  9. • Today: We will now discuss deep methods based on graph neural networks: ENC(v) = multiple layers of non-linear transformations of graph structure. • Note: All of these deep encoders can be combined with the node similarity functions defined in the last lecture.

  10. Output: node embeddings. We can also embed larger network structures: subgraphs and whole graphs.

  11. [Figure: images (grids) and text/speech (sequences).] The modern deep learning toolbox is designed for simple sequences and grids.

  12. But networks are far more complex! § Arbitrary size and complex topological structure (i.e., no spatial locality like grids). [Figure: networks vs. images vs. text.] § No fixed node ordering or reference point. § Often dynamic, with multimodal features.

  13. CNN on an image: the goal is to generalize convolutions beyond simple lattices and to leverage node features/attributes (e.g., text, images).

  14. Single CNN layer with a 3x3 filter (animation by Vincent Dumoulin). [Figure: image vs. graph.] Transform information at the neighbors and combine it: § Transform "messages" h_l from the neighbors: W_l h_l. § Add them up: Σ_l W_l h_l. (See the sketch below.)
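
A minimal sketch of this "transform and add up" step (toy dimensions and random values; in the CNN case each relative position l has its own weight matrix W_l):

```python
import numpy as np

d_in, d_out = 4, 3
# Hypothetical messages h_l from three neighbors, each with its own transform W_l.
messages = [np.random.randn(d_in) for _ in range(3)]
weights = [np.random.randn(d_out, d_in) for _ in range(3)]

# Transform each message and add them up: sum_l W_l h_l
combined = sum(W_l @ h_l for W_l, h_l in zip(weights, messages))
```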

  15. But what if your graphs look like this? Or this? Or this? • Examples: biological networks, medical networks, social networks, information networks, knowledge graphs, communication networks, the Web graph, …

  16. • Idea: Join the adjacency matrix and the node features and feed them into a deep neural net. Done?
          A B C D E | Feat
      A | 0 1 1 1 0 | 1 0
      B | 1 0 0 1 1 | 0 0
      C | 1 0 0 1 0 | 0 1
      D | 1 1 1 0 1 | 1 1
      E | 0 1 0 1 0 | 1 0
      • Issues with this idea: § O(N) parameters. § Not applicable to graphs of different sizes. § Not invariant to node ordering. (A sketch of this naive idea follows below.)
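
A sketch of the naive idea on the 5-node example above (the neural net itself is omitted); it makes the issues visible: the input width grows with the number of nodes, and permuting the node labels changes the input:

```python
import numpy as np

# Adjacency matrix and features of the 5-node example graph (nodes A..E).
A = np.array([[0, 1, 1, 1, 0],
              [1, 0, 0, 1, 1],
              [1, 0, 0, 1, 0],
              [1, 1, 1, 0, 1],
              [0, 1, 0, 1, 0]])
feat = np.array([[1, 0], [0, 0], [0, 1], [1, 1], [1, 0]])

# Naive idea: concatenate each node's adjacency row with its features and feed
# the rows into a standard deep neural net.
X_input = np.concatenate([A, feat], axis=1)   # shape (5, 7): one input row per node
# Issues: the net needs O(N) input weights, cannot handle graphs of other sizes,
# and its output changes if the nodes are renumbered (not ordering-invariant).
```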

  17. 1. Basics of deep learning for graphs 2. Graph Convolutional Networks 3. Graph Attention Networks (GAT) 4. Practical tips and demos

  18. • Local network neighborhoods: § Describe aggregation strategies. § Define computation graphs. • Stacking multiple layers: § Describe the model, parameters, and training. § How to fit the model? § A simple example for unsupervised and supervised training.

  19. • Assume we have a graph G: § V is the vertex set. § A is the adjacency matrix (assume binary). § X ∈ ℝ^{m×|V|} is a matrix of node features. § Node features: social networks – user profile, user image; biological networks – gene expression profiles, gene functional information. § If there are no features: indicator vectors (one-hot encoding of each node), or a vector of constant 1: [1, 1, …, 1]. (A minimal setup sketch follows below.)
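
A minimal setup sketch in this notation (a hypothetical 5-node graph; the feature dimension m is arbitrary):

```python
import numpy as np

num_nodes = 5
A = np.zeros((num_nodes, num_nodes), dtype=int)        # binary adjacency matrix
for u, v in [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]:  # hypothetical undirected edges
    A[u, v] = A[v, u] = 1

m = 3                                                  # number of features per node
X = np.random.randn(m, num_nodes)                      # node feature matrix, shape m x |V|

# If the graph has no features, use indicator vectors (one-hot per node) ...
X_onehot = np.eye(num_nodes)
# ... or a constant-1 feature for every node.
X_const = np.ones((1, num_nodes))
```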

  20. [Kipf and Welling, ICLR 2017] Idea: A node's neighborhood defines a computation graph. [Figure: (1) determine the node's computation graph; (2) propagate and transform information.] Learn how to propagate information across the graph to compute node features.

  21. • Key idea: Generate node embeddings based on local network neighborhoods. [Figure: target node A in the input graph and its neighborhood.]

  22. • Intuition: Nodes aggregate information from their neighbors using neural networks. [Figure: the target node's computation graph, with a neural network at each aggregation step.]

  23. • Intuition: The network neighborhood defines a computation graph: every node defines its own computation graph based on its neighborhood!

  24. • The model can be of arbitrary depth: § Nodes have embeddings at each layer. § The layer-0 embedding of node v is its input feature, x_v. § The layer-K embedding gets information from nodes that are K hops away. [Figure: layer-0, layer-1, and layer-2 of target node A's computation graph over the input graph.]

  25. • Neighborhood aggregation: The key distinction between approaches is how they aggregate information across the layers. [Figure: the target node's computation graph with question marks in the aggregation boxes – what is in the box?]

  26. • Basic approach: Average information from the neighbors and apply a neural network: (1) average messages from the neighbors; (2) apply a neural network. [Figure: target node A's computation graph.]

  27. • Basic approach: Average neighbor messages and apply a neural network. The initial (0-th layer) embeddings are equal to the node features: h_v^0 = x_v. Then, for every layer k ∈ {1, …, K}:
      h_v^k = σ( W_k · (1/|N(v)|) Σ_{u ∈ N(v)} h_u^{k-1} + B_k h_v^{k-1} ),
      i.e., the average of the neighbors' previous-layer embeddings plus a transform of v's own previous-layer embedding, passed through a non-linearity σ (e.g., ReLU). The final embedding is z_v = h_v^K, the embedding after K layers of neighborhood aggregation. (A sketch of one such layer follows below.)
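
A minimal NumPy sketch of one layer of this update (assumed shapes and random weights; a practical implementation would use a deep learning framework, and variants such as Kipf and Welling's GCN normalize the aggregation differently):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def average_agg_layer(H_prev, A, W_k, B_k):
    """One layer of the update above.

    H_prev : (num_nodes, d_in)       previous-layer embeddings h_v^{k-1}, one row per node
    A      : (num_nodes, num_nodes)  binary adjacency matrix
    W_k, B_k : (d_in, d_out)         layer weight matrices
    """
    deg = A.sum(axis=1, keepdims=True)                # |N(v)| for every node
    neighbor_avg = (A @ H_prev) / np.maximum(deg, 1)  # average of the neighbors' h^{k-1}
    return relu(neighbor_avg @ W_k + H_prev @ B_k)    # sigma(W_k * avg + B_k * h_v^{k-1})

# Unrolled over K layers, starting from the node features X (one row per node):
# H = X
# for k in range(K):
#     H = average_agg_layer(H, A, W[k], B[k])
# Z = H   # final embeddings z_v
```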

  28. How do we train the model to generate the embeddings z_v? We need to define a loss function on the embeddings.
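
The slide does not define the loss here. As one common option (an assumption for illustration, not stated on this slide), a supervised node-classification loss puts a linear classifier on top of z_v and minimizes cross-entropy; alternatively, the unsupervised node-similarity objectives from the previous lecture can be reused:

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def node_classification_loss(Z, labels, Theta):
    """Cross-entropy over labelled nodes (illustrative choice, not from the slide).

    Z      : (num_nodes, d)     final embeddings z_v
    labels : (num_nodes,) int   class label of each labelled node
    Theta  : (d, num_classes)   classifier weights, trained jointly with the encoder
    """
    probs = softmax(Z @ Theta)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))
```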
