

  1. CS224W: Machine Learning with Graphs. Jure Leskovec, Stanford University. http://cs224w.stanford.edu

  2. [Figure: an example network with unlabeled nodes marked "?"; Machine Learning task: node classification]

  3. [Figure: Machine Learning on a network: predict the label of an unknown node x]

  4. The (supervised) Machine Learning lifecycle requires feature engineering every single time! [Pipeline: Raw Data → (Feature Engineering) → Structured Data → Learning Algorithm → Model → Downstream task. Goal: automatically learn the features instead of engineering them by hand.]

  5. Goal: Efficient task-independent feature learning for machine learning with graphs! Map each node u to a d-dimensional feature representation (embedding): f : u → ℝ^d.

  6. Task: Map each node in a network into a low-dimensional space: distributed representations for nodes; similarity of embeddings between nodes indicates their network similarity; encode network information and generate node representations.

  7. Example: 2D embeddings of the nodes of Zachary's Karate Club network. Image from: Perozzi et al. DeepWalk: Online Learning of Social Representations. KDD 2014.

  8. The modern deep learning toolbox is designed for simple sequences or grids: CNNs for fixed-size images/grids; RNNs or word2vec for text/sequences.

  9. But networks are far more complex! Complex topological structure (i.e., no spatial locality like grids); no fixed node ordering or reference point (i.e., the isomorphism problem); often dynamic and with multimodal features.

  10. Assume we have a graph G: V is the vertex set; A is the adjacency matrix (assume binary); no node features or extra information is used.

  11. Goal: encode nodes so that similarity in the embedding space (e.g., dot product) approximates similarity in the original network.

  12. Goal: similarity(u, v) ≈ z_v^T z_u, where the left-hand side is the similarity of u and v in the original network (we still need to define it!) and the right-hand side is the similarity (dot product) of their embeddings.

  13. 1. Define an encoder (i.e., a mapping from nodes to embeddings). 2. Define a node similarity function (i.e., a measure of similarity in the original network). 3. Optimize the parameters of the encoder so that similarity(u, v) ≈ z_v^T z_u.

  14. Encoder: maps each node to a low-dimensional vector: ENC(v) = z_v, the d-dimensional embedding of node v in the input graph. Similarity function: specifies how the relationships in vector space map to the relationships in the original network: similarity(u, v) ≈ z_v^T z_u, i.e., the similarity of u and v in the original network is approximated by the dot product between their node embeddings.

  15. Simplest encoding approach: the encoder is just an embedding-lookup: ENC(v) = Z v, where Z ∈ ℝ^(d × |V|) is a matrix whose columns are the node embeddings (what we learn!), and v ∈ 𝕀^|V| is an indicator vector, all zeroes except a one in the column indicating node v.

  16. Simplest encoding approach: the encoder is just an embedding-lookup. [Diagram: the embedding matrix Z has one column per node; each column is the embedding vector for a specific node; the number of rows is the dimension/size of the embeddings.]

  17. Simplest encoding approach: the encoder is just an embedding-lookup. Each node is assigned a unique embedding vector. Many methods take this approach: DeepWalk, node2vec, TransE.
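
As an illustration only (not part of the lecture), here is a minimal Python/NumPy sketch of such an embedding-lookup encoder; the matrix Z is random here, whereas in practice it is what we learn, and all variable names are made up for this example:

    import numpy as np

    # Illustrative sketch of the embedding-lookup encoder ENC(v) = Z v.
    num_nodes, dim = 5, 3                      # |V| and the embedding dimension d
    rng = np.random.default_rng(0)
    Z = rng.normal(size=(dim, num_nodes))      # d x |V| matrix: one embedding column per node (learned in practice)

    def enc(v):
        """Look up the embedding of node v via an indicator vector (equivalent to Z[:, v])."""
        indicator = np.zeros(num_nodes)
        indicator[v] = 1.0                     # all zeroes except a one at position v
        return Z @ indicator

    # Embedding-space similarity used throughout the lecture: the dot product z_u^T z_v
    u, v = 0, 1
    print(enc(u) @ enc(v))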

  18. The key design choice of these methods is how they define node similarity. E.g., should two nodes have similar embeddings if they are connected? share neighbors? have similar "structural roles"? ...?

  19. Material based on: Perozzi et al. 2014. DeepWalk: Online Learning of Social Representations. KDD. Grover et al. 2016. node2vec: Scalable Feature Learning for Networks. KDD.

  20. [Figure: an example graph with numbered nodes.] Given a graph and a starting point, we select a neighbor of it at random and move to this neighbor; then we select a neighbor of this point at random, and move to it, etc. The (random) sequence of points selected this way is a random walk on the graph.
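
A minimal sketch of such a uniform random walk (illustrative only; the adjacency-list graph and function names here are made up, not from the slide):

    import random

    def random_walk(adj, start, length):
        """Uniform random walk: repeatedly move to a randomly chosen neighbor of the current node."""
        walk = [start]
        for _ in range(length):
            neighbors = adj[walk[-1]]
            if not neighbors:                  # dead end: stop the walk early
                break
            walk.append(random.choice(neighbors))
        return walk

    # Tiny example graph as an adjacency list (not the graph shown on the slide)
    adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
    print(random_walk(adj, start=1, length=5))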

  21. z_u^T z_v ≈ probability that u and v co-occur on a random walk over the network.

  22. 1. Estimate the probability of visiting node v on a random walk starting from node u, using some random walk strategy R. 2. Optimize embeddings to encode these random walk statistics: similarity in the embedding space (here: dot product = cos(θ)) encodes random walk "similarity".

  23. 1. Expressivity: a flexible stochastic definition of node similarity that incorporates both local and higher-order neighborhood information. 2. Efficiency: we do not need to consider all node pairs when training; we only need to consider pairs that co-occur on random walks.

  24. Intuition: find an embedding of nodes in d dimensions that preserves similarity. Idea: learn node embeddings such that nodes that are nearby in the network are close together in the embedding space. Given a node u, how do we define nearby nodes? N_R(u): the neighborhood of u obtained by some strategy R.

  25. Given G = (V, E), our goal is to learn a mapping z : u → ℝ^d. Log-likelihood objective: max_z Σ_{u ∈ V} log P(N_R(u) | z_u), where N_R(u) is the neighborhood of node u under strategy R. Given node u, we want to learn feature representations that are predictive of the nodes in its neighborhood N_R(u).

  26. 1. Run short fixed-length random walks starting from each node u in the graph using some strategy R. 2. For each node u, collect N_R(u), the multiset* of nodes visited on random walks starting from u. 3. Optimize embeddings: given node u, predict its neighbors N_R(u), i.e., max_z Σ_{u ∈ V} log P(N_R(u) | z_u). (*N_R(u) can have repeated elements since nodes can be visited multiple times on random walks.)
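
A minimal sketch of steps 1 and 2, reusing random_walk and adj from the sketch after slide 20 (walk length and number of walks per node are arbitrary illustrative choices, not values from the lecture):

    from collections import defaultdict

    def collect_neighborhoods(adj, walk_length=5, walks_per_node=10):
        """For each node u, collect the multiset N_R(u) of nodes visited on short
        fixed-length random walks starting from u (repeated visits are kept)."""
        N = defaultdict(list)
        for u in adj:
            for _ in range(walks_per_node):
                N[u].extend(random_walk(adj, start=u, length=walk_length)[1:])  # drop the start node itself
        return N

    neighborhoods = collect_neighborhoods(adj)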

  27. L = Σ_{u ∈ V} Σ_{v ∈ N_R(u)} −log P(v | z_u). Intuition: optimize embeddings to maximize the likelihood of random walk co-occurrences. Parameterize P(v | z_u) using the softmax: P(v | z_u) = exp(z_u^T z_v) / Σ_{n ∈ V} exp(z_u^T z_n). Why softmax? We want node v to be the most similar to node u (out of all nodes n). Intuition: Σ_i exp(x_i) ≈ max_i exp(x_i).

  28. Putting it all together: L = Σ_{u ∈ V} Σ_{v ∈ N_R(u)} −log( exp(z_u^T z_v) / Σ_{n ∈ V} exp(z_u^T z_n) ), where the outer sum runs over all nodes u, the inner sum runs over the nodes v seen on random walks starting from u, and the term inside the log is the predicted probability of u and v co-occurring on a random walk. Optimizing random-walk embeddings = finding the embeddings z_u that minimize L.
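
A minimal sketch that merely evaluates this loss for given embeddings (illustrative only: nodes are assumed to be integer indices into the columns of a d × |V| matrix Z, neighborhoods is a dict like the one built in the sketch after slide 26, and in practice L is minimized with stochastic gradient descent rather than just evaluated):

    import numpy as np

    def walk_loss(Z, neighborhoods):
        """L = sum_u sum_{v in N_R(u)} -log( exp(z_u^T z_v) / sum_{n in V} exp(z_u^T z_n) ),
        with Z a d x |V| matrix holding one embedding column per node."""
        loss = 0.0
        for u, cooccurring in neighborhoods.items():
            scores = Z.T @ Z[:, u]                           # z_u^T z_n for every node n in V
            log_denominator = np.log(np.exp(scores).sum())   # log of the softmax normalizer
            for v in cooccurring:
                loss -= Z[:, u] @ Z[:, v] - log_denominator  # -log softmax probability of v given u
            # the normalizer above already costs O(|V|) per node u, which is why
            # the next slide flags the naive O(|V|^2) cost
        return loss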

  29. But doing this naively is too expensive! L = Σ_{u ∈ V} Σ_{v ∈ N_R(u)} −log( exp(z_u^T z_v) / Σ_{n ∈ V} exp(z_u^T z_n) ). The nested sum over nodes gives O(|V|^2) complexity!
