  1. Note to other teachers and users of these slides: We would be delighted if you found our material useful for giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. If you make use of a significant portion of these slides in your own lecture, please include this message, or a link to our web site: http://www.mmds.org
     CS246: Mining Massive Datasets. Jure Leskovec, Stanford University. http://cs246.stanford.edu

  2. Machine Learning

  3. Machine Learning: Node classification

  4. Classifying the function of proteins in the interactome. Image from: Ganapathiraju et al. 2016. Schizophrenia interactome with 504 novel protein–protein interactions. Nature.

  5. The (supervised) machine learning lifecycle requires feature engineering every single time: Raw Data → Feature Engineering → Structured Data → Learning Algorithm → Model → Downstream task. The goal of feature learning is to learn the features automatically instead.

  6. Goal: efficient task-independent feature learning for machine learning in networks! Learn a mapping f: u → ℝ^d from a node u to a d-dimensional feature representation, its embedding.

  7. Task: map each node in a network to a point in a low-dimensional space.
     • Distributed representation for nodes
     • Similarity of embeddings between nodes indicates their network similarity
     • Encode network information and generate node representations

  8. 2D embedding of nodes of Zachary's Karate Club network. Image from: Perozzi et al. DeepWalk: Online Learning of Social Representations. KDD 2014.

  9. The modern deep learning toolbox is designed for simple sequences or grids:
     • CNNs for fixed-size images/grids
     • RNNs or word2vec for text/sequences

  10. But networks are far more complex!
     • Complex topological structure (no spatial locality like grids have; contrast networks with images and text)
     • No fixed node ordering or reference point
     • Often dynamic, with multimodal features

  11. Setup: assume we have a graph G:
     • V is the vertex set
     • A is the adjacency matrix (assume binary)
     • No node features or extra information is used!

  12. Goal: encode nodes so that similarity in the embedding space (e.g., dot product) approximates similarity in the original network.

  13. Goal: similarity(u, v) ≈ z_v^T z_u, where similarity(u, v) is the similarity of u and v in the original network (which we still need to define!) and z_v^T z_u is the similarity of their embeddings.

  14. Three steps:
     1. Define an encoder (i.e., a mapping from nodes to embeddings)
     2. Define a node similarity function (i.e., a measure of similarity in the original network)
     3. Optimize the parameters of the encoder so that similarity(u, v) in the original network ≈ z_v^T z_u, the similarity of the embeddings

  15.
     • The encoder maps each node to a low-dimensional vector: enc(v) = z_v, where v is a node in the input graph and z_v is its d-dimensional embedding
     • The similarity function specifies how relationships in vector space map to relationships in the original network: similarity(u, v) ≈ z_v^T z_u, i.e., the similarity of u and v in the original network is approximated by the dot product of their node embeddings

  16. Simplest encoding approach: the encoder is just an embedding lookup, enc(v) = Z v, where:
     • Z ∈ ℝ^{d×|V|} is a matrix whose columns are the d-dimensional node embeddings (this is what we learn!)
     • v ∈ I^{|V|} is an indicator vector: all zeroes except for a "1" at the position corresponding to node v
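To make the lookup concrete, here is a minimal NumPy sketch (the graph size, dimension, and names are illustrative, not from the lecture):

```python
import numpy as np

num_nodes, dim = 5, 3                 # |V| = 5 nodes, d = 3 dimensions
Z = np.random.randn(dim, num_nodes)   # embedding matrix: what we learn

def enc(v: int) -> np.ndarray:
    """Embedding lookup enc(v) = Z v via an indicator vector."""
    indicator = np.zeros(num_nodes)
    indicator[v] = 1.0                # the single "1" at node v's position
    return Z @ indicator

# The matrix-vector product is equivalent to plain column indexing:
assert np.allclose(enc(2), Z[:, 2])
```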

  17. Simplest encoding approach: the encoder is just an embedding lookup. The embedding matrix Z has one column per node (each column is the embedding vector for a specific node), and the number of rows is the dimension/size of the embeddings.

  18. Simplest encoding approach: the encoder is just an embedding lookup. Each node is assigned a unique embedding vector. Many methods work this way: node2vec, DeepWalk, LINE.

  19. The key design choice that distinguishes these methods is how they define node similarity. E.g., should two nodes have similar embeddings if they…
     • are connected?
     • share neighbors?
     • have similar "structural roles"?
     • …?

  20. Material based on:
     • Perozzi et al. 2014. DeepWalk: Online Learning of Social Representations. KDD.
     • Grover et al. 2016. node2vec: Scalable Feature Learning for Networks. KDD.

  21. z_u^T z_v ≈ the probability that u and v co-occur on a random walk over the network, where z_u is the embedding of node u.

  22. Two steps:
     1. Estimate the probability of visiting node v on a random walk starting from node u, using some random walk strategy R
     2. Optimize embeddings to encode these random walk statistics: similarity in the embedding space (here the dot product, which equals cos(θ) for unit-length embeddings) encodes random walk "similarity"

  23. Why random walks?
     1. Expressivity: a flexible stochastic definition of node similarity that incorporates both local and higher-order neighborhood information
     2. Efficiency: we do not need to consider all node pairs when training; we only need to consider pairs that co-occur on random walks

  24.
     • Intuition: find an embedding of nodes in d-dimensional space so that node similarity is preserved
     • Idea: learn node embeddings such that nodes that are nearby in the network are close together in the embedding space
     • Given a node u, how do we define nearby nodes?
       - N_R(u): the neighbourhood of u obtained by some strategy R

  25. Given G = (V, E), our goal is to learn a mapping z: u → ℝ^d.
     Log-likelihood objective:
         max_z Σ_{u ∈ V} log P(N_R(u) | z_u)
     where N_R(u) is the neighborhood of node u. Given node u, we want to learn feature representations that are predictive of the nodes in its neighborhood N_R(u).

  26. Algorithm (a code sketch follows below):
     1. Run short fixed-length random walks starting from each node in the graph, using some strategy R
     2. For each node u, collect N_R(u), the multiset* of nodes visited on random walks starting from u
     3. Optimize embeddings: given node u, predict its neighbors N_R(u), i.e., max_z Σ_{u ∈ V} log P(N_R(u) | z_u)
     * N_R(u) can have repeat elements, since nodes can be visited multiple times on random walks
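Here is a minimal Python sketch of steps 1 and 2, assuming the simplest strategy R (uniform random walks of fixed length over an adjacency-list graph); walk_length and walks_per_node are illustrative hyperparameters:

```python
import random

def random_walk(adj, start, walk_length):
    """One fixed-length uniform random walk starting at `start`."""
    walk = [start]
    for _ in range(walk_length - 1):
        neighbors = adj[walk[-1]]
        if not neighbors:              # dead end: stop the walk early
            break
        walk.append(random.choice(neighbors))
    return walk

def collect_neighborhoods(adj, walk_length=6, walks_per_node=10):
    """N_R(u): multiset of nodes visited on walks starting from u."""
    N = {u: [] for u in adj}
    for u in adj:
        for _ in range(walks_per_node):
            N[u].extend(random_walk(adj, u, walk_length)[1:])
    return N

# Toy usage on a 4-node path graph 0 - 1 - 2 - 3:
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
N = collect_neighborhoods(adj)
```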

  27. Starting from the objective
         max_z Σ_{u ∈ V} log P(N_R(u) | z_u)
     Assumption: the conditional likelihood factorizes over the set of neighbors:
         log P(N_R(u) | z_u) = Σ_{v ∈ N_R(u)} log P(z_v | z_u)
     Softmax parametrization:
         P(z_v | z_u) = exp(z_u · z_v) / Σ_{n ∈ V} exp(z_u · z_n)
     Why softmax? We want node v to be most similar to node u (out of all nodes n). Intuition: Σ_i exp(x_i) ≈ max_i exp(x_i).
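A minimal NumPy sketch of this softmax, assuming the embeddings are the columns of the matrix Z from the lookup encoder above (the max-subtraction is a standard numerical-stability trick, not part of the slides):

```python
import numpy as np

def log_p(v, u, Z):
    """log P(z_v | z_u) under the softmax over all nodes n in V."""
    scores = Z.T @ Z[:, u]     # z_u . z_n for every node n
    scores -= scores.max()     # stabilize exp() without changing the softmax
    return scores[v] - np.log(np.exp(scores).sum())
```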

  28. Putting it all together:
         L = Σ_{u ∈ V} Σ_{v ∈ N_R(u)} −log( exp(z_u^T z_v) / Σ_{n ∈ V} exp(z_u^T z_n) )
     • the outer sum runs over all nodes u
     • the inner sum runs over the nodes v seen on random walks starting from u
     • the term inside the log is the predicted probability of u and v co-occurring on a random walk
     Optimizing random-walk embeddings = finding the embeddings z_u that minimize L.
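A minimal sketch of evaluating L exactly, reusing the embedding matrix Z and the neighborhoods N from the sketches above (illustrative names; in practice one would minimize L with stochastic gradient descent rather than just evaluate it):

```python
import numpy as np

def loss(Z, N):
    """L = sum over u in V and v in N_R(u) of -log P(z_v | z_u)."""
    scores = Z.T @ Z                             # scores[u, n] = z_u . z_n
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -sum(log_probs[u, v] for u in N for v in N[u])
```

Note the |V| × |V| score matrix: this exact evaluation is precisely the naive approach the next slides call too expensive.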

  29. But doing this naively is too expensive!
         L = Σ_{u ∈ V} Σ_{v ∈ N_R(u)} −log( exp(z_u^T z_v) / Σ_{n ∈ V} exp(z_u^T z_n) )
     The nested sums over nodes give O(|V|²) complexity!

  30. But doing this naively is too expensive!
         L = Σ_{u ∈ V} Σ_{v ∈ N_R(u)} −log( exp(z_u^T z_v) / Σ_{n ∈ V} exp(z_u^T z_n) )
     The normalization term from the softmax is the culprit… can we approximate it?
