SLIDE 1

Graph Embeddings

Alicia Frame, PhD October 10, 2019

SLIDE 2

Overview

  • What’s an embedding? How do these work?
      • Motivating Example - Word2Vec
      • Motivating Example - DeepWalk
  • Graph embeddings overview
  • Graph embedding techniques
  • Graph embeddings with Neo4j

SLIDE 3

TL;DR - what’s an embedding?

What does the internet say?

  • Google: “An embedding is a relatively low-dimensional space into which you can translate high-dimensional vectors.”
  • Wikipedia: “In mathematics, an embedding is one instance of some mathematical structure contained within another instance, such as a group that is a subgroup.”

In short: a way of mapping something (a document, an image, a graph) into a fixed-length vector (or matrix) that captures key features while reducing the dimensionality.

SLIDE 4

So what’s a graph embedding?

Graph embeddings are a specific type of embedding that translate graphs, or parts of graphs, into fixed-length vectors (or tensors).

SLIDE 5

But why bother?

An embedding translates something complex into something a machine can work with:

  • It represents the important features of the input object in a compact, low-dimensional format.
  • The embedded representation can be used as a feature for ML, for direct comparisons, or as an input representation for a DL model.

Embeddings typically learn what’s important in an unsupervised, generalizable way.

SLIDE 6

Motivating Examples

SLIDE 7

Motivating example: Word Embeddings

How can I represent words in a way that I can use them mathematically?

  • How similar are two words?
  • Can I use the representation of a word in a model?

Naive approach - how similar are the strings?

  • Hand-engineered rules?
  • How many of each letter?

CAT = [10100000000000000001000000]
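A minimal sketch of this naive letter-count representation (pure Python; the 26-position a-z layout is assumed to match the CAT vector above):

```python
from collections import Counter
import string

def letter_count_vector(word: str) -> list[int]:
    """Naive word representation: count of each letter a-z (26-dim vector)."""
    counts = Counter(word.lower())
    return [counts.get(letter, 0) for letter in string.ascii_lowercase]

print(letter_count_vector("cat"))
# 1s at the positions for 'a', 'c', and 't'; zeros everywhere else
```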

SLIDE 8

Motivating example: Word Embeddings

Can we use documents to encode words?

  • Frequency matrix: how often each word appears in each document
  • Weighted term frequency (TF-IDF)
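A small illustration of these document-based encodings, as a sketch using scikit-learn (assuming scikit-learn ≥ 1.0 is available; the toy documents are made up for the example):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = [
    "tylenol is a pain reliever",
    "paracetamol is a pain reliever",
    "graphs connect nodes with relationships",
]

# Raw frequency matrix: one row per document, one column per word
freq = CountVectorizer().fit_transform(docs)

# TF-IDF down-weights words that appear in many documents
tfidf = TfidfVectorizer()
weighted = tfidf.fit_transform(docs)
print(tfidf.get_feature_names_out())
print(weighted.toarray().round(2))
```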

SLIDE 9

Motivating example: Word Embeddings

Word order probably matters too: words that occur together have similar contexts.

  • “Tylenol is a pain reliever” and “Paracetamol is a pain reliever” share the same context
  • Co-occurrence: how often do two words appear in the same context window?
  • Context window: a specific number of words and direction

Example corpus: “He is not lazy. He is intelligent. He is smart.”
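A rough sketch of building co-occurrence counts with a symmetric context window (the window size and tokenization are illustrative assumptions):

```python
from collections import defaultdict

corpus = "he is not lazy he is intelligent he is smart".split()
window = 2  # look 2 words to the left and 2 to the right

cooccur = defaultdict(int)
for i, word in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if i != j:
            cooccur[(word, corpus[j])] += 1

print(cooccur[("he", "is")])  # how often "he" and "is" share a context window
```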

SLIDE 11

Motivating example: Word Embeddings

Why not stop here?

  • You need more documents to really understand context … but the more documents you have, the bigger your matrix is
  • Giant sparse matrices or vectors are cumbersome and uninformative

We need to reduce the dimensionality of our matrix.

SLIDE 12

Motivating Example: Word Embeddings

Count-based methods: linear algebra to the rescue?

  • Pros: preserves semantic relationships, accurate, well-known methods
  • Cons: huge memory requirements, not trained for a specific task
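As a sketch of the count-based idea: factorize a co-occurrence matrix with a truncated SVD and keep the top components as dense word vectors (the matrix and dimensions below are illustrative):

```python
import numpy as np

# Toy word-by-word co-occurrence counts (rows/columns = vocabulary)
cooccurrence = np.array([
    [0, 3, 1, 1],
    [3, 0, 2, 2],
    [1, 2, 0, 0],
    [1, 2, 0, 0],
], dtype=float)

# Truncated SVD: keep the top-k singular vectors as dense embeddings
k = 2
U, S, Vt = np.linalg.svd(cooccurrence)
embeddings = U[:, :k] * S[:k]  # one k-dimensional vector per word
print(embeddings)
```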

SLIDE 13

Motivating Example: Word Embeddings

Predictive methods: learn an embedding for a specific task

SLIDE 15

Motivating Example: Word Embeddings

The SkipGram model learns a vector representation for each word that maximizes the probability of that word given its surrounding words.

  • Input: a word, as a one-hot encoded vector
  • Output: a prediction, for each word in the corpus, of the probability that it appears in the context of the input word
  • The hidden layer is a weight matrix with one row per word and one column per neuron -- this is the embedding!
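A minimal sketch of training such skip-gram word vectors, assuming gensim ≥ 4 is installed (the toy sentences and hyperparameters are illustrative):

```python
from gensim.models import Word2Vec

sentences = [
    ["tylenol", "is", "a", "pain", "reliever"],
    ["paracetamol", "is", "a", "pain", "reliever"],
]

# sg=1 selects the skip-gram architecture; vector_size is the embedding dimension
model = Word2Vec(sentences, vector_size=16, window=2, min_count=1, sg=1, epochs=50)

print(model.wv["tylenol"])                            # learned 16-dim embedding
print(model.wv.similarity("tylenol", "paracetamol"))  # cosine similarity of two words
```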

SLIDE 18

(if we really want to get into the math)

  • Maximize the probability that the next word is w_t given the history h
  • Train the model by maximizing the log-likelihood over the training set
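The formulas on the original slide did not survive extraction; in the standard word2vec softmax / log-likelihood formulation (which this slide appears to follow), they are:

P(w_t | h) = softmax(score(w_t, h)) = exp(score(w_t, h)) / Σ_{w' in V} exp(score(w', h))

J = log P(w_t | h) = score(w_t, h) − log Σ_{w' in V} exp(score(w', h))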

SLIDE 19

Motivating Example: Word Embeddings

Word embeddings condense representations of the words while preserving context.

SLIDE 20

Cool, but what’s this got to do with graphs?

SLIDE 21

Motivating example: DeepWalk

How do we represent a node in a graph mathematically? Can we adapt word2vec?

  • Each node is like a word
  • The neighborhood around the node is the context window
SLIDE 22

Motivating example: DeepWalk

Extract the context for each node by sampling random walks from the graph: for every node in the graph, take n fixed-length random walks (each walk is the equivalent of a sentence).
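A simplified sketch of the walk-sampling step (uniform random walks over an adjacency-list dict; the graph and parameters are illustrative):

```python
import random

# Toy undirected graph as an adjacency list
graph = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b"],
    "d": ["b"],
}

def random_walk(graph, start, walk_length):
    """Take one fixed-length uniform random walk starting from `start`."""
    walk = [start]
    for _ in range(walk_length - 1):
        walk.append(random.choice(graph[walk[-1]]))
    return walk

# n walks per node -> a corpus of "sentences" made of node ids
walks = [random_walk(graph, node, walk_length=5)
         for node in graph for _ in range(10)]
print(walks[0])
```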

SLIDE 23

Motivating example: DeepWalk

Once we have our sentences, we can extract the context windows and learn weights using the same skip-gram model (the objective is to predict neighboring nodes given the target node).
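Continuing the sketch above, the sampled walks can be fed to the same skip-gram training as in the word example (again assuming gensim ≥ 4; `walks` is the list of node-id sentences from the previous sketch):

```python
from gensim.models import Word2Vec

# Treat each walk as a sentence of node ids and train skip-gram on them
model = Word2Vec(walks, vector_size=32, window=3, min_count=1, sg=1, epochs=20)

node_embedding = model.wv["a"]     # 32-dim vector for node "a"
print(model.wv.most_similar("a"))  # nodes with the most similar embeddings
```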

SLIDE 24

Motivating example: DeepWalk

The embeddings are the hidden-layer weights from the skip-gram model.

Note: there are also graph equivalents of the matrix factorization and hand-engineered approaches we talked about for words.

SLIDE 25

Graph Embeddings Overview

SLIDE 26

There are lots of graph embeddings...

What type of graph are you trying to create an embedding for?

  • Monopartite graphs (DeepWalk is designed for these)
  • Multipartite graphs (e.g. Knowledge Graphs)

What aspect of the graph are you trying to represent?

  • Vertex embeddings: describe the connectivity of each node
  • Path embeddings: traversals across the graph
  • Graph embeddings: encode an entire graph into a single vector

SLIDE 27

Node embedding overview

Most techniques consist of:

  • A similarity function that measures the similarity between nodes
  • An encoder function that generates the node embedding
  • A decoder function that reconstructs pairwise similarity from the embeddings
  • A loss function that measures how good your reconstruction is
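As a sketch of how those four pieces fit together: a shallow lookup encoder, an inner-product decoder, the adjacency matrix as the similarity target, and a squared reconstruction loss (everything here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Similarity target: here simply the adjacency matrix of a toy graph
similarity = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
], dtype=float)

n_nodes, dim = similarity.shape[0], 2
Z = rng.normal(scale=0.1, size=(n_nodes, dim))  # encoder: an embedding lookup table

def decoder(Z):
    return Z @ Z.T  # decoder: inner product between embedding pairs

def loss(Z):
    return np.sum((decoder(Z) - similarity) ** 2)  # reconstruction error

# Crude gradient descent on the lookup table
for _ in range(500):
    grad = 4 * (decoder(Z) - similarity) @ Z
    Z -= 0.01 * grad

print(round(loss(Z), 3))
print(Z)  # one 2-dimensional embedding per node
```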

SLIDE 28

Shallow Graph Embedding Techniques

Shallow: the encoder function is just an embedding lookup.

Matrix Factorization:

  • These techniques all rely on an adjacency matrix input
  • Matrix factorization is applied either directly to the input or to some transformation of the input

Random Walk:

  • Obtain node co-occurrence via random walks
  • Learn weights to optimize a similarity measure

Drawbacks: massive memory footprint, computationally intense, local-only perspective, assumes similar nodes are close together.

SLIDE 32

Shallow Embeddings

Why not stick with these?

  • Shallow embeddings are inefficient - no parameters are shared between nodes
  • They can’t leverage node attributes
  • They only generate embeddings for nodes present when the embedding was trained - problematic for large, evolving graphs

Newer methodologies compress information:

  • Neighborhood autoencoder methods
  • Neighborhood aggregation
  • Convolutional autoencoders

SLIDE 33

Autoencoder methods

SLIDE 34

Using Graph Embeddings

SLIDE 35

Why are we going to all this trouble?

Visualization & pattern discovery:

  • Leverage lots of existing techniques, e.g.:
  • t-SNE plots
  • PCA

Clustering and community detection:

  • Apply generic tabular-data approaches (e.g. k-means), capturing both functional and structural roles
  • KNN graphs based on embedding similarity
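A hedged example of consuming node embeddings as ordinary tabular features with scikit-learn (assumed available; `embeddings` stands in for any node-by-dimension matrix):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans

embeddings = np.random.rand(100, 32)  # placeholder: one 32-dim vector per node

coords_pca = PCA(n_components=2).fit_transform(embeddings)          # for plotting
coords_tsne = TSNE(n_components=2, perplexity=10).fit_transform(embeddings)

labels = KMeans(n_clusters=5, n_init=10).fit_predict(embeddings)    # communities
print(labels[:10])
```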
SLIDE 36

Why are we going to all this trouble?

Node classification / semi-supervised learning:

  • Predict missing node attributes

Link prediction:

  • Predict edges not present in the graph
  • Either using similarity measures/heuristics or ML pipelines

Embeddings can make the graph algorithm library even more powerful!
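A minimal sketch of similarity-based link prediction on top of embeddings (cosine similarity between candidate endpoints; the vectors are placeholders):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Placeholder embeddings for two nodes that are not yet connected
node_a = np.array([0.9, 0.1, 0.3])
node_b = np.array([0.8, 0.2, 0.4])

score = cosine(node_a, node_b)
print(f"predicted link score: {score:.2f}")  # rank candidate pairs by this score
```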

SLIDE 37

Graph Embeddings in Neo4j

SLIDE 38

Neo4j Labs Implementations

Two prototype implementations from Labs: DeepWalk & DeepGL

  • DeepGL is more similar to a “hand crafted” embedding
  • Uses graph algorithms to generate features
  • Diffusion of values across edges, dimensionality reduction

Neither is ready for production use - but lessons learned!

  • Lots of demand
  • Memory intensive and not tuned for performance
  • Deep learning is not easy in Java

Python is easy to get started with for experimentation, but doesn’t perform at scale.

SLIDE 39

...So what’s next?

We’re actively exploring the best ways to implement graph embeddings at scale, so please stay tuned.

SLIDE 40

Hunger Games!

1. A graph embedding is a fixed-length vector of
   a. Numbers
   b. Letters
   c. Nodes

2. An embedding is a ______________ representation of your data
   a. Human readable
   b. Lower dimensional
   c. Binary

3. What’s the name of the graph embedding we walked through in this presentation?