SLIDE 1

Graph Embeddings

Alicia Frame, PhD October 10, 2019

SLIDE 2

Overview

  • What’s an embedding? How do these work?
      • Motivating Example - Word2Vec
      • Motivating Example - DeepWalk
  • Graph embeddings overview
  • Graph embedding techniques
  • Graph embeddings with Neo4j

SLIDE 3

TL;DR - what’s an embedding?

What does the internet say?

  • Google: “An embedding is a relatively low-dimensional space into which you can translate high-dimensional vectors.”
  • Wikipedia: “In mathematics, an embedding is one instance of some mathematical structure contained within another instance, such as a group that is a subgroup.”

In short: a way of mapping something (a document, an image, a graph) into a fixed-length vector (or matrix) that captures key features while reducing the dimensionality.

SLIDE 4

So what’s a graph embedding?

Graph embeddings are a specific type of embedding that translate graphs, or parts of graphs, into fixed-length vectors (or tensors).

SLIDE 5

But why bother?

An embedding translates something complex into something a machine can work with:

  • It represents the important features of the input object in a compact, low-dimensional format.
  • The embedded representation can be used as a feature for ML, for direct comparisons, or as an input representation for a DL model.

Embeddings typically learn what’s important in an unsupervised, generalizable way.

SLIDE 6

Motivating Examples

SLIDE 7

Motivating example: Word Embeddings

How can I represent words in a way that I can use them mathematically?

  • How similar are two words?
  • Can I use the representation of a word in a model?

Naive approach - how similar are the strings?

  • Hand-engineered rules?
  • How many of each letter?

CAT = [10100000000000000001000000]
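A minimal sketch of this naive letter-count representation (pure Python; the 26-position a-z layout is assumed to match the CAT vector above):

```python
from collections import Counter
import string

def letter_count_vector(word: str) -> list[int]:
    """Naive word representation: count of each letter a-z (26-dim vector)."""
    counts = Counter(word.lower())
    return [counts.get(letter, 0) for letter in string.ascii_lowercase]

print(letter_count_vector("cat"))
# 1s at the positions for 'a', 'c', and 't'; zeros everywhere else
```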

SLIDE 8

Motivating example: Word Embeddings

Can we use documents to encode words?

  • Frequency matrix: how often each word appears in each document
  • Weighted term frequency (TF-IDF)
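A small illustration of these document-based encodings, as a sketch using scikit-learn (assuming scikit-learn ≥ 1.0 is available; the toy documents are made up for the example):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = [
    "tylenol is a pain reliever",
    "paracetamol is a pain reliever",
    "graphs connect nodes with relationships",
]

# Raw frequency matrix: one row per document, one column per word
freq = CountVectorizer().fit_transform(docs)

# TF-IDF down-weights words that appear in many documents
tfidf = TfidfVectorizer()
weighted = tfidf.fit_transform(docs)
print(tfidf.get_feature_names_out())
print(weighted.toarray().round(2))
```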

SLIDE 9

Motivating example: Word Embeddings

Word order probably matters too: words that occur together have similar contexts.

  • “Tylenol is a pain reliever” and “Paracetamol is a pain reliever” share the same context
  • Co-occurrence: how often do two words appear in the same context window?
  • Context window: a specific number of words and direction

Example corpus: “He is not lazy. He is intelligent. He is smart.”
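A rough sketch of building co-occurrence counts with a symmetric context window (the window size and tokenization are illustrative assumptions):

```python
from collections import defaultdict

corpus = "he is not lazy he is intelligent he is smart".split()
window = 2  # look 2 words to the left and 2 to the right

cooccur = defaultdict(int)
for i, word in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if i != j:
            cooccur[(word, corpus[j])] += 1

print(cooccur[("he", "is")])  # how often "he" and "is" share a context window
```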

SLIDE 11

Motivating example: Word Embeddings

Why not stop here?

  • You need more documents to really understand context … but the more documents you have, the bigger your matrix is
  • Giant sparse matrices or vectors are cumbersome and uninformative

We need to reduce the dimensionality of our matrix.

SLIDE 12

Motivating Example: Word Embeddings

Count-based methods: linear algebra to the rescue?

  • Pros: preserves semantic relationships, accurate, well-known methods
  • Cons: huge memory requirements, not trained for a specific task
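As a sketch of the count-based idea: factorize a co-occurrence matrix with a truncated SVD and keep the top components as dense word vectors (the matrix and dimensions below are illustrative):

```python
import numpy as np

# Toy word-by-word co-occurrence counts (rows/columns = vocabulary)
cooccurrence = np.array([
    [0, 3, 1, 1],
    [3, 0, 2, 2],
    [1, 2, 0, 0],
    [1, 2, 0, 0],
], dtype=float)

# Truncated SVD: keep the top-k singular vectors as dense embeddings
k = 2
U, S, Vt = np.linalg.svd(cooccurrence)
embeddings = U[:, :k] * S[:k]  # one k-dimensional vector per word
print(embeddings)
```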

SLIDE 13

Motivating Example: Word Embeddings

Predictive methods: learn an embedding for a specific task

SLIDE 15

Motivating Example: Word Embeddings

The SkipGram model learns a vector representation for each word that maximizes the probability of that word given its surrounding words.

  • Input: a word, as a one-hot encoded vector
  • Output: a prediction, for each word in the corpus, of the probability that it appears in the context of the input word
  • The hidden layer is a weight matrix with one row per word and one column per neuron -- this is the embedding!
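A minimal sketch of training such skip-gram word vectors, assuming gensim ≥ 4 is installed (the toy sentences and hyperparameters are illustrative):

```python
from gensim.models import Word2Vec

sentences = [
    ["tylenol", "is", "a", "pain", "reliever"],
    ["paracetamol", "is", "a", "pain", "reliever"],
]

# sg=1 selects the skip-gram architecture; vector_size is the embedding dimension
model = Word2Vec(sentences, vector_size=16, window=2, min_count=1, sg=1, epochs=50)

print(model.wv["tylenol"])                            # learned 16-dim embedding
print(model.wv.similarity("tylenol", "paracetamol"))  # cosine similarity of two words
```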

SLIDE 18

(if we really want to get into the math)

  • Maximize the probability that the next word is w_t given the history h
  • Train the model by maximizing the log-likelihood over the training set
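The formulas on the original slide did not survive extraction; in the standard word2vec softmax / log-likelihood formulation (which this slide appears to follow), they are:

P(w_t | h) = softmax(score(w_t, h)) = exp(score(w_t, h)) / Σ_{w' in V} exp(score(w', h))

J = log P(w_t | h) = score(w_t, h) − log Σ_{w' in V} exp(score(w', h))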

SLIDE 19

Motivating Example: Word Embeddings

Word embeddings condense representations of the words while preserving context.

SLIDE 20

Cool, but what’s this got to do with graphs?

SLIDE 21

Motivating example: DeepWalk

How do we represent a node in a graph mathematically? Can we adapt word2vec?

  • Each node is like a word
  • The neighborhood around the node is the context window
SLIDE 22

Motivating example: DeepWalk

Extract the context for each node by sampling random walks from the graph: for every node in the graph, take n fixed-length random walks (each walk is the equivalent of a sentence).
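A simplified sketch of the walk-sampling step (uniform random walks over an adjacency-list dict; the graph and parameters are illustrative):

```python
import random

# Toy undirected graph as an adjacency list
graph = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b"],
    "d": ["b"],
}

def random_walk(graph, start, walk_length):
    """Take one fixed-length uniform random walk starting from `start`."""
    walk = [start]
    for _ in range(walk_length - 1):
        walk.append(random.choice(graph[walk[-1]]))
    return walk

# n walks per node -> a corpus of "sentences" made of node ids
walks = [random_walk(graph, node, walk_length=5)
         for node in graph for _ in range(10)]
print(walks[0])
```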

SLIDE 23

Motivating example: DeepWalk

Once we have our sentences, we can extract the context windows and learn weights using the same skip-gram model (the objective is to predict neighboring nodes given the target node).
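Continuing the sketch above, the sampled walks can be fed to the same skip-gram training as in the word example (again assuming gensim ≥ 4; `walks` is the list of node-id sentences from the previous sketch):

```python
from gensim.models import Word2Vec

# Treat each walk as a sentence of node ids and train skip-gram on them
model = Word2Vec(walks, vector_size=32, window=3, min_count=1, sg=1, epochs=20)

node_embedding = model.wv["a"]     # 32-dim vector for node "a"
print(model.wv.most_similar("a"))  # nodes with the most similar embeddings
```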

SLIDE 24

Motivating example: DeepWalk

The embeddings are the hidden-layer weights from the skip-gram model.

Note: there are also graph equivalents of the matrix factorization and hand-engineered approaches we talked about for words.

SLIDE 25

Graph Embeddings Overview

SLIDE 26

There are lots of graph embeddings...

What type of graph are you trying to create an embedding for?

  • Monopartite graphs (DeepWalk is designed for these)
  • Multipartite graphs (e.g. Knowledge Graphs)

What aspect of the graph are you trying to represent?

  • Vertex embeddings: describe the connectivity of each node
  • Path embeddings: traversals across the graph
  • Graph embeddings: encode an entire graph into a single vector

SLIDE 27

Node embedding overview

Most techniques consist of:

  • A similarity function that measures the similarity between nodes
  • An encoder function that generates the node embedding
  • A decoder function that reconstructs pairwise similarity from the embeddings
  • A loss function that measures how good your reconstruction is
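As a sketch of how those four pieces fit together: a shallow lookup encoder, an inner-product decoder, the adjacency matrix as the similarity target, and a squared reconstruction loss (everything here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Similarity target: here simply the adjacency matrix of a toy graph
similarity = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
], dtype=float)

n_nodes, dim = similarity.shape[0], 2
Z = rng.normal(scale=0.1, size=(n_nodes, dim))  # encoder: an embedding lookup table

def decoder(Z):
    return Z @ Z.T  # decoder: inner product between embedding pairs

def loss(Z):
    return np.sum((decoder(Z) - similarity) ** 2)  # reconstruction error

# Crude gradient descent on the lookup table
for _ in range(500):
    grad = 4 * (decoder(Z) - similarity) @ Z
    Z -= 0.01 * grad

print(round(loss(Z), 3))
print(Z)  # one 2-dimensional embedding per node
```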

SLIDE 28

Shallow Graph Embedding Techniques

Shallow: the encoder function is just an embedding lookup.

Matrix Factorization:

  • These techniques all rely on an adjacency matrix input
  • Matrix factorization is applied either directly to the input or to some transformation of the input

Random Walk:

  • Obtain node co-occurrence via random walks
  • Learn weights to optimize a similarity measure

Drawbacks: massive memory footprint, computationally intense, local-only perspective, assumes similar nodes are close together.

SLIDE 32

Shallow Embeddings

Why not stick with these?

  • Shallow embeddings are inefficient - no parameters are shared between nodes
  • They can’t leverage node attributes
  • They only generate embeddings for nodes present when the embedding was trained - problematic for large, evolving graphs

Newer methodologies compress information:

  • Neighborhood autoencoder methods
  • Neighborhood aggregation
  • Convolutional autoencoders

SLIDE 33

Autoencoder methods

SLIDE 34

Using Graph Embeddings

SLIDE 35

Why are we going to all this trouble?

Visualization & pattern discovery:

  • Leverage lots of existing techniques, e.g.:
  • t-SNE plots
  • PCA

Clustering and community detection:

  • Apply generic tabular-data approaches (e.g. k-means), capturing both functional and structural roles
  • KNN graphs based on embedding similarity
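A hedged example of consuming node embeddings as ordinary tabular features with scikit-learn (assumed available; `embeddings` stands in for any node-by-dimension matrix):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans

embeddings = np.random.rand(100, 32)  # placeholder: one 32-dim vector per node

coords_pca = PCA(n_components=2).fit_transform(embeddings)          # for plotting
coords_tsne = TSNE(n_components=2, perplexity=10).fit_transform(embeddings)

labels = KMeans(n_clusters=5, n_init=10).fit_predict(embeddings)    # communities
print(labels[:10])
```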
SLIDE 36

Why are we going to all this trouble?

Node classification / semi-supervised learning:

  • Predict missing node attributes

Link prediction:

  • Predict edges not present in the graph
  • Either using similarity measures/heuristics or ML pipelines

Embeddings can make the graph algorithm library even more powerful!
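A minimal sketch of similarity-based link prediction on top of embeddings (cosine similarity between candidate endpoints; the vectors are placeholders):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Placeholder embeddings for two nodes that are not yet connected
node_a = np.array([0.9, 0.1, 0.3])
node_b = np.array([0.8, 0.2, 0.4])

score = cosine(node_a, node_b)
print(f"predicted link score: {score:.2f}")  # rank candidate pairs by this score
```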

SLIDE 37

Graph Embeddings in Neo4j

SLIDE 38

Neo4j Labs Implementations

Two prototype implementations from Labs: DeepWalk & DeepGL

  • DeepGL is more similar to a “hand crafted” embedding
  • Uses graph algorithms to generate features
  • Diffusion of values across edges, dimensionality reduction

Neither is ready for production use - but lessons learned!

  • Lots of demand
  • Memory intensive and not tuned for performance
  • Deep learning is not easy in Java

Python is easy to get started with for experimentation, but doesn’t perform at scale.

SLIDE 39

...So what’s next?

We’re actively exploring the best ways to implement graph embeddings at scale, so please stay tuned.

SLIDE 40

Hunger Games!

1. A graph embedding is a fixed-length vector of
   a. Numbers
   b. Letters
   c. Nodes

2. An embedding is a ______________ representation of your data
   a. Human readable
   b. Lower dimensional
   c. Binary

3. What’s the name of the graph embedding we walked through in this presentation?