SLIDE 1

Srijan Kumar, Georgia Tech, CSE6240 Spring 2020: Web Search and Text Mining

CSE 6240: Web Search and Text Mining. Spring 2020

Graph and Knowledge Graph Representation Learning

Prof. Srijan Kumar

http://cc.gatech.edu/~srijan

SLIDE 2

Today’s Lecture

Embedding entire graphs
Introduction to Knowledge Graphs
Embeddings in Knowledge Graphs

– TransE
– TransR

SLIDE 3

Embedding Entire Graphs

Goal: How to embed an entire graph $G$ as a single vector $\mathbf{z}_G$?
Tasks:

– Classifying toxic vs. non-toxic molecules
– Identifying anomalous graphs

SLIDE 4

Approach #1

Simple idea:

Run a standard graph embedding technique on the (sub)graph $G$

Then just sum (or average) the node embeddings in the (sub)graph $G$:

$\mathbf{z}_G = \sum_{v \in G} \mathbf{z}_v$

Used by Duvenaud et al., 2015 to classify molecules based on their graph structure

– Convolutional Networks on Graphs for Learning Molecular Fingerprints. NeurIPS 2015
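A minimal sketch of Approach #1, assuming the node embeddings have already been computed by some node-embedding method and are stored as rows of a NumPy array. The function name `pool_node_embeddings` and the toy values are illustrative, not from the lecture.

```python
import numpy as np

def pool_node_embeddings(node_embeddings, nodes, mode="sum"):
    """Combine the embeddings of `nodes` into one graph-level vector."""
    selected = node_embeddings[list(nodes)]  # shape: (len(nodes), dim)
    if mode == "sum":
        return selected.sum(axis=0)
    if mode == "mean":
        return selected.mean(axis=0)
    raise ValueError(f"unknown mode: {mode}")

# Hypothetical 2-dimensional embeddings for a 4-node graph.
Z = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]])

# Embed the subgraph made of nodes 0, 1, 2.
z_sum = pool_node_embeddings(Z, [0, 1, 2])
z_mean = pool_node_embeddings(Z, [0, 1, 2], mode="mean")
```

Summing keeps information about subgraph size, while averaging does not; which is preferable depends on the task.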

SLIDE 5

Approach #2

Idea: Introduce a “virtual node” to represent the (sub)graph and run a standard graph embedding technique

Proposed by Li et al., 2016 as a general technique for subgraph embedding

– Gated Graph Sequence Neural Networks. ICLR 2016
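One possible way to set up the virtual node, sketched on a dense adjacency matrix. Connecting the virtual node to every node of the (sub)graph with undirected edges is an assumption made here for illustration; the slide does not specify how the node is wired. The name `add_virtual_node` is hypothetical.

```python
import numpy as np

def add_virtual_node(adjacency, subgraph_nodes):
    """Return an adjacency matrix with one extra node linked to `subgraph_nodes`."""
    n = adjacency.shape[0]
    augmented = np.zeros((n + 1, n + 1), dtype=adjacency.dtype)
    augmented[:n, :n] = adjacency          # keep the original edges
    for v in subgraph_nodes:
        augmented[n, v] = 1                # virtual node -> v
        augmented[v, n] = 1                # v -> virtual node
    return augmented, n                    # n is the index of the virtual node

# Path graph 0 - 1 - 2; the subgraph of interest is {0, 1}.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
A_aug, virtual = add_virtual_node(A, [0, 1])
```

Running any node-embedding method on `A_aug` and reading off the row for index `virtual` would then give the (sub)graph embedding.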

SLIDE 6

Approach #3

Represent a graph as a distribution/set of walks on that graph

Anonymous Walk Embeddings:

– States in an anonymous walk correspond to the index of the first time we visited the node in a random walk
– Anonymous Walk Embeddings, ICML 2018
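The anonymization rule can be sketched in a few lines: each node is replaced by the order in which it was first visited. The function name `anonymize_walk` and the node labels are illustrative.

```python
def anonymize_walk(walk):
    """Map each node to the index (starting at 1) of its first visit in the walk."""
    first_seen = {}
    anonymous = []
    for node in walk:
        if node not in first_seen:
            first_seen[node] = len(first_seen) + 1
        anonymous.append(first_seen[node])
    return tuple(anonymous)

# Two walks over different nodes share the same anonymous walk.
walk_1 = anonymize_walk(["A", "B", "C", "B", "C"])
walk_2 = anonymize_walk(["C", "D", "B", "D", "B"])
```

Because node identities are discarded, the two walks above are indistinguishable after anonymization, which is the point of the representation.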

SLIDE 7

Number of Walks Grows

The number of anonymous walks grows exponentially with the walk length:

– There are 5 anonymous walks $a_i$ of length 3:

$a_1 = 111$, $a_2 = 112$, $a_3 = 121$, $a_4 = 122$, $a_5 = 123$
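To see the growth concretely, the anonymous walks of a given length can be enumerated directly: each new state is either an already used index or the next unused one. This sketch follows the slide's convention of counting every such pattern (including ones like 111 that need self-loops); the function name is illustrative.

```python
def enumerate_anonymous_walks(length):
    """List all anonymous walks with `length` states, in lexicographic order."""
    walks = [(1,)]
    for _ in range(length - 1):
        extended = []
        for walk in walks:
            # Next state: any index seen so far, or one brand-new index.
            for state in range(1, max(walk) + 2):
                extended.append(walk + (state,))
        walks = extended
    return walks

walks_of_length_3 = enumerate_anonymous_walks(3)
counts = {l: len(enumerate_anonymous_walks(l)) for l in range(1, 8)}
```

The counts for lengths 1 through 7 are 1, 2, 5, 15, 52, 203, 877, so even moderate walk lengths give a very high-dimensional representation.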

SLIDE 8

Idea #1: Anonymous Walks

Enumerate all possible anonymous walks $a_i$ of $l$ steps and record their counts
Represent the graph as a probability distribution over these walks

For example:

– Set $l = 3$
– Then we can represent the graph as a 5-dim vector, since there are 5 anonymous walks $a_i$ of length 3: 111, 112, 121, 122, 123
– $\mathbf{z}_G[i]$ = probability of anonymous walk $a_i$ in $G$
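A rough sketch of how such a distribution could be estimated. It samples random walks rather than enumerating all of them, which is an assumption made here to keep the example small; the toy graph, function names, and sample count are all illustrative.

```python
import random
from collections import Counter

def anonymize_walk(walk):
    """Map each node to the index (starting at 1) of its first visit."""
    first_seen = {}
    return tuple(first_seen.setdefault(node, len(first_seen) + 1) for node in walk)

def random_walk(adjacency, start, length, rng):
    """Uniform random walk visiting `length` nodes."""
    walk = [start]
    while len(walk) < length:
        walk.append(rng.choice(adjacency[walk[-1]]))
    return walk

def anonymous_walk_distribution(adjacency, length, num_samples, seed=0):
    """Empirical probability of each anonymous walk of the given length."""
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    counts = Counter()
    for _ in range(num_samples):
        start = rng.choice(nodes)
        counts[anonymize_walk(random_walk(adjacency, start, length, rng))] += 1
    return {walk: c / num_samples for walk, c in counts.items()}

# Hypothetical toy graph: triangle 0-1-2 with a pendant node 3 attached to 2.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
dist = anonymous_walk_distribution(graph, length=3, num_samples=2000)
```

On a graph without self-loops only the patterns 121 and 123 can occur at length 3, so the other three entries of the 5-dim vector are zero.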

SLIDE 9

Idea #2: Learn Walk Embeddings

Learn embedding $\mathbf{z}_i$ of every anonymous walk $a_i$

The embedding of a graph $G$ is then the sum/avg/concatenation of the walk embeddings $\mathbf{z}_i$

SLIDE 10

Idea #2: Learn Walk Embeddings

How to embed walks?

Idea: Embed walks such that the next walk starting from the same node can be predicted

– Set walk embeddings $\mathbf{z}_i$ such that we maximize $P(w_t^u \mid w_{t-\Delta}^u, \ldots, w_{t-1}^u) = f(\mathbf{z})$, where $w_t^u$ is the $t$-th random walk starting at node $u$
– Similar to the word2vec idea

SLIDE 11

Idea #2: Learn Walk Embeddings

Run $T$ different random walks from $u$, each of length $l$: $N_R(u) = \{w_1^u, w_2^u, \ldots, w_T^u\}$

– Let $a_i$ be the anonymous version of walk $w_i$

Learn to predict walks that co-occur in a $\Delta$-size window

SLIDE 12

Idea #2: Learn Walk Embeddings

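Estimate embedding $\mathbf{z}_i$ of anonymous walk $a_i$ of $w_i$:

$\max \frac{1}{T} \sum_{t=\Delta}^{T} \log P(w_t \mid w_{t-\Delta}, \ldots, w_{t-1})$

where $\Delta$ is the context window size

$P(w_t \mid w_{t-\Delta}, \ldots, w_{t-1}) = \frac{\exp(y(w_t))}{\sum_{i=1}^{\eta} \exp(y(w_i))}$, i.e., softmax over all $\eta$ walks

$y(w_t) = b + U \cdot \left( \frac{1}{\Delta} \sum_{i=1}^{\Delta} \mathbf{z}_i \right)$

– where $b \in \mathbb{R}$, $U \in \mathbb{R}^D$, $\mathbf{z}_i$ is the embedding of $a_i$ (anonymized version of walk $w_i$)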
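A small numerical sketch of the prediction step: average the embeddings of the context walks, score every candidate walk, and apply a softmax. To obtain one score per candidate, this sketch stacks one weight vector and one bias per anonymous walk into `U` (shape η × D) and `b` (shape η); that stacking, the function name, and the random values are assumptions for illustration only.

```python
import numpy as np

def next_walk_log_prob(Z, U, b, context_ids, target_id):
    """log P(target walk | context walks) under a softmax over all walks."""
    context = Z[context_ids].mean(axis=0)    # average of the context embeddings z_i
    logits = b + U @ context                 # one score per candidate walk
    logits = logits - logits.max()           # shift for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return log_probs[target_id]

rng = np.random.default_rng(0)
eta, D = 5, 4                                # 5 anonymous walks, 4-dim embeddings
Z = rng.normal(size=(eta, D))                # walk embeddings (learned in practice)
U = rng.normal(size=(eta, D))
b = np.zeros(eta)

lp = next_walk_log_prob(Z, U, b, context_ids=[0, 3], target_id=2)
total = sum(np.exp(next_walk_log_prob(Z, U, b, [0, 3], k)) for k in range(eta))
```

Training would adjust `Z`, `U`, and `b` by gradient ascent on the average of such log-probabilities over all windows.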

SLIDE 13

Summary of Graph Embeddings

We discussed 3 approaches to graph embeddings:

Approach 1: Embed nodes and sum/average them

Approach 2: Create a super-node that spans the (sub)graph and then embed that node

Approach 3: Anonymous Walk Embeddings

– Idea 1: Represent the graph via the distribution over all the anonymous walks
– Idea 2: Embed anonymous walks

SLIDE 14

Today’s Lecture

Embedding entire graphs
Introduction to Knowledge Graphs
Embeddings in Knowledge Graphs

– TransE
– TransR

SLIDE 15

Knowledge Graphs

Knowledge in graph form

– Capture entities, types, and relationships

Nodes are entities
Nodes are labeled with their types
Edges between two nodes capture relationships between entities

SLIDE 16

Example: Bibliographic networks

Node types: paper, title, author, conference, year

Relation types: pubWhere, pubYear, hasTitle, hasAuthor, cite
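One common way to hold such a graph in code is a node-type map plus a list of (head, relation, tail) triples. The identifiers below (`paper_1`, `author_1`, `conf_1`) are hypothetical placeholders, not real records; only the type and relation names come from the slide.

```python
from collections import defaultdict

# Hypothetical nodes, each labeled with its type.
node_types = {
    "paper_1": "paper",
    "author_1": "author",
    "conf_1": "conference",
    "2020": "year",
}

# Edges as (head, relation, tail) triples.
triples = [
    ("paper_1", "hasAuthor", "author_1"),
    ("paper_1", "pubWhere", "conf_1"),
    ("paper_1", "pubYear", "2020"),
]

def index_by_relation(triples):
    """Group (head, tail) pairs under their relation type."""
    index = defaultdict(list)
    for head, relation, tail in triples:
        index[relation].append((head, tail))
    return dict(index)

by_relation = index_by_relation(triples)
```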

SLIDE 17

Example: Social networks

Node types: account, song, post, food, channel

Relation types: friend, like, cook, watch, listen

SLIDE 18

Example: Google Knowledge Graph

[Figure: example from the Google Knowledge Graph, including a paintedBy relation]

SLIDE 19

Knowledge Graphs in Practice

Google Knowledge Graph
Amazon Product Graph
Facebook Graph API
IBM Watson
Microsoft Satori
Project Hanover/Literome
LinkedIn Knowledge Graph
Yandex Object Answer

SLIDE 20

Applications of Knowledge Graphs

Serving information

SLIDE 21

Applications of Knowledge Graphs

Question answering and conversation agents

SLIDE 22

Knowledge Graph Datasets

Publicly available KGs:

– Freebase, Wikidata, DBpedia, YAGO, NELL

Common characteristics:

– Massive: millions of nodes and edges
– Incomplete: many true edges are missing

Given a massive KG, enumerating all the possible facts is intractable! Can we predict plausible BUT missing links?

SLIDE 23

Example: Freebase

Freebase

– ~50 million entities
– ~38K relation types
– ~3 billion facts/triples

FB15k/FB15k-237

– A complete subset of Freebase, used by researchers to learn KG models

93.8% of persons from Freebase have no place of birth and 78.5% have no nationality!

[1] Paulheim, Heiko. "Knowledge graph refinement: A survey of approaches and evaluation methods." Semantic Web 8.3 (2017): 489-508.
[2] Min, Bonan, et al. "Distant supervision for relation extraction with an incomplete knowledge base." Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2013.

SLIDE 24

Today’s Lecture

Embedding entire graphs
Introduction to Knowledge Graphs
Embeddings in Knowledge Graphs

– TransE
– TransR

SLIDE 25

Key Task: KG Completion

Knowledge Graph completion is a link prediction problem

KG incompleteness can substantially affect the efficiency of systems relying on it

Main paper: Translating Embeddings for Modeling Multi-relational Data. Bordes, Usunier, Garcia-Duran, et al. NeurIPS 2013.

SLIDE 26

Key Task: KG Completion

[Figure: a knowledge graph with a missing relation]

Intuition: a link prediction model that learns from local and global connectivity patterns in the KG, taking into account entities and relationships of different types at the same time

Models: TransE and TransR

SLIDE 27

Translating Embeddings: TransE

Relationships between entities = triplets

– $h$ (head entity), $l$ (relation), $t$ (tail entity) => $(h, l, t)$

Entities and relations are all embedded in an entity space $\mathbb{R}^k$

Relations are represented as translations

– $h + l \approx t$ if the given fact is true; else, $h + l \neq t$

SLIDE 28

TransE

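Translation Intuition:

– For a triple $(h, r, t)$ with $\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{R}^d$: $\mathbf{h} + \mathbf{r} = \mathbf{t}$

Score function: $f_r(h, t) = \|\mathbf{h} + \mathbf{r} - \mathbf{t}\|$

[Figure: $\mathbf{h}$ (Obama) translated by $\mathbf{r}$ (Nationality) lands on $\mathbf{t}$ (American)]

NOTATION: embedding vectors will appear in boldface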
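The TransE score is just the norm of the translation residual, so a lower score means a more plausible triple. A minimal sketch with hand-picked 2-d vectors (illustrative values, not learned embeddings); the `order` argument selecting the L1 or L2 norm is an addition of this sketch.

```python
import numpy as np

def transe_score(h, r, t, order=2):
    """Distance between the translated head h + r and the tail t."""
    return np.linalg.norm(h + r - t, ord=order)

# Hypothetical embeddings.
h = np.array([1.0, 0.0])
r = np.array([0.0, 1.0])
t_true = np.array([1.0, 1.0])    # exactly h + r
t_false = np.array([4.0, 5.0])   # far from h + r

score_true = transe_score(h, r, t_true)
score_false = transe_score(h, r, t_false)
```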

SLIDE 29

Link Prediction in a KG using TransE

Who has won the Turing award?
Who is a Canadian citizen?

[Figure: entities Hinton, Bengio, Pearl, Trudeau, Bieber, Turing Award, and Canada in the embedding space, with relations Win and Citizen; the entities nearest each query vector $\mathbf{q}$ are the answers]
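A query such as "Who has won the Turing award?" asks for the missing head of a triple (?, Win, Turing Award). Under TransE this can be sketched as a nearest-neighbor search around the query point q = t − r. The embeddings below are hand-picked so the example works; they are not learned, and the function name `predict_heads` is hypothetical.

```python
import numpy as np

def predict_heads(entity_emb, names, r_vec, tail_name, k=3):
    """Return the k entities closest to the query point q = t - r."""
    t = entity_emb[names.index(tail_name)]
    query = t - r_vec
    distances = np.linalg.norm(entity_emb - query, axis=1)
    ranked = np.argsort(distances)
    return [names[i] for i in ranked if names[i] != tail_name][:k]

names = ["Hinton", "Bengio", "Pearl", "Trudeau", "Bieber", "Turing Award", "Canada"]
# Hypothetical 2-d embeddings, one row per entity in `names`.
E = np.array([
    [0.1, 1.0], [-0.1, 1.1], [0.0, 0.8],   # Hinton, Bengio, Pearl
    [5.1, 1.0], [4.9, 0.9],                # Trudeau, Bieber
    [0.0, 3.0], [5.0, 3.0],                # Turing Award, Canada
])
win = np.array([0.0, 2.0])                 # hypothetical relation vector

winners = predict_heads(E, names, win, "Turing Award")
```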

SLIDE 30

TransE Optimization

Learn embeddings such that $h + l = t$ for real triplets that exist in the knowledge graph, and $h + l \neq t$ for triplets that do not exist

– Create a positive training set of valid triples
– Create a negative training set by replacing entities/relations from valid triples (replacement is by random sampling)
– Update embeddings until the distance for positive training set triples is minimized and the distance for negative training set triples is maximized
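Negative sampling can be sketched as follows. This version corrupts only the head or the tail (never the relation) and rejects candidates that already exist in the KG; both choices are assumptions of this sketch, and the entity names and function name are illustrative.

```python
import random

def corrupt_triple(triple, entities, known_triples, rng, max_tries=100):
    """Replace the head or the tail with a random entity, avoiding known facts."""
    h, r, t = triple
    for _ in range(max_tries):
        e = rng.choice(entities)
        candidate = (e, r, t) if rng.random() < 0.5 else (h, r, e)
        if candidate not in known_triples:
            return candidate
    raise RuntimeError("could not find a corrupted triple")

rng = random.Random(0)
entities = ["Obama", "Trudeau", "USA", "Canada"]
kg = {("Obama", "nationality", "USA"), ("Trudeau", "nationality", "Canada")}

negative = corrupt_triple(("Obama", "nationality", "USA"), entities, kg, rng)
```

Filtering against the known triples matters because the KG is incomplete: a random replacement can otherwise produce a fact that is actually in the graph.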

SLIDE 31

TransE Training

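Translation Intuition: for a triple $(h, l, t)$, $\mathbf{h} + \mathbf{l} = \mathbf{t}$

Max-margin loss:

$\mathcal{L} = \sum_{(h, l, t) \in G,\ (h', l, t') \notin G} \left[ \gamma + d(\mathbf{h} + \mathbf{l}, \mathbf{t}) - d(\mathbf{h}' + \mathbf{l}, \mathbf{t}') \right]_+$

The first distance is for the valid triple and the second for the negative triple. $\gamma$ is the margin, i.e., the smallest distance tolerated by the model between a valid triple and a corrupted one, and $[x]_+ = \max(0, x)$.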
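Given the distances of paired valid and corrupted triples, the max-margin loss reduces to a clipped difference. A minimal sketch, assuming the clipping at zero used in the TransE paper; the distance values are made up for illustration.

```python
import numpy as np

def margin_loss(pos_dist, neg_dist, margin=1.0):
    """Sum of [margin + d(valid) - d(corrupted)]_+ over paired triples."""
    return np.maximum(0.0, margin + pos_dist - neg_dist).sum()

# Pair 1: corrupted triple is already far enough away, so it contributes 0.
# Pair 2: corrupted triple is within the margin, so it contributes 0.5.
pos = np.array([0.5, 2.0])
neg = np.array([3.0, 2.5])
loss = margin_loss(pos, neg, margin=1.0)
```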

SLIDE 32

TransE Learning Algorithm

[Figure: TransE learning algorithm, with these annotations]

– Entities and relations are initialized uniformly, and normalized
– Negative sampling with a triplet that does not appear in the KG
– Comparative loss: favors lower distance values for valid triplets, high distance values for corrupted ones (valid sample vs. negative sample)
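A rough sketch of a single stochastic gradient step on one (valid, corrupted) pair, using the L2 distance. It omits the uniform initialization and the renormalization of entity embeddings mentioned above, and the function names, learning rate, and starting vectors are all assumptions chosen for illustration.

```python
import numpy as np

def dist(E, R, triple):
    """L2 distance ||E[h] + R[r] - E[t]|| for an index triple (h, r, t)."""
    h, r, t = triple
    return np.linalg.norm(E[h] + R[r] - E[t])

def sgd_step(E, R, pos, neg, margin=1.0, lr=0.1):
    """One in-place gradient step on [margin + d(pos) - d(neg)]_+.

    Returns the loss measured before the update."""
    d_pos, d_neg = dist(E, R, pos), dist(E, R, neg)
    loss = margin + d_pos - d_neg
    if loss <= 0:
        return 0.0                                   # margin satisfied, no update
    (h, r, t), (h2, _, t2) = pos, neg
    g_pos = (E[h] + R[r] - E[t]) / max(d_pos, 1e-12)    # gradient of d(pos) w.r.t. head
    g_neg = (E[h2] + R[r] - E[t2]) / max(d_neg, 1e-12)  # gradient of d(neg) w.r.t. head
    grad_E = np.zeros_like(E)
    grad_E[h] += g_pos
    grad_E[t] -= g_pos
    grad_E[h2] -= g_neg
    grad_E[t2] += g_neg
    E -= lr * grad_E
    R[r] -= lr * (g_pos - g_neg)
    return float(loss)

# Entities: 0 = head, 1 = true tail, 2 = corrupted tail. One relation.
E = np.array([[1.0, 0.0], [0.0, 0.0], [2.0, 1.0]])
R = np.array([[0.0, 1.0]])
pos, neg = (0, 0, 1), (0, 0, 2)

loss_before = sgd_step(E, R, pos, neg)
loss_after = 1.0 + dist(E, R, pos) - dist(E, R, neg)
```

The step pulls the valid triple together and pushes the corrupted one apart, so the pair's loss drops after the update.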

SLIDE 33

Complex Relational Patterns

Symmetric Relations:

$r(h, t) \Rightarrow r(t, h) \quad \forall h, t$

– Example: Family, Roommate

Composition Relations:

$r_1(x, y) \land r_2(y, z) \Rightarrow r_3(x, z) \quad \forall x, y, z$

– Example: My mother’s husband is my father.

1-to-N, N-to-1 relations:

$r(h, t_1), r(h, t_2), \ldots, r(h, t_n)$ are all True.

– Example: $r$ is “StudentsOf”

SLIDE 34

Composition in TransE

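Composition Relations:

$r_1(x, y) \land r_2(y, z) \Rightarrow r_3(x, z) \quad \forall x, y, z$

– Example: My mother’s husband is my father.

In TransE, compositional relations are possible if $\mathbf{r}_3 = \mathbf{r}_1 + \mathbf{r}_2$

[Figure: $\mathbf{x} \to \mathbf{y}$ via $\mathbf{r}_1$, $\mathbf{y} \to \mathbf{z}$ via $\mathbf{r}_2$, and $\mathbf{x} \to \mathbf{z}$ via $\mathbf{r}_3$]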
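A quick numerical check of the composition property, using arbitrary illustrative vectors: if two translations hold exactly, their sum is a translation from the first entity to the last.

```python
import numpy as np

x = np.array([0.0, 0.0])
r1 = np.array([1.0, 2.0])
r2 = np.array([3.0, -1.0])

y = x + r1            # r1(x, y) holds exactly
z = y + r2            # r2(y, z) holds exactly

r3 = r1 + r2          # composed relation
residual = np.linalg.norm(x + r3 - z)   # TransE score of (x, r3, z)
```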

SLIDE 35

Symmetric Relations in TransE

Symmetric Relations: $r(h, t) \Rightarrow r(t, h) \quad \forall h, t$

– Example: Family, Roommate

In TransE, symmetric relations are not possible:

– For TransE to handle a symmetric relation $r$, $r(t, h)$ must also be True for all $h, t$ that satisfy $r(h, t)$.
– So, $\mathbf{h} + \mathbf{r} - \mathbf{t} = \mathbf{0}$ and $\mathbf{t} + \mathbf{r} - \mathbf{h} = \mathbf{0}$.
– Then $\mathbf{r} = \mathbf{0}$ and $\mathbf{h} = \mathbf{t}$.
– However, $h$ and $t$ are two different entities and should be mapped to different locations.
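The step from the two equalities to the conclusion is a one-line addition, written out here for completeness:

```latex
\begin{aligned}
\mathbf{h} + \mathbf{r} - \mathbf{t} &= \mathbf{0} \\
\mathbf{t} + \mathbf{r} - \mathbf{h} &= \mathbf{0} \\
\text{adding both:}\quad 2\mathbf{r} &= \mathbf{0}
  \;\Rightarrow\; \mathbf{r} = \mathbf{0}
  \;\Rightarrow\; \mathbf{h} = \mathbf{t}
\end{aligned}
```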

SLIDE 36

Limitation: N-ary Relations

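1-to-N, N-to-1, N-to-N relations

– Example: $(h, r, t_1)$ and $(h, r, t_2)$ both exist in the knowledge graph, e.g., $r$ is “StudentsOf”

In TransE, $t_1$ and $t_2$ will map to the same vector, although they are different entities.

– $\mathbf{t}_1 = \mathbf{h} + \mathbf{r} = \mathbf{t}_2$
– $\mathbf{t}_1 \neq \mathbf{t}_2$ (contradictory!)

In TransE, N-ary relations are not possible

[Figure: $\mathbf{h}$ translated by $\mathbf{r}$ would have to reach both $\mathbf{t}_1$ and $\mathbf{t}_2$]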
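The same limitation can be stated for approximate fits. By the triangle inequality, the two scores cannot both be small when the tails are far apart:

```latex
\|\mathbf{h} + \mathbf{r} - \mathbf{t}_1\| + \|\mathbf{h} + \mathbf{r} - \mathbf{t}_2\|
  \;\ge\; \|\mathbf{t}_1 - \mathbf{t}_2\| \;>\; 0
  \qquad \text{whenever } \mathbf{t}_1 \neq \mathbf{t}_2
```

So fitting both triples well forces the two tail embeddings toward each other.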

SLIDE 37

Today’s Lecture

Embedding entire graphs
Introduction to Knowledge Graphs
Embeddings in Knowledge Graphs

– TransE
– TransR

SLIDE 38

Solution: TransR

Learn embeddings for entities and relations in separate spaces

– Model entities as vectors in the entity space $\mathbb{R}^d$
– Model a relation as a vector $\mathbf{r}$ in the relation space $\mathbb{R}^k$

Learn a relation-specific transformation from the entity space to the relation space, one per relation

– Train $\mathbf{M}_r \in \mathbb{R}^{k \times d}$ as the projection matrix for relation vector $\mathbf{r}$

Reference: “Learning entity and relation embeddings for knowledge graph completion.” Lin et al. AAAI 2015.

SLIDE 39

TransR Formulation

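$\mathbf{h}_r = \mathbf{M}_r \mathbf{h}, \quad \mathbf{t}_r = \mathbf{M}_r \mathbf{t}$

$f_r(h, t) = \|\mathbf{h}_r + \mathbf{r} - \mathbf{t}_r\|$

– instead of $f_r(h, t) = \|\mathbf{h} + \mathbf{r} - \mathbf{t}\|$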
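A minimal sketch of the TransR score: project both entities with the relation's matrix, then measure the translation residual in the relation space. The dimensions (d = 3, k = 2) and all values are illustrative.

```python
import numpy as np

def transr_score(h, t, r, M_r):
    """||M_r h + r - M_r t||: translation residual in the relation space."""
    h_r = M_r @ h          # project head into the relation space
    t_r = M_r @ t          # project tail into the relation space
    return np.linalg.norm(h_r + r - t_r)

# Hypothetical projection that keeps the first two coordinates (k=2, d=3).
M_r = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0]])
h = np.array([1.0, 2.0, 3.0])
t = np.array([2.0, 2.0, -7.0])
r = np.array([1.0, 0.0])

score = transr_score(h, t, r, M_r)
```

The third coordinate of `h` and `t` differs widely but is ignored by this particular `M_r`, so the triple still scores as a perfect fit.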

SLIDE 40

Symmetric Relations in TransR

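Symmetric Relations: $r(h, t) \Rightarrow r(t, h) \quad \forall h, t$

– Example: Family, Roommate

For TransR, we can learn $\mathbf{M}_r$ to map $h$ and $t$ to the same location in the space of relation $r$:

$\mathbf{r} = \mathbf{0}, \quad \mathbf{h}_r = \mathbf{M}_r \mathbf{h} = \mathbf{M}_r \mathbf{t} = \mathbf{t}_r$ ✓

[Figure: $\mathbf{M}_r$ projects $\mathbf{h}$ and $\mathbf{t}$ onto the same point $\mathbf{h}_r = \mathbf{t}_r$]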

SLIDE 41

N-ary Relations in TransR

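1-to-N, N-to-1, N-to-N relations

– Example: $(h, r, t_1)$ and $(h, r, t_2)$ both exist in the knowledge graph.

We can learn $\mathbf{M}_r$ so that $\mathbf{t}_r = \mathbf{M}_r \mathbf{t}_1 = \mathbf{M}_r \mathbf{t}_2$, even though $\mathbf{t}_1$ does not need to be equal to $\mathbf{t}_2$!

[Figure: $\mathbf{h}$ is projected to $\mathbf{h}_r$, and both $\mathbf{t}_1$ and $\mathbf{t}_2$ are projected to $\mathbf{t}_r = \mathbf{h}_r + \mathbf{r}$]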
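A small illustration of the 1-to-N case: a projection that discards one coordinate sends two distinct tails to the same point, so both triples fit exactly. The matrix and vectors are hand-picked for the example (d = 2, k = 1), not learned.

```python
import numpy as np

def transr_score(h, t, r, M_r):
    """||M_r h + r - M_r t||: translation residual in the relation space."""
    return np.linalg.norm(M_r @ h + r - M_r @ t)

M_r = np.array([[1.0, 0.0]])     # keeps only the first coordinate
h = np.array([1.0, 1.0])
r = np.array([2.0])
t1 = np.array([3.0, 0.0])
t2 = np.array([3.0, 5.0])        # differs from t1 in the discarded coordinate

score_1 = transr_score(h, t1, r, M_r)
score_2 = transr_score(h, t2, r, M_r)
```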

SLIDE 42

Limitation: Composition in TransR

Composition Relations:

$r_1(x, y) \land r_2(y, z) \Rightarrow r_3(x, z) \quad \forall x, y, z$

– Example: My mother’s husband is my father.

Each relation has a different space.
TransR is not naturally compositional for multiple relations! ✗

SLIDE 43

Translation-Based KG Embedding

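Embedding | Entity | Relation | $f_r(h, t)$
TransE | $h, t \in \mathbb{R}^d$ | $r \in \mathbb{R}^d$ | $\|\mathbf{h} + \mathbf{r} - \mathbf{t}\|$
TransR | $h, t \in \mathbb{R}^d$ | $r \in \mathbb{R}^k$, $\mathbf{M}_r \in \mathbb{R}^{k \times d}$ | $\|\mathbf{M}_r \mathbf{h} + \mathbf{r} - \mathbf{M}_r \mathbf{t}\|$

Embedding | Symmetry | Composition | One-to-many
TransE | ✗ | ✓ | ✗
TransR | ✓ | ✗ | ✓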

SLIDE 44

Today’s Lecture

Embedding entire graphs
Introduction to Knowledge Graphs
Embeddings in Knowledge Graphs