SLIDE 1

Extracting and Modeling Relations with Graph Convolutional Networks

Ivan Titov


with Diego Marcheggiani, Michael Schlichtkrull, Thomas Kipf, Max Welling, Rianne van den Berg and Peter Bloem

SLIDE 2

Inferring missing facts in knowledge bases: link prediction

[Figure: knowledge graph with entities Mikhail Baryshnikov, Vaganova Academy and St. Petersburg; known edges studied_at (Baryshnikov → Vaganova Academy) and located_in (Vaganova Academy → St. Petersburg); the missing edge lived_in (Baryshnikov → St. Petersburg) is marked with a "?".]

SLIDE 3

Relation Extraction

[Figure: the same knowledge graph, extended with Mariinsky Theatre; the edge danced_for (Baryshnikov → Mariinsky Theatre) is to be extracted from text.]

"Baryshnikov danced for Mariinsky based in what was then Leningrad (now St. Petersburg)" → danced_for

SLIDE 4

Generalization of link prediction and relation extraction

[Figure: the knowledge graph and raw text treated jointly; the danced_for edge can be inferred from either source.]

E.g., Universal Schema (Riedel et al., 2013)

"After a promising start in Mariinsky ballet, Baryshnikov defected to Canada in 1974 ..."

SLIDE 5

KBC (knowledge base completion): it is natural to represent both sentences and the KB as graphs

[Figure: the knowledge graph alongside the sentence "After a promising start in Mariinsky ballet, Baryshnikov defected to Canada in 1974 ...", both represented as graphs.]

For sentences, the graphs encode beliefs about their linguistic structure. How can we model (and exploit) these graphs with graph neural networks?

SLIDE 6

Outline

  • Graph Convolutional Networks (GCNs)
  • Extracting Semantic Relations: Semantic Role Labeling
    • Syntactic GCNs
    • Semantic Role Labeling Model
  • Link Prediction with Graph Neural Networks
    • Relational GCNs
    • Denoising Graph Autoencoders for Link Prediction

SLIDE 7

Graph Convolutional Networks: Neural Message Passing

SLIDE 8

Graph Convolutional Networks: message passing

[Figure: a node v and its neighbours in an undirected graph.]

Undirected graph; update for node v:

$h_v^{(k+1)} = \mathrm{ReLU}\big(\sum_{u \in \mathcal{N}(v)} W^{(k)} h_u^{(k)} + b^{(k)}\big)$, with $\mathcal{N}(v)$ including $v$ itself via a self-loop.

Kipf & Welling (2017). Related ideas earlier, e.g., Scarselli et al. (2009).
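A minimal sketch of this update in plain numpy (a toy illustration, not the authors' code; Kipf & Welling additionally normalize the adjacency by node degrees, which is omitted here):

```python
import numpy as np

def gcn_layer(H, A, W, b):
    """One GCN layer: each node sums the transformed features of its
    neighbours (self-loops included in A) and applies a ReLU."""
    return np.maximum(A @ H @ W + b, 0.0)

# Toy undirected graph: 3 nodes, edges 0-1 and 1-2, plus self-loops.
A = np.array([[1., 1., 0.],
              [1., 1., 1.],
              [0., 1., 1.]])
H = np.random.randn(3, 4)          # initial node features, d_in = 4
W = np.random.randn(4, 8) * 0.1    # layer weights, d_out = 8
b = np.zeros(8)
H1 = gcn_layer(H, A, W, b)         # representations informed by 1-hop neighbourhoods
```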


SLIDE 10

GCNs: multilayer convolution operation

Representations informed by node neighbourhoods

[Figure: stacked layers: input X = H(0), hidden layers H(1), H(2), ..., output Z = H(N).]

Initial feature representations of nodes.

Parallelizable computation, can be made quite efficient (e.g., Hamilton, Ying and Leskovec (2017)).
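Continuing the sketch above, stacking layers is just function composition; after k layers a node's representation depends on its k-hop neighbourhood:

```python
# Stacking gcn_layer from the sketch above: X = H(0), ..., Z = H(N).
W1, b1 = np.random.randn(4, 8) * 0.1, np.zeros(8)
W2, b2 = np.random.randn(8, 8) * 0.1, np.zeros(8)

H0 = H                          # input features X
H1 = gcn_layer(H0, A, W1, b1)   # hidden layer: 1-hop information
Z  = gcn_layer(H1, A, W2, b2)   # output H(2): 2-hop neighbourhoods
```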

SLIDE 12

Graph Convolutional Networks: previous work

Shown very effective on a range of problems: citation graphs, chemistry, ... Mostly:

  • Unlabeled and undirected graphs
  • Node labeling in a single large graph (transductive setting)
  • Classification of graphlets

How do we apply GCNs to the graphs we have in knowledge base completion / construction?

See Bronstein et al. (Signal Processing, 2017) for an overview

SLIDE 13

Link Prediction with Graph Neural Networks

SLIDE 14

Link Prediction

[Figure: knowledge graph with Mikhail Baryshnikov, Vaganova Academy, Mariinsky Theatre and St. Petersburg; edges studied_at, danced_for and two located_in edges; the missing edge lived_in is marked with a "?".]


SLIDE 17

KB Factorization

[Figure: the same knowledge graph with the missing lived_in edge.]

SLIDE 18

KB Factorization

[Figure: the same knowledge graph; the triple (Baryshnikov, lived_in, St. Petersburg) is scored by multiplying the subject embedding, a relation matrix and the object embedding.]

A scoring function is used to predict whether a relation holds:

$f(s, r, o) = e_s^\top M_r\, e_o$ — RESCAL (Nickel et al., 2011), with a full-rank matrix $M_r$ per relation.

SLIDE 19

KB Factorization

[Figure: the same knowledge graph and scoring setup.]

A scoring function is used to predict whether a relation holds:

$f(s, r, o) = e_s^\top \mathrm{diag}(r)\, e_o$ — DistMult (Yang et al., 2014), i.e. RESCAL restricted to a diagonal relation matrix.
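A minimal sketch of DistMult scoring (toy embeddings; names are illustrative):

```python
import numpy as np

def distmult(e_s, r, e_o):
    """DistMult score e_s^T diag(r) e_o: a bilinear score whose relation
    matrix is diagonal, so it reduces to an elementwise product."""
    return float(np.sum(e_s * r * e_o))

d = 16
e_bar = np.random.randn(d)    # entity embedding: Baryshnikov
e_spb = np.random.randn(d)    # entity embedding: St. Petersburg
r_liv = np.random.randn(d)    # relation vector: lived_in

score = distmult(e_bar, r_liv, e_spb)  # higher score = more plausible triple
```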

SLIDE 20

KB Factorization

[Figure: the same DistMult factorization (Yang et al., 2014).]

Relies on SGD to propagate information across the graph

SLIDE 21

Relational GCNs

[Figure: the same knowledge graph; DistMult (Yang et al., 2014) scoring on top of node representations.]

Use the same scoring function but with GCN node representations rather than parameter vectors

Schlichtkrull et al., 2017

SLIDE 22

Relational GCNs

[Figure: the same graph; after GCN message passing, information about St. Petersburg has reached the Baryshnikov node.]

Use the same scoring function but with GCN node representations rather than parameter vectors.

Schlichtkrull et al., 2017


SLIDE 24

Relational GCNs
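The per-node update of the R-GCN (Schlichtkrull et al., 2017) generalizes the GCN update with one weight matrix per relation (and direction), plus a self-loop:

```latex
h_v^{(k+1)} = \mathrm{ReLU}\Big( W_0^{(k)} h_v^{(k)}
  + \sum_{r \in \mathcal{R}} \sum_{u \in \mathcal{N}_r(v)} \tfrac{1}{c_{v,r}}\, W_r^{(k)} h_u^{(k)} \Big)
```

Here $\mathcal{N}_r(v)$ are the neighbours of $v$ under relation $r$, $W_0$ handles the self-loop, and $c_{v,r}$ is a normalization constant such as $|\mathcal{N}_r(v)|$.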

SLIDE 25

Relational GCNs

How do we train Relational GCNs? How do we compactly parameterize Relational GCNs?

SLIDE 26

GCN Denoising Autoencoders

[Figure: training graph with entities Mikhail Baryshnikov, U.S.A., Mariinsky Theatre, Vaganova Academy, Vilcek Prize and St. Petersburg; edges citizen_of, danced_for, awarded, studied_at, lived_in and two located_in edges.]

Take the training graph.

Schlichtkrull et al., 2017

SLIDE 27

GCN Denoising Autoencoders

[Figure: a noisy version of the training graph; some random edges (here citizen_of and lived_in) have been dropped.]

Produce a noisy version: drop some random edges. Use this graph for encoding nodes with GCNs.

Schlichtkrull et al., 2017

SLIDE 28

GCN Denoising Autoencoders

[Figure: the original training graph, which the model must reconstruct, dropped edges included.]

Force the model to reconstruct the original graph (including dropped edges) with a ranking loss on edges.

Schlichtkrull et al., 2017
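A compact sketch of one denoising-autoencoder step in this recipe (toy numpy with a single simplified R-GCN layer and a DistMult decoder; sizes, normalization and the exact loss are illustrative, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, n_rel = 6, 8, 3                          # toy sizes: entities, dims, relations
triples = [(0, 0, 1), (1, 1, 2), (0, 2, 3), (3, 1, 2), (4, 0, 5)]

H0 = rng.normal(size=(n, d))                   # initial node features
W_rel = rng.normal(size=(n_rel, d, d)) * 0.1   # encoder: one matrix per relation
r_dec = rng.normal(size=(n_rel, d))            # decoder: DistMult relation vectors

def encode(kept):
    """One simplified R-GCN layer over the corrupted graph."""
    H = H0.copy()                              # self-loop with identity transform
    for s, r, o in kept:
        H[o] = H[o] + W_rel[r] @ H0[s]         # message along the edge
        H[s] = H[s] + W_rel[r] @ H0[o]         # and in the inverse direction
    return np.maximum(H, 0.0)

def score(H, s, r, o):
    return float(np.sum(H[s] * r_dec[r] * H[o]))   # DistMult decoder

# One denoising step: drop ~20% of edges, encode on what is left,
# then rank every *original* edge (dropped ones included) above a corruption.
kept = [t for t in triples if rng.random() > 0.2]
H = encode(kept)
for s, r, o in triples:
    neg_o = int(rng.integers(n))               # negative sample: corrupt the object
    margin = score(H, s, r, o) - score(H, s, r, neg_o)
    # training pushes `margin` to be positive (a ranking loss on edges)
```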

SLIDE 29

Training:

  • Classic: encoder = node embeddings (a lookup), decoder = DistMult
  • Ours: encoder = R-GCN (contextual node embeddings), decoder = DistMult

Schlichtkrull et al., 2017

SLIDE 30

GCN Autoencoders: Denoising vs Variational

Instead of denoising AEs, we can use variational AEs to train R-GCNs. The VAE R-GCN can be regarded as an inference network performing amortized variational inference. Intuition: R-GCN AEs are amortized versions of factorization models.
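To make the amortization explicit, a sketch of the variational objective in the style of graph VAEs (notation is mine, not from the slides): the R-GCN plays the role of the inference network $q_\phi$ producing entity embeddings $E$, and the factorization decoder reconstructs the edges.

```latex
\mathcal{L}(\theta, \phi) = \mathbb{E}_{q_\phi(E \mid G)}\big[\log p_\theta(G \mid E)\big]
  - \mathrm{KL}\big(q_\phi(E \mid G) \,\|\, p(E)\big)
```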

SLIDE 31

Relational GCN

[Figure: node v receiving messages over differently-labeled relation edges.]

There are too many relations in realistic KBs; we cannot afford a full-rank weight matrix per relation.

Schlichtkrull et al., 2017

SLIDE 32

Relational GCN

Naive logic: we score with a diagonal matrix (DistMult), so let's use diagonal matrices in the GCN as well.

SLIDE 33

Relational GCN

Block-diagonal assumption: latent features can be grouped into sets of tightly inter-related features; modeling dependencies across the sets is less important.
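A tiny illustration of the resulting parameter shape (sizes are illustrative):

```python
import numpy as np

# A 16x16 relation matrix built from four independent 4x4 blocks:
# features interact within a block but never across blocks.
B, k = 4, 4
W_r = np.zeros((B * k, B * k))
for i in range(B):
    W_r[i*k:(i+1)*k, i*k:(i+1)*k] = np.random.randn(k, k)
# B*k*k = 64 parameters instead of (B*k)^2 = 256 for a full matrix.
```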

SLIDE 34

Relational GCN

Basis / dictionary learning: represent every KB relation as a linear combination of basis transformations,

$W_r = \sum_{b=1}^{B} a_{rb} V_b$, where the $V_b$ are shared basis transformations and the $a_{rb}$ are relation-specific coefficients.
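A sketch of the basis decomposition (sizes and names are illustrative); the parameter count grows with the number of bases B, not with the number of relations:

```python
import numpy as np

B, n_rel, d = 4, 100, 16       # bases, relations, hidden size
V = np.random.randn(B, d, d)   # shared basis transformations V_b
a = np.random.randn(n_rel, B)  # per-relation coefficients a_rb

def relation_matrix(r):
    """W_r = sum_b a[r, b] * V[b]."""
    return np.tensordot(a[r], V, axes=1)

W_5 = relation_matrix(5)       # (d, d) weight matrix for relation 5
```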

SLIDE 35

Results on FB15k-237 (hits@10)

Our R-GCN relies on DistMult in the decoder: DistMult is its natural baseline.

See other results and metrics in the paper. Results for ComplEx and TransE from the code of Trouillon et al. (2016); results for HolE using code by Nickel et al. (2015).

[Figure: bar chart of hits@10 comparing our model with the DistMult baseline.]

SLIDE 36

Relational GCNs

  • Fast and simple approach to link prediction
  • Captures multiple paths without the need to explicitly marginalize over them
  • Unlike factorizations, can be applied to subgraphs unseen in training

FUTURE WORK:

  • R-GCNs can be used in combination with more powerful factorizations / decoders
  • Objectives favouring recovery of paths rather than edges
  • Gates and memory may be effective

SLIDE 37

Extracting Semantic Relations

SLIDE 38

Semantic Role Labeling

Closely related to the relation extraction task: discovering the predicate-argument structure of a sentence.

"Sequa makes and repairs jet engines"

SLIDE 39

Semantic Role Labeling

Closely related to the relation extraction task: discovering the predicate-argument structure of a sentence.

  • Discover predicates

"Sequa makes and repairs jet engines" (predicates: makes, repairs)

SLIDE 40

Semantic Role Labeling

Closely related to the relation extraction task: discovering the predicate-argument structure of a sentence.

  • Discover predicates
  • Identify arguments and label them with their semantic roles

[Figure: "Sequa makes and repairs jet engines" with role arcs: Sequa is the creator of makes and the repairer of repairs; jet engines is the creation and the entity repaired.]

SLIDE 41

Syntax/semantics interaction

[Figure: "Sequa makes and repairs jet engines" with syntactic dependencies (subj, obj, coord, conj, nmod) drawn on one side and the semantic roles creator and creation on the other.]

Some syntactic dependencies are mirrored in the semantic graph

SLIDE 42

Syntax/semantics interaction

[Figure: the same sentence with syntactic dependencies and the full set of semantic roles (creator, creation, repairer, entity repaired).]

Some syntactic dependencies are mirrored in the semantic graph … but not all of them: the syntax-semantics interface is far from trivial.

GCNs provide a flexible framework for capturing interactions between the graphs

SLIDE 43

Syntactic GCNs: directionality and labels

[Figure: node v with labeled dependency edges (amod, obj, advmod) and a self-loop; Wout acts along syntactic edges, Win in the direction opposite of syntactic edges, and Wloop on the self-loop.]

SLIDE 44

Syntactic GCNs: directionality and labels

[Figure: the same node v with its labeled, directed edges.]

Weight matrix for each direction: Wout (along syntactic edges), Win (opposite of syntactic edges), Wloop (self-loop). Bias for each label + direction, e.g. b_in-subj.
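Putting directionality and labels together, the update for node v can be written as (Marcheggiani & Titov, 2017):

```latex
h_v^{(k+1)} = \mathrm{ReLU}\Big( \sum_{u \in \mathcal{N}(v)}
  W^{(k)}_{\mathrm{dir}(u,v)}\, h_u^{(k)} + b^{(k)}_{\mathrm{lab}(u,v)} \Big)
```

with $\mathrm{dir}(u,v) \in \{\mathrm{in}, \mathrm{out}, \mathrm{loop}\}$ and $\mathrm{lab}(u,v)$ the dependency label paired with the direction; $\mathcal{N}(v)$ includes $v$ itself through the self-loop.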

SLIDE 45

Syntactic GCNs: edge-wise gating

Not all edges are equally informative for the downstream task, or equally reliable (we use parsers to predict syntax), so an edge-wise gate weights each message.

Marcheggiani et al., EMNLP 2017
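A minimal sketch of the full layer with gates (toy numpy; parameter shapes, dictionary keys and the example dependency arcs are illustrative, not the authors' code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def syn_gcn_layer(H, edges, W, b, w_gate, b_gate):
    """Syntactic GCN layer with directionality, labels and edge-wise gates.

    H: (n, d) token states; edges: (head, dependent, label) triples;
    W[dir]: (d, d) for dir in {'in', 'out', 'loop'};
    b[(dir, label)]: (d,) label+direction biases; w_gate, b_gate: gate params.
    """
    n, _ = H.shape
    msgs = [(v, v, 'loop', 'self') for v in range(n)]      # self-loops
    for head, dep, lab in edges:
        msgs.append((head, dep, 'out', lab))               # along the edge
        msgs.append((dep, head, 'in', lab))                # opposite direction
    out = np.zeros_like(H)
    for u, v, direction, lab in msgs:
        gate = sigmoid(H[u] @ w_gate[direction] + b_gate[(direction, lab)])
        out[v] += gate * (W[direction] @ H[u] + b[(direction, lab)])
    return np.maximum(out, 0.0)

# "Sequa makes and repairs jet engines": tokens 0..5, a few dependency arcs.
edges = [(1, 0, 'subj'), (1, 3, 'conj'), (3, 5, 'obj'), (5, 4, 'nmod')]
d = 8
H = np.random.randn(6, d)
dirs, labs = ['in', 'out', 'loop'], ['subj', 'conj', 'obj', 'nmod', 'self']
W = {k: np.random.randn(d, d) * 0.1 for k in dirs}
b = {(k, l): np.zeros(d) for k in dirs for l in labs}
w_gate = {k: np.random.randn(d) * 0.1 for k in dirs}
b_gate = {(k, l): 0.0 for k in dirs for l in labs}
H1 = syn_gcn_layer(H, edges, W, b, w_gate, b_gate)
```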

SLIDE 46

Graph Convolutional Encoders

[Figure: the sentence "Sequa makes and repairs jet engines" with dependency edges (subj, obj, coord, conj, nmod), on top of a sentence encoder (BiRNN, CNN, ...).]

SLIDE 47

Graph Convolutional Encoders

[Figure: the same sentence; one GCN layer (GCN layer 1) stacked on top of the encoder (BiRNN, CNN, ...).]


SLIDE 50

Graph Convolutional Encoders

[Figure: the same sentence; three stacked GCN layers (GCN layer 1, 2, 3) on top of the encoder (BiRNN, CNN, ...), with Wout messages flowing along dependency edges.]


SLIDE 52

Graph Convolutional Encoders

How do we construct a GCN-based semantic role labeler?

SLIDE 53

GCNs for Semantic Role Labeling

[Figure: architecture: words → BiRNN → GCN layer(s) over the dependency graph → semantic role labeler; here the role "repairer" is predicted for "Sequa".]

Marcheggiani et al., EMNLP 2017
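How the pieces fit together, as a shape-level sketch (reusing syn_gcn_layer from the earlier sketch; the BiRNN is stubbed out, and the real model uses predicate-specific features and multiple GCN layers):

```python
import numpy as np

def birnn_stub(X):
    """Stand-in for the BiRNN encoder: any sequence encoder that yields
    one contextual vector per token fits here (identity for the sketch)."""
    return X

def srl_forward(X, edges, gcn_params, cls_W, pred_idx):
    """Words -> BiRNN -> syntactic GCN layer(s) -> per-token role classifier."""
    H = birnn_stub(X)
    H = syn_gcn_layer(H, edges, *gcn_params)
    # score each token's role w.r.t. the predicate; role 0 is NULL
    pred = np.repeat(H[pred_idx:pred_idx + 1], len(H), axis=0)
    logits = np.concatenate([H, pred], axis=1) @ cls_W  # (n_tokens, n_roles)
    return logits.argmax(axis=1)
```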

SLIDE 54

GCNs for Semantic Role Labeling

[Figure: the same architecture; for words that are not arguments of the predicate, the labeler predicts NULL.]

Marcheggiani et al., EMNLP 2017


SLIDE 58

GCNs for Semantic Role Labeling

[Figure: the same architecture; for "jet engines" the labeler predicts the role "entity repaired".]

Marcheggiani et al., EMNLP 2017

SLIDE 59

Results (F1) on Chinese (CoNLL-2009, dev set)

Marcheggiani & Titov (EMNLP, 2017). Predicate disambiguation is excluded from the F1 metric.

SLIDE 60

Results (F1) on Chinese (CoNLL-2009, test set)

Marcheggiani & Titov (EMNLP, 2017)

SLIDE 61

Results (F1) on English (CoNLL-2009)

Marcheggiani & Titov (EMNLP, 2017)

[Figure: F1 bar chart with single models and ensembles; our single model and an ensemble of 3.]

SLIDE 62

Flexibility of GCN encoders

Simple and fast approach to integrating linguistic structure into encoders. In principle we can exploit almost any kind of linguistic structure:

  • Semantic role labeling structure
  • Co-reference chains
  • AMR semantic graphs
  • Their combination

SLIDE 63

Other applications of syntactic GCN encoders

We also showed them effective as encoders in Neural Machine Translation (Bastings et al., EMNLP 2017). Others recently applied them to NER (Cetoli et al., arXiv:1709.10053).

SLIDE 64

Conclusions

GCNs are effective in subtasks of KBC (and in NLP beyond KBC):

  • Semantic roles: we proposed GCNs for encoding linguistic knowledge
  • Link prediction: GCNs for link prediction (and entity classification) in multi-relational knowledge bases

Code available. We are hiring! (PhD students / postdocs)

SLIDE 65

Analysis / Discussion

  • Improvement across the board, especially in the middle of the range
SLIDE 66

Effect of Distance between Argument and Predicate (English)