Predicting Semantic Relations using Global Graph Properties — PowerPoint PPT Presentation



SLIDE 1

Predicting Semantic Relations using Global Graph Properties

Yuval Pinter (@yuvalpi) and Jacob Eisenstein (@jacobeisenstein)

code: github.com/yuvalpinter/m3gm
contact: uvp@gatech.edu

SLIDE 2

Semantic Graphs

  • WordNet-like resources are curated to describe relations between word senses
  • The graph is directed
    ○ Edges have form <S, r, T>: <zebra, is-a, equine>
    ○ Still, some relations are symmetric
  • Relation types include:
    ○ Hypernym (is-a): <zebra, r, equine>
    ○ Meronym (is-part-of): <tree, r, forest>
    ○ Is-instance-of: <rome, r, capital>
    ○ Derivational relatedness: <nice, r, nicely>

[Figure: taxonomy tree — mammal → canine → {wolf, fennec}; mammal → equine → {horse, zebra}]
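The directed, typed edge structure described above can be sketched as a small data structure. This is a minimal illustration only — the relation and synset names are the examples from this slide, not the real WordNet API:

```python
from collections import defaultdict

# A multi-relational directed graph: relation -> set of (source, target) pairs.
graph = defaultdict(set)

def add_edge(graph, s, r, t):
    """Add a directed edge <S, r, T>."""
    graph[r].add((s, t))

add_edge(graph, "zebra", "hypernym", "equine")
add_edge(graph, "tree", "part_of", "forest")
add_edge(graph, "rome", "instance_of", "capital")
# Some relations are symmetric, so both directions are stored explicitly.
add_edge(graph, "nice", "derivationally_related", "nicely")
add_edge(graph, "nicely", "derivationally_related", "nice")

print(("zebra", "equine") in graph["hypernym"])  # edge-membership check
```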

SLIDE 3

Semantic Graphs - Relation Prediction

  • The task of predicting relations (zebra is-a <BLANK>)
  • Local models use embedding-based composition for scoring edges

[Figure: candidate edge <zebra, hypernym, equine>]

SLIDE 4

Semantic Graphs - Relation Prediction

  • Translational Embeddings (transE) [Bordes et al. 2013]:

    s(<S, r, T>) = −‖e_S + w_r − e_T‖
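A rough sketch of such a local scorer, with hypothetical random embeddings (the translation-based score follows Bordes et al. 2013):

```python
import numpy as np

def transe_score(e_s, w_r, e_t):
    """transE: score = -||e_s + w_r - e_t||; higher (closer to 0) is better."""
    return -np.linalg.norm(e_s + w_r - e_t)

rng = np.random.default_rng(0)
e_zebra, e_equine = rng.normal(size=50), rng.normal(size=50)
w_hyper = e_equine - e_zebra  # a perfect translation yields the best score, 0

print(transe_score(e_zebra, w_hyper, e_equine))            # best possible score (0)
print(transe_score(e_zebra, rng.normal(size=50), e_equine))  # a random relation scores worse
```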

SLIDE 5

Semantic Graphs - Relation Prediction

  • Full-Bilinear (Bilin) [Nickel et al. 2011]:

    s(<S, r, T>) = e_Sᵀ · W_r · e_T
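The bilinear alternative scores an edge through a per-relation matrix. Again a minimal sketch with hypothetical embeddings (the RESCAL-style scorer of Nickel et al. 2011):

```python
import numpy as np

def bilin_score(e_s, W_r, e_t):
    """Full-bilinear: score = e_s^T W_r e_t."""
    return float(e_s @ W_r @ e_t)

rng = np.random.default_rng(1)
d = 50
e_zebra, e_equine = rng.normal(size=d), rng.normal(size=d)
W_hyper = np.eye(d)  # with the identity matrix the score reduces to a dot product

print(bilin_score(e_zebra, W_hyper, e_equine))
```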

SLIDE 6

Semantic Graphs - Relation Prediction

  • Problem: a task-driven method can learn unreasonable graphs

[Figure: predicted graphs over mammal, canine, equine, horse, zebra]

SLIDE 7

Incorporating a Global View

  • We want to avoid unreasonable graphs
  • Imposing hard constraints isn't flexible enough
    ○ Only takes care of impossible graphs
    ○ Requires domain knowledge
  • We still want the local signal to matter - it's very strong

SLIDE 8

Incorporating a Global View

  • Our solution: an additive, learnable global graph score

    Score(<zebra, hypernym, equine> | WordNet) = s_local(edge) + [s_global(WN + edge) − s_global(WN)]
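The additive combination can be sketched directly. The scorer functions here are toy placeholders; the global term is the change in graph score from adding the candidate edge:

```python
def combined_score(edge, graph, s_local, s_global):
    """Local edge score plus the global score delta from adding the edge."""
    return s_local(edge) + (s_global(graph | {edge}) - s_global(graph))

# Toy scorers: the local model likes every edge equally; the global model
# penalizes graphs by size, so adding any edge costs exactly 1.
graph = {("zebra", "hypernym", "equine")}
s_local = lambda e: 2.0
s_global = lambda g: -float(len(g))

print(combined_score(("horse", "hypernym", "equine"), graph, s_local, s_global))  # 2.0 - 1.0 = 1.0
```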

SLIDE 9

Global Graph Score

  • Based on a framework called the Exponential Random Graph Model (ERGM)
  • The score s_global(WN) is derived from a log-linear distribution over possible graphs with a fixed number n of nodes

    p_ERGM(WN) ∝ exp(θᵀ · Φ(WN))

    θ: weights vector   Φ(WN): graph features
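In this log-linear form, the unnormalized score is just a weighted sum of graph feature counts. A minimal sketch with illustrative feature names and weights:

```python
import math

def ergm_score(theta, phi):
    """Unnormalized ERGM log-score: theta . phi(G)."""
    return sum(theta[k] * v for k, v in phi.items())

# Hypothetical motif counts for a graph, and hypothetical learned weights.
phi = {"n_edges": 6, "n_2paths": 4, "n_3cycles": 0}
theta = {"n_edges": 0.5, "n_2paths": 0.1, "n_3cycles": -2.0}

score = ergm_score(theta, phi)
print(score, math.exp(score))  # exp(score) is proportional to p_ERGM(G)
```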

SLIDE 10

Global Graph Score

  • OK. What are the features?

SLIDE 11

Graph Features (Motifs)

  • #edges: 6
  • #targets: 4
  • #3-cycles: 0
  • #2-paths: 4
  • Transitivity: ¼ = 0.25

[Figure: example directed graph on nodes 1-6]
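The motif counts above can be computed mechanically from the edge list. A minimal sketch — the edge set below is hypothetical (not the one in the slide's figure), and the real M3GM feature set is much richer:

```python
def motif_counts(edges):
    """Count a few directed motifs of the kind used as ERGM features."""
    edge_set = set(edges)
    n_edges = len(edge_set)
    n_targets = len({t for _, t in edge_set})  # nodes with in-degree > 0
    # 2-paths: a -> b -> c with a != c.
    two_paths = [(a, b, c) for (a, b) in edge_set for (b2, c) in edge_set
                 if b == b2 and a != c]
    n_2paths = len(two_paths)
    # Each directed 3-cycle appears as three 2-paths, hence the division.
    n_3cycles = sum(1 for a, b, c in two_paths if (c, a) in edge_set) // 3
    # Transitivity: fraction of 2-paths a->b->c closed by a direct edge a->c.
    closed = sum(1 for a, b, c in two_paths if (a, c) in edge_set)
    transitivity = closed / n_2paths if n_2paths else 0.0
    return n_edges, n_targets, n_2paths, n_3cycles, transitivity

edges = [(1, 2), (3, 2), (2, 5), (4, 5), (6, 5), (1, 5)]
print(motif_counts(edges))  # (6, 2, 2, 0, 0.5)
```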


SLIDE 16

Graph Motifs (multiple relations)

Blue-only motifs:
  • #edges: 6
  • #targets: 4
  • #3-cycles: 0
  • #2-paths: 4
  • Transitivity: ¼ = 0.25

(some) joint blue/orange motifs:
  • #edges {b, o}: 9
  • #2-cycles {b, o}: 1
  • #3-cycles (b-o-o): 1
  • #3-cycles (b-b-o): 0
  • #2-paths (b-b): 4
  • #2-paths (b-o): 3
  • #2-paths (o-b): 4
  • Transitivity (b-o-b): ⅔ = 0.67

[Figure: two-relation (blue/orange) directed graph on nodes 1-6]
SLIDE 20

ERGM Training

  • Estimating the scores of all possible graphs to obtain a probability distribution is infeasible
    ○ Number of possible directed graphs with n nodes: O(exp(n²))
    ○ With n nodes and R relations: O(exp(R·n²))
    ○ Estimation becomes hard at ~n=100 for R=1; in WordNet, n = 40K and R = 11

SLIDE 21

ERGM Training

  • Unlike other structured problems, there's no known dynamic programming algorithm either

SLIDE 22

ERGM Training

What can we do?
  • Decompose the score over dyads (node pairs) in the graph
  • Draw and score negative sample graphs

SLIDE 23

Max-Margin Markov Graph Model (M3GM)

  • Sample negative graphs from the "local neighborhood" of the true WordNet graph


SLIDE 27

Max-Margin Markov Graph Model (M3GM)

  • Loss = max{0, 1 + score(negative sample) − score(WN)}
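The margin loss above can be sketched directly (a minimal illustration; the scores are placeholders):

```python
def margin_loss(score_true, score_negative, margin=1.0):
    """Hinge loss: penalized when the negative graph scores within a margin of the true one."""
    return max(0.0, margin + score_negative - score_true)

print(margin_loss(score_true=5.0, score_negative=3.0))  # 0.0: separated by more than the margin
print(margin_loss(score_true=5.0, score_negative=4.5))  # 0.5: within the margin
```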

SLIDE 28

Max-Margin Markov Graph Model (M3GM)

  • It's important to choose an appropriate proposal distribution (the source of the negative samples)

[Figure: source node s with candidate targets v and true target t]

SLIDE 29

Max-Margin Markov Graph Model (M3GM)

  • We want to make things hard for the scorer:

    Q(v | s, r) ∝ s_local(<s, r, v>)
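Sampling a corrupted target in proportion to its local score can be sketched as follows. The candidate names and scores are hypothetical, and the softmax over scores is one common choice, an assumption here — the slide only states Q ∝ s_local:

```python
import math
import random

def sample_negative_target(candidates, local_scores, rng=random):
    """Draw a corrupted target v with probability increasing in its local score
    (softmax over scores, to make the sampling weights positive)."""
    weights = [math.exp(s) for s in local_scores]
    return rng.choices(candidates, weights=weights, k=1)[0]

candidates = ["equine", "canine", "fruit"]
local_scores = [4.0, 3.5, -2.0]  # hypothetical s_local(<zebra, hypernym, v>)
print(sample_negative_target(candidates, local_scores))
```

High-scoring wrong targets ("canine") are drawn far more often than easy ones ("fruit"), which is exactly what makes the negatives hard for the scorer.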

SLIDE 30

Evaluation

  • Dataset - WN18RR
    ○ No reciprocal relations (hypernym ⇔ hyponym)
    ○ Still includes symmetric relations
  • Metrics - MRR, Hits@10
  • Rule baseline - take the symmetric edge if it exists in train
    ○ Used in all models as the default for symmetric relations
  • Local models
    ○ Synset embeddings - averaged from FastText
  • M3GM (re-ranks the top 100 candidates from the local model)
    ○ ~3000 motifs, ~900 non-zero
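The two metrics can be computed from the rank each correct target receives. A minimal sketch over hypothetical ranks:

```python
def mrr_and_hits_at_10(ranks):
    """Mean reciprocal rank and Hits@10 over gold-answer ranks (1 = best)."""
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits10 = sum(1 for r in ranks if r <= 10) / len(ranks)
    return mrr, hits10

# Hypothetical ranks of the correct target across four test queries.
ranks = [1, 2, 10, 50]
print(mrr_and_hits_at_10(ranks))  # MRR ≈ 0.405, Hits@10 = 0.75
```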

SLIDE 33

[Results table comparing against Dettmers et al. 2018, Nguyen et al. 2018, Bordes et al. 2013, Trouillon et al. 2016]

SLIDE 34

Feature Analysis

  • Motifs with heavy positive weights:
    ○ Targets of has_part
    ○ Two-paths hypernym → derivationally_related_form
  • Motifs with heavy negative weights:
    ○ Targets of hypernym
    ○ Two-cycles of hypernym
    ○ Target of both has_part and verb_group

SLIDE 35

Feature Analysis

[Example: vienna / france / austria / european union / germany — legend: seen in training data, local-only prediction, M3GM prediction, unseen in data]

SLIDE 36

Feature Analysis

[Example: indian lettuce / lettuce / herb garden lettuce — legend: seen in training data, local-only prediction, M3GM prediction]

SLIDE 37

Feature Analysis

[Figure: mammal / equine]

SLIDE 38

Feature Analysis

Hypernym + Derivationally Related Form:

"Derivations occur in the abstract parts of the graph"
(bodega / canteen vs. shop)

SLIDE 39

Feature Analysis

[Figure: nouns vs. verbs]

SLIDE 40

Future Work

  • Multilingual transfer of semantic graphs

[Figure: taxonomy tree (mammal, canine, equine, horse, zebra, wolf, fennec) alongside its Hebrew translation]

SLIDE 41

Future Work

  • Align embeddings / translate concepts

SLIDE 42

Future Work

  • Can we introduce global features to help?

SLIDE 43

Conclusion

  • Global reasoning over graph features is beneficial for relation prediction
  • Works well on top of strong local models
  • Applicable to large graphs with dozens of relation types ← M3GM
  • Orthogonal to word / synset embedding techniques
  • Finds a wide variety of linguistic patterns in semantic graphs

SLIDE 44

Thanks

  • Computational Linguistics lab @Georgia Tech

code + bonus WordNet analysis tools: github.com/yuvalpinter/m3gm
contact: uvp@gatech.edu

SLIDE 45

Thanks

  • Bloomberg Data Science Ph.D. Fellowship Program

SLIDE 46

Thanks

  • YOU!

code + bonus WordNet analysis tools: github.com/yuvalpinter/m3gm
contact: uvp@gatech.edu