Knowledge Graph Reasoning CSCI 699: ML4Know Instructor: Xiang Ren - - PowerPoint PPT Presentation



SLIDE 1

Knowledge Graph Reasoning

CSCI 699: ML4Know

Instructor: Xiang Ren USC Computer Science

SLIDE 2

Overview

  • Motivation
  • Path-Based Reasoning
  • Embedding-Based Reasoning
  • Bridging Path-Based and Embedding-Based Reasoning: DeepPath & DIVA

  • Conclusion

SLIDE 3

Knowledge Graphs are Not Complete

[Figure: a knowledge-graph fragment centered on Band of Brothers, with entities Mini-Series, HBO, Graham Yost, Michael Kamen, United States, English, Tom Hanks, Neal McDonough, Actor, and Caesars Entertain…, linked by relations such as tvProgramCreator, tvProgramGenre, writtenBy, music, countryOfOrigin, nationality-1, awardWorkWinner, castActor, profession, personLanguages, serviceLocation-1, serviceLanguage, and countrySpokenIn-1.]

SLIDE 4

Benefits of Knowledge Graphs

  • Support various applications:
    ◦ Structured Search
    ◦ Question Answering
    ◦ Dialogue Systems
    ◦ Relation Extraction
    ◦ Summarization

SLIDE 5

Benefits of Knowledge Graphs

  • Support various applications:
    ◦ Structured Search
    ◦ Question Answering
    ◦ Dialogue Systems
    ◦ Relation Extraction
    ◦ Summarization
  • Knowledge graphs can be constructed via information extraction from text, but…
    ◦ There will be many missing links.
    ◦ Goal: complete the knowledge graph.

SLIDE 6

Reasoning on Knowledge Graph


  • Query node: Band of Brothers
  • Query relation: tvProgramLanguage
  • Query: tvProgramLanguage(Band of Brothers, ?)

SLIDE 7

Reasoning on Knowledge Graph

[Figure: the same knowledge-graph fragment around Band of Brothers as on Slide 3.]

SLIDE 8

KB Reasoning Tasks

  • Predicting the missing link
    ◦ Given e1 and e2, predict the relation r.
  • Predicting the missing entity
    ◦ Given e1 and relation r, predict the missing entity e2.
  • Fact prediction
    ◦ Given a triple, predict whether it is true or false.
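The three tasks differ only in which slot of a (head, relation, tail) triple is unknown. A minimal Python sketch, assuming some plausibility function `score(h, r, t)` is already available; the function names and the hand-set toy scores below are hypothetical and stand in for a trained model:

```python
from typing import Callable, Iterable, Tuple

# Hypothetical plausibility function f(head, relation, tail) -> float.
Score = Callable[[str, str, str], float]


def predict_relation(score: Score, e1: str, e2: str, relations: Iterable[str]) -> str:
    """Missing link: given e1 and e2, return the highest-scoring relation r."""
    return max(relations, key=lambda r: score(e1, r, e2))


def predict_entity(score: Score, e1: str, r: str, entities: Iterable[str]) -> str:
    """Missing entity: given e1 and r, return the highest-scoring e2."""
    return max(entities, key=lambda e2: score(e1, r, e2))


def predict_fact(score: Score, triple: Tuple[str, str, str], threshold: float = 0.5) -> bool:
    """Fact prediction: call a triple true if its score clears a threshold."""
    h, r, t = triple
    return score(h, r, t) >= threshold


# Toy, hand-set scores (not learned), for illustration only.
TOY_SCORES = {
    ("Band of Brothers", "tvProgramLanguage", "English"): 0.9,
    ("Band of Brothers", "countryOfOrigin", "United States"): 0.8,
}


def toy_score(h: str, r: str, t: str) -> float:
    return TOY_SCORES.get((h, r, t), 0.0)


if __name__ == "__main__":
    entities = ["English", "United States", "HBO"]
    relations = ["tvProgramLanguage", "countryOfOrigin"]
    print(predict_entity(toy_score, "Band of Brothers", "tvProgramLanguage", entities))
    print(predict_relation(toy_score, "Band of Brothers", "United States", relations))
    print(predict_fact(toy_score, ("Band of Brothers", "tvProgramLanguage", "HBO")))
```

All three tasks reduce to ranking or thresholding the same score, which is why one model can usually serve all of them.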

SLIDE 9

Related Work

  • Path-based methods
    ◦ Path-Ranking Algorithm, Lao et al., 2011
    ◦ ProPPR, Wang et al., 2013
    ◦ Subgraph Feature Extraction, Gardner et al., 2015
    ◦ RNN + PRA, Neelakantan et al., 2015
    ◦ Chains of Reasoning, Das et al., 2017


Why do we need path-based methods? They are accurate and explainable.

SLIDE 10

Random Walk Inference

SLIDE 11

Path-Ranking Algorithm (Lao et al., 2011)

  • 1. Run random walks with restarts to derive many paths.
  • 2. Use supervised training to rank different paths.
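PRA's features can be sketched as path-constrained random-walk probabilities: for each relation path, the probability that a walk starting at the source and following that sequence of relations ends at the target. A minimal Python sketch, assuming step 1 has already produced the relation paths; the toy graph (including the inverse relation name `countrySpokenIn-1`), the paths, and the logistic-regression weights are hypothetical, not learned:

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Toy graph: (entity, relation) -> list of neighbor entities (hypothetical data).
Graph = Dict[Tuple[str, str], List[str]]


def walk_distribution(graph: Graph, source: str, path: Sequence[str]) -> Dict[str, float]:
    """Distribution over end nodes of a random walk from `source` constrained to `path`.

    At each step the walker picks uniformly among edges labeled with the next
    relation; walkers with no such edge are dropped (their mass is lost).
    """
    dist = {source: 1.0}
    for rel in path:
        nxt: Dict[str, float] = defaultdict(float)
        for node, prob in dist.items():
            neighbors = graph.get((node, rel), [])
            for n in neighbors:
                nxt[n] += prob / len(neighbors)
        dist = dict(nxt)
    return dist


def path_features(graph: Graph, source: str, target: str, paths) -> List[float]:
    """One feature per relation path: probability of reaching `target` along it."""
    return [walk_distribution(graph, source, p).get(target, 0.0) for p in paths]


def pra_score(features: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> float:
    """Logistic-regression score over path features (weights would be learned in step 2)."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


GRAPH: Graph = {
    ("Band of Brothers", "countryOfOrigin"): ["United States"],
    ("United States", "countrySpokenIn-1"): ["English"],
    ("Band of Brothers", "castActor"): ["Tom Hanks", "Neal McDonough"],
    ("Tom Hanks", "personLanguages"): ["English"],
}
PATHS = [["countryOfOrigin", "countrySpokenIn-1"], ["castActor", "personLanguages"]]

if __name__ == "__main__":
    feats = path_features(GRAPH, "Band of Brothers", "English", PATHS)
    print(feats)  # [1.0, 0.5]
    print(pra_score(feats, weights=[2.0, 1.0], bias=-1.0))
```

The second path only reaches English half the time because one of the two cast members has no `personLanguages` edge in the toy graph; the supervised step then learns how much to trust each path.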

SLIDE 12

Path-Ranking Algorithm (Lao et al., 2011)

  • 1. Run random walks with restarts to derive many paths.

SLIDE 13

Path-Ranking Algorithm (Lao et al., 2011)

  • 1. Run random walks with restarts to derive many paths.

SLIDE 14

Path-Ranking Algorithm (Lao et al., 2011)

  • 2. Use supervised training to rank different paths.

SLIDE 15

Path-Ranking Algorithm (Lao et al., 2011)

  • 2. Use supervised training to rank different paths.

SLIDE 16

ProPPR (Wang et al., 2013; 2015)

  • ProPPR generalizes PRA with recursive probabilistic logic programs.
  • Other relations can be used to jointly infer the target relation.

SLIDE 17

Chains of Reasoning (Das et al., 2017)


  • 1. Use PRA to derive the paths.
  • 2. Use RNNs to reason about the target relation.
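Step 2 can be pictured as encoding each path with a recurrent network and comparing the final hidden state to the target relation. A minimal NumPy sketch, assuming a vanilla RNN over relation embeddings only and max pooling over multiple paths; the weight names are hypothetical, and the published model includes extensions (such as entity information and its own choice of multi-path aggregation) that are not shown here:

```python
import numpy as np


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + np.exp(-x))


def path_rnn_score(path_rel_vecs, target_rel_vec, W_hh, W_xh) -> float:
    """Encode a relation path with a vanilla RNN and score it against the target relation.

    h_t = tanh(W_hh h_{t-1} + W_xh x_t); score = sigmoid(h_T . r_target).
    """
    h = np.zeros(W_hh.shape[0])
    for x in path_rel_vecs:
        h = np.tanh(W_hh @ h + W_xh @ x)
    return float(sigmoid(h @ target_rel_vec))


def best_path_score(paths, target_rel_vec, W_hh, W_xh) -> float:
    """Combine several paths for one entity pair by taking the max score (one simple choice)."""
    return max(path_rnn_score(p, target_rel_vec, W_hh, W_xh) for p in paths)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 4
    W_hh, W_xh = rng.normal(size=(d, d)), rng.normal(size=(d, d))
    rel = {name: rng.normal(size=d) for name in ["countryOfOrigin", "countrySpokenIn-1", "tvProgramLanguage"]}
    path = [rel["countryOfOrigin"], rel["countrySpokenIn-1"]]
    print(best_path_score([path], rel["tvProgramLanguage"], W_hh, W_xh))
```

Unlike PRA, which treats each path as an opaque feature, the RNN shares parameters across paths, so paths never seen in training can still be scored.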
SLIDE 18

Related Work

  • Embedding-based methods
    ◦ RESCAL, Nickel et al., 2011
    ◦ TransE, Bordes et al., 2013
    ◦ Neural Tensor Network, Socher et al., 2013
    ◦ TransR/CTransR, Lin et al., 2015
    ◦ Complex Embeddings, Trouillon et al., 2016


Embedding methods let us compare entities and find similar ones in the vector space.

SLIDE 19

RESCAL (Nickel et al., 2011)

  • Tensor factorization of the (head entity) × (tail entity) × (relation) tensor.
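In RESCAL each relation slice of the tensor is approximated as $X_r \approx E W_r E^\top$, with entity embeddings $E$ shared across relations and one interaction matrix $W_r$ per relation. A minimal NumPy sketch of the score and the reconstruction error; the variable names are illustrative, and the fitting procedure (the original work uses regularized alternating least squares) is not shown:

```python
import numpy as np


def rescal_score(e_h: np.ndarray, W_r: np.ndarray, e_t: np.ndarray) -> float:
    """Bilinear RESCAL score for a single triple: e_h^T W_r e_t."""
    return float(e_h @ W_r @ e_t)


def reconstruct_slice(E: np.ndarray, W_r: np.ndarray) -> np.ndarray:
    """Approximate the relation-r slice of the tensor: X_r ~ E W_r E^T."""
    return E @ W_r @ E.T


def rescal_loss(X: np.ndarray, E: np.ndarray, W: np.ndarray) -> float:
    """Squared reconstruction error summed over relation slices.

    X: (num_relations, n, n) 0/1 tensor, X[k, i, j] = 1 if (entity i, relation k, entity j) holds.
    E: (n, d) entity embeddings shared by all relations.
    W: (num_relations, d, d) one interaction matrix per relation.
    """
    return float(sum(np.sum((X[k] - reconstruct_slice(E, W[k])) ** 2) for k in range(X.shape[0])))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, num_rel = 5, 3, 2
    X = (rng.random((num_rel, n, n)) < 0.2).astype(float)  # random toy tensor
    E, W = rng.normal(size=(n, d)), rng.normal(size=(num_rel, d, d))
    print(rescal_loss(X, E, W))
```

Because $W_r$ is a full matrix, the score is asymmetric in head and tail, which lets RESCAL model directed relations.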

SLIDE 20

TransE (Bordes et al., 2013)

20

  • Assumption: in the vector space, adding the relation vector to the head entity should land close to the tail entity.
  • Margin-based loss function:
    ◦ Minimize the distance between (h+l) and t.
    ◦ Maximize the distance between (h+l) and a randomly sampled tail t’ (negative example).
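The two goals above combine into a hinge loss $[\gamma + d(h+l, t) - d(h+l, t')]_+$ with margin $\gamma$. A minimal NumPy sketch for one positive/negative pair, assuming integer entity ids for corruption and Euclidean (or L1) distance; the function names are illustrative, and batching, embedding normalization, and gradient updates are omitted:

```python
import numpy as np


def transe_distance(h, l, t, norm: int = 2) -> float:
    """d(h + l, t): small when the translated head lands near the tail."""
    return float(np.linalg.norm(h + l - t, ord=norm))


def margin_loss(h, l, t, t_neg, margin: float = 1.0, norm: int = 2) -> float:
    """Hinge loss [margin + d(h+l, t) - d(h+l, t')]_+ for one positive/negative pair."""
    return max(0.0, margin + transe_distance(h, l, t, norm) - transe_distance(h, l, t_neg, norm))


def corrupt_tail(triple, num_entities: int, rng: np.random.Generator):
    """Replace the tail id with a uniformly sampled different entity id."""
    h, r, t = triple
    t_neg = int(rng.integers(num_entities - 1))
    if t_neg >= t:
        t_neg += 1  # skip the true tail so the negative always differs from it
    return h, r, t_neg


if __name__ == "__main__":
    h, l, t = np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([1.0, 0.0])
    print(margin_loss(h, l, t, np.array([1.0, 3.0])))  # negative already far away: 0.0
    print(margin_loss(h, l, t, np.array([1.0, 0.5])))  # negative too close: 0.5
    print(corrupt_tail((0, 0, 3), num_entities=5, rng=np.random.default_rng(0)))
```

The loss is zero once every negative is at least `margin` farther from h+l than the true tail, so training effort concentrates on the negatives that are still confusable.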

SLIDE 21

Neural Tensor Networks (Socher et al., 2013)


  • Model the bilinear interaction between entity pairs with tensors.
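The NTN scoring function is $g(e_1, R, e_2) = u_R^\top \tanh(e_1^\top W_R^{[1:k]} e_2 + V_R [e_1; e_2] + b_R)$, where each of the $k$ tensor slices contributes one bilinear term. A minimal NumPy sketch of the score for a single relation; shapes are given in the docstring and parameter values in the example are random placeholders:

```python
import numpy as np


def ntn_score(e1, e2, W, V, b, u) -> float:
    """NTN score g(e1, R, e2) = u^T tanh(e1^T W^{[1:k]} e2 + V [e1; e2] + b).

    Shapes: e1, e2: (d,); W: (k, d, d), one bilinear slice per hidden unit;
    V: (k, 2d); b: (k,); u: (k,).
    """
    bilinear = np.einsum("i,kij,j->k", e1, W, e2)  # k bilinear interaction terms
    linear = V @ np.concatenate([e1, e2])          # standard feed-forward term
    return float(u @ np.tanh(bilinear + linear + b))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, k = 4, 3
    e1, e2 = rng.normal(size=d), rng.normal(size=d)
    W, V = rng.normal(size=(k, d, d)), rng.normal(size=(k, 2 * d))
    b, u = rng.normal(size=k), rng.normal(size=k)
    print(ntn_score(e1, e2, W, V, b, u))
```

Dropping the `bilinear` term leaves an ordinary one-hidden-layer network; the tensor slices are what let entity features interact multiplicatively.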

SLIDE 22

Poincaré Embeddings (Nickel and Kiela, 2017)

  • Idea: learn hierarchical KB representations by embedding them in hyperbolic space.
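The key ingredient is the geodesic distance in the Poincaré ball, $d(u, v) = \operatorname{arcosh}\left(1 + \frac{2\lVert u - v\rVert^2}{(1 - \lVert u\rVert^2)(1 - \lVert v\rVert^2)}\right)$, which grows rapidly near the boundary. A minimal Python sketch of that distance only (training with Riemannian SGD is not shown); the example points are made up:

```python
import math
from typing import Sequence


def poincare_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Geodesic distance in the Poincaré ball (points must have norm < 1)."""
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


if __name__ == "__main__":
    root = [0.0, 0.0]                      # a general concept near the origin
    leaf_a, leaf_b = [0.9, 0.0], [0.0, 0.9]  # two specific concepts near the boundary
    print(poincare_distance(root, leaf_a))
    print(poincare_distance(leaf_a, leaf_b))  # larger than either leaf's distance to the root
```

The usual intuition is that general concepts sit near the origin and specific ones near the boundary, where there is "more room", which suits tree-like hierarchies.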

SLIDE 23

ConvE (Dettmers et al., 2018)


  • 1. Reshape the head and relation embeddings into “images”.
  • 2. Use CNNs to learn convolutional feature maps.
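The two steps above can be sketched in NumPy for one triple: reshape and stack the head and relation embeddings into a small 2D "image", convolve, project back to the embedding size, and compare with the tail. A minimal sketch, assuming an 8-dimensional embedding reshaped to 2 × 4 and two 3 × 3 filters; all names and sizes are illustrative, and details of the published model such as dropout, batch normalization, and scoring all tails at once are omitted:

```python
import numpy as np


def conv2d_valid(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """2D cross-correlation with no padding (what deep-learning 'conv' layers compute)."""
    kh, kw = kernel.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out


def conve_score(e_h, r, e_t, filters, W, shape=(2, 4)) -> float:
    """Probability-like score for (h, r, t), in the spirit of ConvE.

    1. Reshape e_h and r into 2D 'images' and stack them.
    2. Convolve with each filter and apply ReLU.
    3. Flatten, project back to embedding size with W, apply ReLU, dot with e_t.
    """
    img = np.concatenate([e_h.reshape(shape), r.reshape(shape)], axis=0)
    maps = [np.maximum(conv2d_valid(img, k), 0.0) for k in filters]
    features = np.concatenate([m.ravel() for m in maps])
    hidden = np.maximum(features @ W, 0.0)
    return float(1.0 / (1.0 + np.exp(-(hidden @ e_t))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 8                                    # embedding size, reshaped to 2 x 4
    e_h, r, e_t = rng.normal(size=(3, d))
    filters = rng.normal(size=(2, 3, 3))     # two 3x3 filters -> two 2x2 feature maps
    W = rng.normal(size=(2 * 2 * 2, d))      # 8 conv features -> d
    print(conve_score(e_h, r, e_t, filters, W))
```

Stacking the reshaped vectors lets a small filter see head and relation entries side by side, producing interactions that a plain dot product cannot express.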
SLIDE 24

Bridging Path-Based and Embedding-Based Reasoning with Deep Reinforcement Learning: DeepPath (Xiong et al., 2017)

SLIDE 25

RL for KB Reasoning: DeepPath (Xiong et al., 2017)

  • Learn the paths with RL instead of random walks with restarts
  • Model path finding as an MDP
  • Train an RL agent to find paths
  • Represent the KG with pretrained KG embeddings
  • Use the learned paths as logical formulas

SLIDE 26

Supervised vs. Reinforcement Learning

Supervised Learning

  • Training based on supervisor/label/annotation
  • Feedback is instantaneous
  • Little temporal aspect

Reinforcement Learning

  • Training based only on a reward signal
  • Feedback is delayed
  • Time matters
  • Agent actions affect subsequent exploration

SLIDE 27

Reinforcement Learning


  • RL is a general-purpose framework for decision making
    ◦ RL is for an agent with the capacity to act
    ◦ Each action influences the agent’s future state
    ◦ Success is measured by a scalar reward signal
    ◦ Goal: select actions to maximize future reward
SLIDE 28

Reinforcement Learning


[Figure: agent-environment loop. Agent: multi-layer neural nets ψ(s_t). Environment: the KG modeled as an MDP.]

SLIDE 29

DeepPath: RL for KG Reasoning

SLIDE 30

Components of MDP

  • Markov decision process $\langle S, A, P, R \rangle$
    ◦ $S$: continuous states represented with embeddings
    ◦ $A$: action space (relations or edges)
    ◦ $P(S_{t+1} = s' \mid S_t = s, A_t = a)$: transition probability
    ◦ $R(s, a)$: reward received for each step taken
  • With pretrained KG embeddings:
    ◦ $s_t = e_t \oplus (e_{\text{target}} - e_t)$
    ◦ $A = \{r_1, r_2, \ldots, r_n\}$, all relations in the KG
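The state and action definitions above fit in a few lines, reading ⊕ as vector concatenation (that reading is an assumption on my part). A minimal NumPy sketch; the helper `valid_actions` and the toy graph format are hypothetical, added only to show that the action space is the fixed set of all relations while only some of them lead anywhere from a given entity:

```python
from typing import Dict, List, Tuple

import numpy as np


def deeppath_state(e_t, e_target) -> np.ndarray:
    """s_t = e_t ⊕ (e_target - e_t): current position plus the 'remaining' offset."""
    e_t = np.asarray(e_t, dtype=float)
    e_target = np.asarray(e_target, dtype=float)
    return np.concatenate([e_t, e_target - e_t])


def valid_actions(graph: Dict[Tuple[str, str], List[str]], entity: str, relations: List[str]) -> List[str]:
    """Relations in the global action space A that have an outgoing edge from `entity`."""
    return [r for r in relations if graph.get((entity, r))]


if __name__ == "__main__":
    print(deeppath_state([1.0, 2.0], [4.0, 6.0]))  # [1. 2. 3. 4.]
    toy = {("Band of Brothers", "countryOfOrigin"): ["United States"]}
    print(valid_actions(toy, "Band of Brothers", ["countryOfOrigin", "castActor"]))
```

Including $e_{\text{target}} - e_t$ tells the policy network which direction in embedding space still has to be covered, so the same network can serve different query pairs.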

SLIDE 31

Reward Functions

  • Global Accuracy
  • Path Efficiency
  • Path Diversity
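As I understand the DeepPath paper, the three rewards are: +1 for reaching the target and −1 otherwise (global accuracy); $1/\text{length}(p)$ (path efficiency); and $-\frac{1}{|F|}\sum_{i=1}^{|F|} \cos(\mathbf{p}, \mathbf{p}_i)$ over previously found paths $F$, with a path represented as the sum of its relation embeddings (path diversity). A minimal NumPy sketch under that reading; how the rewards are combined during training is not shown, and returning 0 when no path has been found yet is my own assumption:

```python
from typing import Dict, List, Sequence

import numpy as np


def global_accuracy_reward(reached_target: bool) -> float:
    """+1 if the episode ends at the target entity, -1 otherwise."""
    return 1.0 if reached_target else -1.0


def path_efficiency_reward(path: Sequence[str]) -> float:
    """Shorter paths earn more: 1 / length(path)."""
    return 1.0 / len(path)


def path_vector(relation_embs: Dict[str, np.ndarray], path: Sequence[str]) -> np.ndarray:
    """Represent a path as the sum of its relation embeddings."""
    return np.sum([relation_embs[r] for r in path], axis=0)


def path_diversity_reward(path_vec: np.ndarray, found_vecs: List[np.ndarray]) -> float:
    """Negative mean cosine similarity to previously found path vectors."""
    if not found_vecs:
        return 0.0  # assumption: no penalty before any path has been found
    sims = [
        float(path_vec @ f / (np.linalg.norm(path_vec) * np.linalg.norm(f)))
        for f in found_vecs
    ]
    return -sum(sims) / len(sims)


if __name__ == "__main__":
    embs = {"a": np.array([1.0, 0.0]), "b": np.array([0.0, 1.0])}
    p = path_vector(embs, ["a", "b"])
    print(global_accuracy_reward(True), path_efficiency_reward(["a", "b"]))
    print(path_diversity_reward(p, [path_vector(embs, ["a"])]))
```

The diversity term pushes the agent away from rediscovering near-duplicate paths, which would add little as extra logical formulas.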

SLIDE 32

Training with Policy Gradient

  • Monte-Carlo Policy Gradient (REINFORCE, Williams, 1992)
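REINFORCE estimates $\nabla_\theta J(\theta) \approx \sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, R$ from a sampled episode and then takes a gradient ascent step. A minimal NumPy sketch, assuming a linear softmax policy $\pi(a \mid s) = \mathrm{softmax}(W s)$ (DeepPath's policy is a multi-layer network) and one return $R$ shared by every step of the episode; the names are illustrative:

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()


def reinforce_gradient(W, states, actions, episode_return):
    """Monte-Carlo policy-gradient estimate for a linear softmax policy pi(a|s) = softmax(W s).

    grad = sum_t grad_W log pi(a_t | s_t) * R, with
    grad_W log pi(a|s) = (onehot(a) - pi(.|s)) s^T.
    """
    grad = np.zeros_like(W)
    for s, a in zip(states, actions):
        probs = softmax(W @ s)
        onehot = np.zeros_like(probs)
        onehot[a] = 1.0
        grad += np.outer(onehot - probs, s) * episode_return
    return grad


if __name__ == "__main__":
    W = np.zeros((2, 1))                  # 2 actions, 1-dimensional state
    states, actions = [np.array([1.0])], [0]
    g = reinforce_gradient(W, states, actions, episode_return=1.0)
    W_new = W + 0.1 * g                   # gradient *ascent* on expected return
    print(softmax(W_new @ states[0])[0] > 0.5)  # True: the rewarded action became more likely
```

A positive return raises the log-probability of every action taken in the episode, and a negative one lowers it; this is why a sparse, delayed reward can still train the policy.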

SLIDE 33

Challenge

  • Typical RL problems:
    ◦ Atari games (Mnih et al., 2015): 4 to 18 valid actions
    ◦ AlphaGo (Silver et al., 2016): ~250 valid actions
    ◦ Knowledge graph reasoning: ≥ 400 actions
  • Issue: large action (search) space -> poor convergence properties

SLIDE 34

Supervised (Imitation) Policy Learning

  • Use randomized BFS to retrieve a few paths
  • Do imitation learning using the retrieved paths
  • All retrieved paths are assigned a +1 reward
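Path retrieval for imitation can be sketched as breadth-first search whose expansion order is shuffled, so repeated runs can return different shortest paths. A minimal Python sketch, assuming an adjacency-list graph and a depth limit; this simple shuffling is an illustrative variant and may not match the paper's exact randomization:

```python
import random
from collections import deque
from typing import Dict, List, Optional, Tuple

# Toy adjacency list: entity -> [(relation, neighbor), ...] (hypothetical data).
Graph = Dict[str, List[Tuple[str, str]]]


def randomized_bfs_path(graph: Graph, source: str, target: str,
                        rng: random.Random, max_len: int = 4) -> Optional[List[str]]:
    """Return one relation path from source to target (a shortest one), or None."""
    parent: Dict[str, Optional[Tuple[str, str]]] = {source: None}
    queue = deque([(source, 0)])
    while queue:
        node, depth = queue.popleft()
        if node == target:
            rels = []
            while parent[node] is not None:  # walk back to the source
                prev, rel = parent[node]
                rels.append(rel)
                node = prev
            return rels[::-1]
        if depth == max_len:
            continue
        edges = list(graph.get(node, []))
        rng.shuffle(edges)  # randomize expansion order
        for rel, nxt in edges:
            if nxt not in parent:
                parent[nxt] = (node, rel)
                queue.append((nxt, depth + 1))
    return None


if __name__ == "__main__":
    g = {"A": [("r1", "B"), ("r3", "C")], "B": [("r2", "D")], "C": [("r4", "D")]}
    found = {tuple(randomized_bfs_path(g, "A", "D", random.Random(seed))) for seed in range(20)}
    print(found)
```

Each retrieved relation sequence can then serve as a demonstration: every step along it is treated as an action that earned the +1 reward.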

SLIDE 35

Datasets and Preprocessing

Dataset   | # of Entities | # of Relations | # of Triples | # of Tasks
FB15k-237 | 14,505        | 237            | 310,116      | 20
NELL-995  | 75,492        | 200            | 154,213      | 12


  • Dataset processing:
    ◦ Remove useless relations: haswikipediaurl, generalizations, etc.
    ◦ Add inverse relation links to the knowledge graph
    ◦ Remove the triples with task relations

FB15k-237: sampled from FB15k (Bordes et al., 2013), with redundant relations removed.
NELL-995: sampled from the 995th iteration of the NELL system (Carlson et al., 2010b).

SLIDE 36

Effect of Supervised Policy Learning


  • x-axis: number of training epochs
  • y-axis: success ratio (probability of reaching the target) on the test set
  • Next step: re-train the agent using the reward functions
SLIDE 37

Inference Using Learned Paths

  • Path as logical formula
    ◦ FilmCountry: actionFilm-1 -> personNationality
    ◦ PersonNationality: placeOfBirth -> locationContains-1
    ◦ etc.
  • Bi-directional path-constrained search
  • Check whether the formulas hold for entity pairs


[Figure: uni-directional search vs. bi-directional search]
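Checking whether a formula (a relation path) holds for an entity pair can be done from both ends: expand forward from the head along the first half of the path, backward from the tail along the second half, and test whether the two frontiers meet. A minimal Python sketch, assuming triples that already include the inverse links; the toy triples are hypothetical:

```python
from collections import defaultdict
from typing import Iterable, List, Sequence, Set, Tuple


def build_index(triples: Iterable[Tuple[str, str, str]]):
    """Forward index (head, rel) -> tails and backward index (tail, rel) -> heads."""
    fwd, bwd = defaultdict(set), defaultdict(set)
    for h, r, t in triples:
        fwd[(h, r)].add(t)
        bwd[(t, r)].add(h)
    return fwd, bwd


def expand(frontier: Set[str], rels: Iterable[str], index) -> Set[str]:
    """Follow each relation in turn from every node in the frontier."""
    for r in rels:
        frontier = {n for node in frontier for n in index.get((node, r), ())}
    return frontier


def formula_holds(fwd, bwd, source: str, target: str, path: Sequence[str]) -> bool:
    """True if some walk source -path-> target exists, found by meeting in the middle."""
    mid = len(path) // 2
    left = expand({source}, path[:mid], fwd)
    right = expand({target}, reversed(path[mid:]), bwd)  # walk the second half backwards
    return bool(left & right)


TRIPLES: List[Tuple[str, str, str]] = [
    ("Tom Hanks", "placeOfBirth", "Concord"),
    ("Concord", "locationContains-1", "United States"),
]

if __name__ == "__main__":
    fwd, bwd = build_index(TRIPLES)
    formula = ["placeOfBirth", "locationContains-1"]  # PersonNationality
    print(formula_holds(fwd, bwd, "Tom Hanks", "United States", formula))
    print(formula_holds(fwd, bwd, "Tom Hanks", "English", formula))
```

Because frontiers can grow quickly at high-degree nodes, expanding each side only half the path length keeps both frontiers much smaller than a single search over the full length.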

SLIDE 38

Link Prediction Result


Tasks                | PRA   | DeepPath | TransE | TransR
worksFor             | 0.681 | 0.711    | 0.677  | 0.692
athletePlaysForTeam  | 0.987 | 0.955    | 0.896  | 0.784
athletePlaysInLeague | 0.841 | 0.960    | 0.773  | 0.912
athleteHomeStadium   | 0.859 | 0.890    | 0.718  | 0.722
teamPlaysSports      | 0.791 | 0.738    | 0.761  | 0.814
orgHirePerson        | 0.599 | 0.742    | 0.719  | 0.737
personLeadsOrg       | 0.700 | 0.795    | 0.751  | 0.772
…                    |       |          |        |
Overall              | 0.675 | 0.796    | 0.737  | 0.789

Mean average precision on NELL-995
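Average precision (AP) for one query averages the precision at the rank of each correct answer; MAP is the mean of AP over queries. A minimal Python sketch, assuming binary relevance labels already sorted by model score and that every relevant answer appears in the ranked list; how candidate lists are built for the table above is not shown on the slide, so the inputs here are made up:

```python
from typing import Sequence


def average_precision(ranked_labels: Sequence[int]) -> float:
    """AP for one ranked list of 0/1 labels (1 = correct answer), best-scored first."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(ranked_labels, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this correct answer's rank
    return total / hits if hits else 0.0


def mean_average_precision(queries: Sequence[Sequence[int]]) -> float:
    """MAP: mean of AP over all queries."""
    return sum(average_precision(q) for q in queries) / len(queries)


if __name__ == "__main__":
    # Query 1: correct answers at ranks 1 and 3; query 2: correct answer at rank 2.
    print(mean_average_precision([[1, 0, 1], [0, 1]]))
```

MAP rewards placing correct answers near the top of each ranking, so two methods with similar accuracy at a fixed cutoff can still differ noticeably in MAP.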

SLIDE 39

Qualitative Analysis

Path length distributions

SLIDE 40

Qualitative Analysis


Example Paths

personNationality:
  • placeOfBirth -> locationContains-1
  • peoplePlaceLived -> locationContains-1
  • placeOfBirth -> locationContains
  • peopleMarriage -> locationOfCeremony -> locationContains-1

tvProgramLanguage:
  • tvCountryOfOrigin -> countryOfficialLanguage
  • tvCountryOfOrigin -> filmReleaseRegion-1 -> filmLanguage
  • tvCastActor -> personLanguage

athletePlaysForTeam:
  • athleteHomeStadium -> teamHomeStadium-1
  • athleteLedSportsTeam
  • athletePlaysSports -> teamPlaysSports-1

SLIDE 41

Bridging Path-Finding and Reasoning with Variational Inference: DIVA (Chen et al., NAACL 2018)

SLIDE 42

DIVA: Variational KB Reasoning (NAACL 2018)

  • Inferring latent paths connecting entity nodes.


[Figure: condition $(e_s, e_d)$ = (United States, English); observed variable $r$ = countrySpeakLanguage.]

Objective: maximize $\log p(r \mid e_s, e_d)$ over the model parameters.

SLIDE 43

DIVA: Variational KB Reasoning (NAACL 2018)

  • Inferring latent paths connecting entity nodes by parameterizing the likelihood (path reasoning) and the prior (path finding) with neural network modules.


[Figure: condition $(e_s, e_d)$ = (Band of Brothers, English); observed variable $r$ = tvProgramLanguage; unobserved variable $L$ = the latent path, with prior $p(L \mid e_s, e_d)$ and likelihood $p(r \mid L)$.]

$\arg\max p(r \mid e_s, e_d) = \arg\max \log \int_L p(r \mid L)\, p(L \mid e_s, e_d)$

SLIDE 44
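  • The marginal likelihood $\log \int_L p(r \mid L)\, p(L \mid e_s, e_d)$ is intractable.
  • We resort to variational Bayes (Kingma and Welling, 2013) by introducing a posterior distribution $q(L \mid e_s, e_d, r)$, which gives the evidence lower bound:

$\log p(r \mid e_s, e_d) \ge \mathbb{E}_{q(L \mid e_s, e_d, r)}[\log p(r \mid L)] - \mathrm{KL}(q(L \mid e_s, e_d, r) \,\|\, p(L \mid e_s, e_d)) = \mathrm{ELBO}$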

DIVA: Variational KB Reasoning (NAACL 2018)
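Because the latent variable is a discrete path, the bound can be checked on a toy model with a handful of candidate paths, reading the integral as a sum over paths. A minimal Python sketch, assuming three hypothetical candidate paths with made-up prior and likelihood values: the ELBO never exceeds the exact log marginal likelihood and matches it when $q$ equals the true posterior:

```python
import math
from typing import Sequence


def log_marginal(prior: Sequence[float], lik: Sequence[float]) -> float:
    """log p(r | e_s, e_d) = log sum_L p(L | e_s, e_d) p(r | L), exact for a small path set."""
    return math.log(sum(p * li for p, li in zip(prior, lik)))


def elbo(q: Sequence[float], prior: Sequence[float], lik: Sequence[float]) -> float:
    """E_q[log p(r | L)] - KL(q || prior), skipping paths with q(L) = 0."""
    expected = sum(qi * math.log(li) for qi, li in zip(q, lik) if qi > 0)
    kl = sum(qi * math.log(qi / pi) for qi, pi in zip(q, prior) if qi > 0)
    return expected - kl


PRIOR = [0.5, 0.3, 0.2]  # p(L | e_s, e_d) over three candidate paths (made up)
LIK = [0.9, 0.1, 0.4]    # p(r | L) for each path (made up)

if __name__ == "__main__":
    evidence = sum(p * li for p, li in zip(PRIOR, LIK))
    posterior = [p * li / evidence for p, li in zip(PRIOR, LIK)]
    uniform = [1 / 3] * 3
    print(log_marginal(PRIOR, LIK))
    print(elbo(posterior, PRIOR, LIK))  # equals the line above up to rounding
    print(elbo(uniform, PRIOR, LIK))    # strictly smaller
```

With real KGs the number of paths is far too large to enumerate, which is why DIVA learns $q$ with a neural path-finder instead of computing the posterior exactly.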

SLIDE 45

Parameterization – Path-finder

  • Approximate posterior $q(L \mid e_s, e_d, r)$ and prior $p(L \mid e_s, e_d)$: parameterized with RNNs


[Figure: the RNN path-finder unrolled over a KG path from $e_s$ toward $e_d$.]

Transition probability: $p(r_{t+1}, e_{t+1} \mid r_{1:t}, e_{1:t})$

SLIDE 46

Parameterization – Path Reasoner

  • Likelihood $p(r \mid L)$: parameterized with a CNN


[Figure: CNN path-reasoner architecture.]

SLIDE 47
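  • Training

[Figure: the posterior $q(L \mid e_s, e_d, r)$ produces a KG-connected path between $e_s$ and $e_d$ given the relation; the likelihood $p(r \mid L)$ scores the relation given the path; a KL divergence connects the posterior to the prior $p(L \mid e_s, e_d)$.]

Objective: $\mathbb{E}_{q(L \mid e_s, e_d, r)}[\log p(r \mid L)] - \mathrm{KL}(q(L \mid e_s, e_d, r) \,\|\, p(L \mid e_s, e_d))$

Posterior: $q(L \mid e_s, e_d, r)$; likelihood: $p(r \mid L)$; prior: $p(L \mid e_s, e_d)$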

DIVA: Variational KB Reasoning (NAACL 2018)

SLIDE 48
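  • Testing

[Figure: the prior $p(L \mid e_s, e_d)$ produces a KG-connected path between $e_s$ and $e_d$, and the likelihood $p(r \mid L)$ predicts the relation.]

Posterior: $q(L \mid e_s, e_d, r)$; likelihood: $p(r \mid L)$; prior: $p(L \mid e_s, e_d)$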

DIVA: Variational KB Reasoning (NAACL 2018)

SLIDE 49

Conclusions

  • Embedding-based methods are very scalable and robust.
  • Path-based methods are more interpretable.
  • Recent efforts aim to unify embedding-based and path-based approaches.
