Knowledge Graph Embedding and Its Applications

Xiaolong Jin

CAS Key Lab of Network Data Science and Technology, Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS) 2019‐11‐30@Fudan

Agenda

 Background  Knowledge Graph Embedding (KGE)  Applications of KGE  Conclusions

2


Background

 A Knowledge Graph (KG) is a system that understands facts about people, places, and things, and how these entities are all connected

 Examples

 Dbpedia  YAGO  NELL  Freebase  Wolfram Alpha  Probase  Google KG  ……

3

Background

 Typical applications of KGs
   Vertical search
   Intelligent QA
   Disease diagnosis
   Financial anti‐fraud
   Abnormal data analysis
   Machine translation
   ……

4


Vertical Search

5

Intelligent QA

 IBM’s Watson
 Google’s Google Now
 Apple’s Siri
 Amazon’s Alexa
 Microsoft’s Xiaobing & Cortana
 Baidu’s Dumi (度秘)
 Sogou’s Wangzai (旺仔)
 …

6


Disease Diagnosis

 Watson Care Manager
 Knowledge service platform for Traditional Chinese Medicine (TCM)

 …

7

KG‐based cancer research @ MD Anderson Cancer Center & IBM Watson; TCM knowledge service platform @ Institute of Information on Traditional Chinese Medicine, China Academy of Chinese Medical Sciences

Typical Representation of KGs

 Symbolic triples: (head entity, relation, tail entity)

 e.g.,

(Eiffel Tower, is_located_in, Paris)

(Eiffel Tower, is_a, place)

(Bob, is_a_friend_of, Alice)

8
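The triple representation above maps directly onto a simple data structure. A minimal sketch in Python, using the slide's own example triples (the helper name `tails` is an illustrative assumption, not from the slides):

```python
# A KG as a set of symbolic (head, relation, tail) triples -- toy data from the slide.
triples = {
    ("Eiffel Tower", "is_located_in", "Paris"),
    ("Eiffel Tower", "is_a", "place"),
    ("Bob", "is_a_friend_of", "Alice"),
}

def tails(kg, head, relation):
    """Return all tail entities for a given (head, relation) pair."""
    return {t for (h, r, t) in kg if h == head and r == relation}

print(tails(triples, "Eiffel Tower", "is_located_in"))  # {'Paris'}
```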


Inference over KGs

 Logic‐based models
   Pros: easily interpretable
   Cons: highly complex

 Path ranking algorithms
   Pros: easily interpretable
   Cons:
    Cannot handle rare relations
    Cannot handle KGs with low connectivity
    Extracting paths is time‐consuming

 Embedding‐based methods
   Pros:
    Highly efficient
    Can capture semantic information
   Cons: less interpretable

9

Agenda

 Background  Knowledge Graph Embedding (KGE)  Applications of KGE  Conclusions

10


Knowledge Graph Embedding (KGE)

 Map the entities, relations, and even paths of a KG into a low‐dimensional vector space
   Encode semantic information
   Computationally efficient
 Basic idea
   Treat relations as translation operations between the vectors corresponding to entities

 The score function of TransE (Translational Embeddings):

f(h, r, t) = ‖h + r − t‖

 Loss function:

L = Σ_{(h,r,t)∈S} Σ_{(h′,r,t′)∈S′} [γ + f(h, r, t) − f(h′, r, t′)]₊

where S is the positive triple set, S′ the negative triple set, and γ the margin.

 Intuition: h + r ≈ t, e.g., China + Capital ≈ Beijing, France + Capital ≈ Paris

12
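The translation idea can be sketched in a few lines of numpy. This is an illustration with random, untrained embeddings (the entity names come from the slide; the dimension, seed, and margin value are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50

# Toy embeddings; in a real model these vectors are learned by gradient descent.
entities = {e: rng.normal(size=dim) for e in ["China", "Beijing", "France", "Paris"]}
relations = {"Capital": rng.normal(size=dim)}

def score(h, r, t):
    """TransE score: distance between h + r and t (lower = more plausible)."""
    return np.linalg.norm(entities[h] + relations[r] - entities[t])

def margin_loss(pos, neg, gamma=1.0):
    """Margin-based ranking loss for one positive and one negative triple."""
    return max(0.0, gamma + score(*pos) - score(*neg))

loss = margin_loss(("China", "Capital", "Beijing"), ("China", "Capital", "Paris"))
```

Training pushes the score of positive triples below that of corrupted (negative) triples by at least the margin γ.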


Trans Series of KGE

 TransE cannot well handle 1‐N, N‐1, or N‐M relations
 Follow‐up models address this:
   TransH
   TransR
   …

13
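TransH's remedy for 1‐N relations is to project entities onto a relation‐specific hyperplane before translating, so entities that differ only along the hyperplane's normal become indistinguishable for that relation. A small sketch (the projection formula follows the published model; the toy vectors are assumptions for illustration):

```python
import numpy as np

def transh_score(h, r, t, w_r):
    """TransH score: project h and t onto the hyperplane with normal w_r,
    then measure the translation distance on that hyperplane."""
    w_r = w_r / np.linalg.norm(w_r)   # unit normal of the relation hyperplane
    h_p = h - np.dot(w_r, h) * w_r    # projection of the head
    t_p = t - np.dot(w_r, t) * w_r    # projection of the tail
    return np.linalg.norm(h_p + r - t_p)

# Two tails that differ only along w score identically, so a 1-N relation
# can hold for both without forcing their embeddings to collapse.
h = np.array([1.0, 1.0]); r = np.array([0.0, 2.0]); w = np.array([1.0, 0.0])
t1 = np.array([0.0, 3.0]); t2 = np.array([5.0, 3.0])
print(transh_score(h, r, t1, w), transh_score(h, r, t2, w))  # both 0.0
```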

Agenda

 Background  Knowledge Graph Embedding (KGE)  Applications of KGE  Conclusions

14


The Applications of KGE

 Basic applications
   Link prediction
   Entity alignment
   KG integration
   …
 Advanced applications
   Vertical search
   Intelligent QA
   Disease diagnosis
   …

KG1 (group 1) ↔ aligned entity pairs ↔ KG2 (group 2)

Shared embedding based neural networks for knowledge graph completion
 S. Guan, X. Jin, Y. Wang, et al. The 27th ACM International Conference on Information and Knowledge Management (CIKM’18)

Application 1: Link Prediction

16


Motivation

 Existing methods for link prediction
   Handle three types of tasks: head, relation, and tail prediction
   Do not distinguish them in training
   These prediction tasks exhibit quite different performance
 Link prediction upon reasoning
   Reasoning is a process that gradually approaches the target
   An FCN (fully connected network) with a decreasing number of hidden nodes can imitate such a process

17

The Proposed Method

 Shared Embedding based Neural Network (SENN)
   Explicitly distinguishes the three prediction tasks
   Integrates them into an FCN‐based framework
 Extend SENN to SENN+
   Uses relation prediction results to improve head and tail prediction

18


The SENN Method

 The framework
   2 shared embedding matrices
   3 substructures: head_pred, rel._pred and tail_pred

19

The Three Substructures

 Head_pred
   The score function, computed by the substructure's FCN with the ReLU activation function

20


The Three Substructures

 Head_pred
   The prediction label vector is obtained by applying the sigmoid or softmax function to the score vector
   Each element indicates the probability that the corresponding entity h forms a valid triple (h, r, t)

21

The Three Substructures

 rel._pred and tail_pred work similarly
   The score functions
   The prediction label vectors

22


Model Training

 The general loss function
   Idea: cross entropy of the prediction and target label vectors
   Each prediction task has its own target label vector; for head prediction, an element is 1 if the corresponding entity is a valid head entity in the training set, given the relation and the tail entity
 Use label smoothing, controlled by a label smoothing parameter, to regularize the target label vectors

23
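The slide does not reproduce the smoothing formula; the common formulation, which the description above matches in spirit, pulls each 0/1 target toward the uniform distribution over all candidates (the parameter name `eps` is an assumption):

```python
import numpy as np

def smooth_labels(y, eps=0.1):
    """Label smoothing: mix the hard 0/1 target vector with a uniform
    distribution over its n entries, weighted by the smoothing parameter eps."""
    n = y.shape[0]
    return (1.0 - eps) * y + eps / n

# Two valid head entities among four candidates.
y = np.array([0.0, 1.0, 0.0, 1.0])
print(smooth_labels(y))  # [0.025 0.925 0.025 0.925]
```

Smoothing keeps the targets away from exactly 0 and 1, which regularizes the cross-entropy loss.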

Model Training

 The general loss function
   Binary cross‐entropy losses for the 3 prediction tasks
   The general loss for the given triple

24


Model Training

 The adaptively weighted loss mechanism
   Predictions on the 1‐side vs. the M‐side: punish the model more severely if the deterministic predictions are wrong
   Relation prediction vs. entity prediction: punish wrong predictions on head/tail entities more severely
 The final loss function for the given triple

25

The SENN+ Method

 Employ relation prediction to improve head and tail prediction in testing
 The relation‐aided test mechanism
   Given a prediction task (?, r, t), assume that a candidate h is a valid head entity
   If we then perform relation prediction on (h, ?, t), r most probably has a prediction label higher than those of other relations, and is thus ranked higher

26


The SENN+ Method

 The relation‐aided test mechanism
   Two additional relation‐aided vectors
   The final prediction label vectors for entity prediction

27

Experiments

 Entity prediction

28


Experiments

 Entity prediction in detail
   The adaptively weighted loss mechanism distinguishes, and learns well, the predictions under different mapping properties

29

Experiments

 Relation prediction
   SENN and SENN+ capture the following information to obtain better performance:
    Implicit information interaction among the different predictions
    Prediction‐specific information

30


NaLP: Link Prediction on N‐ary Relational Data

 S. Guan, X. Jin, Y. Wang, X. Cheng. The 2019 International World Wide Web Conference (WWW 2019)

Application 2: Link Prediction on N‐ary Facts

31

Motivation

 N‐ary facts are pervasive in practice
 Existing link prediction methods usually convert n‐ary facts into a few triples (i.e., binary sub‐facts), which has some drawbacks:
   Many triples need to be considered, which is more complicated
   Some conversions lose structural information, which leads to inaccurate link prediction
   The added virtual entities and triples bring in more parameters to be learned

32


33

Related works

 A few link prediction methods focus on n‐ary facts directly
   m‐TransH (IJCAI‐2016)
    A relation is defined by the mapping from a sequence of roles, corresponding to this type of relation, to their values, e.g.,
     Receive_Award: [person, award, point in time]  [Marie Curie, Nobel Prize in Chemistry, 1911]
     “Marie Curie received the Nobel Prize in Chemistry in 1911.”
    Each specific mapping is an instance of the relation
    Generalizes TransH to n‐ary relational data

34

Related works

 A few more link prediction methods also focus on n‐ary facts directly
   RAE (Relatedness Affiliated Embedding, WWW‐2018)
    Improves m‐TransH by further considering the relatedness of values
    Ignores the roles in the above process
    Under different sequences of roles, the relatedness of two values can be greatly different, e.g., Marie Curie and Henri Becquerel under (person, award, point in time, winner) vs. (person, spouse, start time, end time, place of marriage)

The proposed NaLP method explicitly models the relatedness of the role‐value pairs.

The NaLP method

 The presentation of each n‐ary fact

 A set of role‐value pairs  Formally, given an n‐ary fact with roles, each role

having values, the representation is as follows:

 For example,

“Marie Curie received Nobel Prize in Chemistry in 1911.”

is represented as:

{person: Marie Curie, award: Nobel Prize in Chemistry, point in time: 1911}

35
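The role‐value‐pair representation fits an ordinary data structure. A sketch using (role, value) tuples, which, unlike a plain dict, lets a role such as "together with" repeat (both facts are the slides' own examples):

```python
# N-ary facts as lists of (role, value) pairs.
fact1 = [
    ("person", "Marie Curie"),
    ("award", "Nobel Prize in Chemistry"),
    ("point in time", "1911"),
]
fact2 = [
    ("person", "Marie Curie"),
    ("award", "Nobel Prize in Physics"),
    ("point in time", "1903"),
    ("together with", "Henri Becquerel"),   # the same role appears twice,
    ("together with", "Pierre Curie"),      # which a dict could not express
]
print(len(fact1), len(fact2))  # 3 5 -- facts of different arities coexist
```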

The NaLP method

 The framework

 A role and its value are tightly linked to each other, and thus should be bound together
 For a set of role‐value pairs, NaLP decides whether they form a valid n‐ary fact, i.e., whether they are closely related

36

Pipeline: role‐value pair embedding → relatedness evaluation


Role‐value pair embedding

37

Pipeline: form the embedding matrix, then capture the features of the role‐value pairs

Relatedness evaluation

38

 The principle
   A set of role‐value pairs forms a valid fact
   → Every two role‐value pairs are greatly related
   → The values of their relatedness feature vector are large
   → The minimum over each feature dimension among all the pairs is not allowed to be too small
   → Apply element‐wise minimization over the pair‐wise relatedness to approximately evaluate the overall relatedness
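The element‐wise minimization step can be sketched as follows. The relatedness function `g` here is a stand‐in toy (in NaLP it is learned by a neural network), so only the min‐pooling mechanics are meaningful:

```python
import numpy as np
from itertools import combinations

def overall_relatedness(pair_embeddings, g):
    """Element-wise minimum of the relatedness feature vectors of all
    pairs of role-value-pair embeddings: one weakly related pair drags
    the corresponding feature dimensions down."""
    vecs = [g(a, b) for a, b in combinations(pair_embeddings, 2)]
    return np.minimum.reduce(vecs)

# Toy relatedness function and toy 2-d "embeddings" for illustration only.
g = lambda a, b: np.abs(a * b)
pairs = [np.array([1.0, 2.0]), np.array([0.5, 3.0]), np.array([2.0, 0.1])]
print(overall_relatedness(pairs, g))  # [0.5 0.2]
```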


Relatedness evaluation

39

Pipeline: compute the relatedness between role‐value pairs → estimate the overall relatedness of all the role‐value pairs → obtain the evaluation score

Look into NaLP and the loss function

 Look into NaLP
   Permutation‐invariant to the input order of role‐value pairs
   Able to cope with facts of different arities

 The loss function:

40


Experiments

 Datasets
   The public n‐ary dataset JF17K, derived from Freebase
    All the facts in it are of good quality
    The form of a relation type is fixed, e.g., “Marie Curie received the Nobel Prize in Chemistry in 1911.”  [person, award, point in time]
   Our new dataset WikiPeople, derived from Wikidata
    More practical and flexible: a relation type may have multiple variants, e.g., Receive_Award:
     “Marie Curie received the Nobel Prize in Chemistry in 1911.”
     “Marie Curie received the Willard Gibbs Award.”
     “Marie Curie received the Matteucci Medal in 1904 with Pierre Curie.”
     “Marie Curie received the Davy Medal with Pierre Curie.”

41

Experiments

 Datasets  Metrics

 MRR (Mean Reciprocal Rank)  Hits@N (the proportion of the test values/roles ranked in

the top‐N ranking list)

 Baseline: the state‐of‐the‐art method RAE

 Works for link prediction on n‐ary relational data are scarce  m‐TransH is the simplified version of RAE

42
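Given the ranks of the correct answers over a test set, both metrics are one‐liners; a sketch (the example ranks are made up for illustration):

```python
def mrr_and_hits(ranks, n=10):
    """MRR and Hits@N from the ranks of the correct answers (rank 1 = best)."""
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits = sum(r <= n for r in ranks) / len(ranks)
    return mrr, hits

ranks = [1, 3, 2, 10, 50]           # hypothetical ranks of five test queries
mrr, hits10 = mrr_and_hits(ranks)
print(round(mrr, 3), hits10)        # 0.391 0.8
```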


Experiments

 Value prediction
   On the dataset of good quality, i.e., JF17K: MRR ↑0.056, Hits@1 ↑7.1%, Hits@3 ↑5.7%
   On the relatively more practical dataset, i.e., WikiPeople: MRR ↑0.166, Hits@1 ↑17.0%, Hits@3 ↑18.2%
 NaLP copes with diverse data better than RAE
   In RAE, a new relation is defined whenever data incompleteness, insertion, or update appears → this may lead to data sparsity → hence the much worse performance of RAE on WikiPeople

43

Experiments

 Value prediction in detail
   NaLP performs better on both the binary and n‐ary categories
   On JF17K, the gap is pronounced, especially on Hits@1, Hits@3 and MRR
   On WikiPeople, RAE is left even further behind

44


Experiments

 Role prediction in detail
   No other baseline: RAE is deliberately designed only for value prediction
   NaLP achieves excellent results: the reasonable modeling of role‐value pairs not only enhances value prediction, but also benefits role prediction

45

Experiments

 Overall relatedness analysis
   The overall relatedness is the crucial intermediate result
   A valid fact has larger values in a majority of dimensions, compared to its negative samples
   Distinguishability metric

46


Experiments

 Overall relatedness analysis
   Case 1: predict Michael Douglas in Fact 1
    Fact 1: {son: Michael Douglas, father: Kirk Douglas}
   Case 2: predict Nobel Prize in Physics in Fact 2
    Fact 2: {person: Marie Curie, award: Nobel Prize in Physics, point in time: 1903, together with: Henri Becquerel, together with: Pierre Curie}

47

Experiments

 Overall relatedness analysis of Case 1 (left) and Case 2 (right)
   Most distinguishability results lie in the area above 0
   The overall relatedness vector captures many discriminant features to further estimate the validity of the input fact

48


Agenda

 Background  Knowledge Graph Embedding (KGE)  Applications of KGE  Conclusions

49

Conclusions

 Knowledge graph embedding projects traditional symbolic representations into vector spaces
 It further supports both basic and advanced applications
 Application‐oriented knowledge graph embedding is a research hotspot
 Open issues
   Incremental inference
   Link inference between n‐ary facts

50


51

Thank you!