Learning to Exploit Long-term Relational Dependencies in Knowledge Graphs



slide-1
SLIDE 1

Lingbing Guo, Zequn Sun, Wei Hu*

Nanjing University, China * Corresponding author: whu@nju.edu.cn

Learning to Exploit Long-term Relational Dependencies in Knowledge Graphs

ICML’19, June 9–15, Long Beach, CA, USA

slide-2
SLIDE 2

Knowledge graphs

- Knowledge graphs (KGs) store a wealth of structured facts about the real world
  - A fact (s, r, o): subject entity, relation, object entity
- KGs are far from complete, and two important tasks are proposed

2 Introduction ➤ Our method ➤ Experiments and results ➤ Conclusion

slide-3
SLIDE 3

Knowledge graphs


1. Entity alignment: find entities in different KGs denoting the same real-world object
2. KG completion: complete missing facts in a single KG
   - E.g., predict ? in (Tim Berners-Lee, employer, ?) or (?, employer, W3C)
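The two query forms can be pictured with a toy triple store. The sketch below only looks up known facts; a completion model would instead rank all candidate entities for the `?` slot. The `query` helper and the use of `None` as the placeholder are illustrative assumptions, not something shown in the talk.

```python
# Toy facts as (subject, relation, object) triples.
facts = {
    ("Tim Berners-Lee", "employer", "W3C"),
    ("Tim Berners-Lee", "country", "United Kingdom"),
}

def query(facts, subject=None, relation=None, obj=None):
    """Return known facts matching the pattern; None marks the unknown slot."""
    return sorted(
        (s, r, o) for (s, r, o) in facts
        if subject in (None, s) and relation in (None, r) and obj in (None, o)
    )

# (Tim Berners-Lee, employer, ?): the object entity is unknown.
tail_answers = query(facts, subject="Tim Berners-Lee", relation="employer")
# (?, employer, W3C): the subject entity is unknown.
head_answers = query(facts, relation="employer", obj="W3C")
print(tail_answers, head_answers)
```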

slide-4
SLIDE 4

Challenges

- For KG embedding, existing methods largely focus on learning from relational triples of entities
- Triple-level learning has two major limitations
  - Low expressiveness
    - Learn entity embeddings from a fairly local view (i.e., 1-hop neighbors)
  - Inefficient information propagation
    - Only use triples to deliver semantic information within/across KGs

slide-5
SLIDE 5

Learning to exploit long-term relational dependencies

- A relational path is an entity-relation chain, where entities and relations appear alternately
- RNNs perform well on sequential data
  - Limitations of using RNNs to model relational paths
    1. A relational path has two different types of elements: "entity" and "relation"
       - They always appear in an alternating order
    2. A relational path is constituted by triples, but these basic structural units are overlooked by RNNs

Example path: United Kingdom → country⁻ → Tim Berners-Lee → employer → W3C (country⁻ is the reverse of the country relation)
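As a rough illustration of how such a chain can be assembled from triples, the sketch below stores the two facts behind the example path and adds a reverse triple for each. The helper names (`add_reverse`, `path_to_chain`) and the `-` suffix used to mark reverse relations are illustrative assumptions.

```python
# Toy triples (subject, relation, object) behind the example path.
triples = [
    ("Tim Berners-Lee", "country", "United Kingdom"),
    ("Tim Berners-Lee", "employer", "W3C"),
]

def add_reverse(triples):
    """Return the triples plus one reverse triple (o, r-, s) for each fact."""
    reverse = [(o, r + "-", s) for (s, r, o) in triples]
    return triples + reverse

def path_to_chain(path_triples):
    """Flatten consecutive triples into an entity-relation chain.

    Each triple must start at the entity where the previous one ended.
    """
    chain = [path_triples[0][0]]
    for s, r, o in path_triples:
        if s != chain[-1]:
            raise ValueError("triples do not form a connected path")
        chain += [r, o]
    return chain

all_triples = add_reverse(triples)
chain = path_to_chain([
    ("United Kingdom", "country-", "Tim Berners-Lee"),
    ("Tim Berners-Lee", "employer", "W3C"),
])
print(" -> ".join(chain))
```

Entities sit at the even positions of the chain and relations at the odd positions, which is the alternating order the slide refers to.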

slide-6
SLIDE 6

Recurrent skipping networks

- A conditional skipping mechanism allows RSNs to shortcut the current input entity to let it directly participate in predicting its object entity
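One way to read the skipping mechanism is sketched below with a plain tanh RNN cell. When the current input is a relation, the output state combines the hidden state with the embedding of the preceding subject entity; at entity steps the hidden state passes through unchanged. The weight names (`W_h`, `W_x`, `S1`, `S2`), the toy dimension, the random initialization, and the choice of a simple tanh cell are assumptions made for illustration, not the authors' implementation.

```python
import math
import random

DIM = 4  # toy embedding size (assumption)

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def rand_matrix(rng, n):
    return [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]

def rsn_forward(inputs, is_relation, W_h, W_x, S1, S2):
    """Run a tanh RNN over a path and apply the skip at relation steps.

    inputs      : list of embedding vectors, entities and relations alternating
                  (the path must start with an entity)
    is_relation : list of bools, True where the input is a relation
    Returns one output state per step.
    """
    h = [0.0] * DIM
    outputs = []
    for t, x in enumerate(inputs):
        # Ordinary recurrent update.
        h = [math.tanh(z) for z in add(matvec(W_h, h), matvec(W_x, x))]
        if is_relation[t]:
            # Skip connection: the subject entity (previous input) joins the
            # hidden state when predicting the object entity.
            out = add(matvec(S1, h), matvec(S2, inputs[t - 1]))
        else:
            out = h
        outputs.append(out)
    return outputs

rng = random.Random(0)
W_h, W_x, S1, S2 = (rand_matrix(rng, DIM) for _ in range(4))
path = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(5)]
flags = [False, True, False, True, False]  # entity, relation, entity, ...
outputs = rsn_forward(path, flags, W_h, W_x, S1, S2)
```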

slide-7
SLIDE 7

Tri-gram residual learning

- Residual learning
  - Let F(x) be an original mapping, and H(x) be the expected mapping
  - Compared to directly optimizing F(x) to fit H(x), it is easier to optimize F(x) to fit the residual part H(x) − x
    - An extreme case: H(x) = x

slide-8
SLIDE 8

Tri-gram residual learning


- Tri-gram residual learning
  - United Kingdom → country⁻ → Tim Berners-Lee → employer → W3C
  - Compared to directly learning to predict W3C from employer and its mixed context, it is easier to learn the residual part between W3C and Tim Berners-Lee
    - Because they form a triple, and we should not overlook the triple structure in the paths

Path: (United Kingdom, country⁻, Tim Berners-Lee, employer, W3C)

Models   Optimize F([·], employer) as
RNNs     F([·], employer) := W3C
RRNs     F([·], employer) := W3C − [·]
RSNs     F([·], employer) := W3C − Tim Berners-Lee

[·] denotes the context (United Kingdom, country⁻, Tim Berners-Lee)
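To make the three targets in the comparison above concrete, here is a small numeric sketch with made-up 2-d embeddings. The vectors, and the use of a single `context` vector as a stand-in for the encoded context [·], are purely illustrative assumptions; only the three target definitions come from the slide.

```python
def sub(u, v):
    """Element-wise difference of two vectors."""
    return [a - b for a, b in zip(u, v)]

# Made-up embeddings (assumption, for illustration only).
w3c = [0.9, 0.2]
tim = [0.7, 0.1]
context = [0.3, -0.4]  # stand-in for the encoded context [·]

targets = {
    "RNNs": w3c,                # fit the object entity directly
    "RRNs": sub(w3c, context),  # fit the residual w.r.t. the whole context
    "RSNs": sub(w3c, tim),      # fit the residual w.r.t. the subject entity
}
for model, target in targets.items():
    print(model, [round(v, 2) for v in target])
```

In the RSN case the target depends only on the subject and object of the last triple, which is the point of the tri-gram view.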

slide-9
SLIDE 9

Architecture

- An end-to-end framework
  1. Biased random walk sampling
     - Deep paths carry more relational dependencies than triples
     - Cross-KG paths deliver alignment information between KGs
  2. Recurrent skipping network
  3. Type-based noise contrastive estimation
     - Evaluate loss in an optimized way
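The sampling step could look roughly like the following. This is a simplified sketch, not the authors' sampler: the bias weights `depth_bias` and `cross_kg_bias`, the `kg_of` mapping, the `@1`/`@2` suffixes, and the toy graph (including its cross-KG edge) are all assumptions chosen to illustrate a walk that prefers moving away from the previous entity and crossing between KGs.

```python
import random

def biased_walk(graph, kg_of, start, length, rng,
                depth_bias=0.9, cross_kg_bias=0.9):
    """Sample an entity-relation chain containing up to `length` entities.

    graph : dict entity -> list of (relation, neighbor) edges
    kg_of : dict entity -> KG identifier
    Edges leading back to the previous entity get weight 1 - depth_bias,
    others depth_bias; edges crossing into another KG get cross_kg_bias,
    others 1 - cross_kg_bias. The two weights are multiplied.
    """
    chain, prev, cur = [start], None, start
    for _ in range(length - 1):
        edges = graph.get(cur, [])
        if not edges:
            break
        weights = []
        for _, nxt in edges:
            w_depth = 1 - depth_bias if nxt == prev else depth_bias
            w_cross = cross_kg_bias if kg_of[nxt] != kg_of[cur] else 1 - cross_kg_bias
            weights.append(w_depth * w_cross)
        rel, nxt = rng.choices(edges, weights=weights, k=1)[0]
        chain += [rel, nxt]
        prev, cur = cur, nxt
    return chain

# Toy graph: "@1" / "@2" mark the KG an entity belongs to (assumption).
graph = {
    "United Kingdom@1": [("country-", "Tim Berners-Lee@1"), ("language", "English@2")],
    "Tim Berners-Lee@1": [("country", "United Kingdom@1"), ("employer", "W3C@1")],
    "W3C@1": [("employer-", "Tim Berners-Lee@1")],
    "English@2": [("language-", "United Kingdom@1")],
}
kg_of = {entity: entity.split("@")[1] for entity in graph}
chain = biased_walk(graph, kg_of, "United Kingdom@1", 3, random.Random(0))
print(" -> ".join(chain))
```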

[Figure: architecture overview. Biased random walk sampling over KG1 and KG2, linked by seed alignment, yields paths such as United Kingdom → country⁻ → Tim Berners-Lee → employer → W3C. The recurrent skipping network (RNN units with combine steps) processes each path. Type-based noise contrastive estimation (NCE) computes the loss with negative entities and negative relations. The learned embeddings are used for embedding-based entity alignment via cosine similarity, e.g., English in KG1 vs. English in KG2.]
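For the embedding-based entity alignment step, a minimal sketch of nearest-neighbor matching by cosine similarity is given below. The embeddings are made-up toy values and the function names are hypothetical; a real system would use the learned entity embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def align(source, candidates):
    """For each source entity, pick the candidate with the highest cosine similarity."""
    return {
        name: max(candidates, key=lambda c: cosine(vec, candidates[c]))
        for name, vec in source.items()
    }

# Toy embeddings for entities of two KGs (assumption).
kg1 = {"English": [0.9, 0.1, 0.3], "W3C": [0.1, 0.8, 0.2]}
kg2 = {
    "English": [0.8, 0.2, 0.3],
    "United Kingdom": [0.2, 0.1, 0.9],
    "W3C": [0.0, 0.9, 0.1],
}
matches = align(kg1, kg2)
print(matches)
```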

slide-10
SLIDE 10

Experiments and results

- Entity alignment results
  - Datasets: normal & dense
  - Performed best on all datasets
    - Especially on the normal datasets

Hits@1 (%)          DBP-WD   DBP-YG   EN-FR   EN-DE
MTransE             22.3     24.6     25.1    31.2
IPTransE            23.1     22.7     25.5    31.3
JAPE                21.9     23.3     25.6    32.0
BootEA              32.3     31.3     31.3    44.2
GCN-Align           17.7     19.3     15.5    25.3
TransR              5.2      2.9      3.6     5.2
TransD              27.7     17.3     21.1    24.4
ConvE               5.7      11.3     9.4     0.8
RotatE              17.2     15.9     14.5    31.9
RSNs (w/o biases)   37.2     36.5     32.4    45.7
RSNs                38.8     40.0     34.7    48.7

slide-11
SLIDE 11

Experiments and results


- KG completion results
  - Datasets: FB15K, WN18
  - Obtained comparable performance
    - Better than all translational models

FB15K                      Hits@1 (%)   Hits@10 (%)   MRR
TransE                     30.5         73.7          0.46
TransR                     37.7         76.7          0.52
TransD                     31.5         69.1          0.44
ComplEx                    59.9         84.0          0.69
ConvE                      67.0         87.3          0.75
RotatE                     74.6         88.4          0.80
RSNs (w/o cross-KG bias)   72.2         87.3          0.78
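For readers unfamiliar with the reported metrics, the sketch below computes Hits@k and MRR from the rank of the correct answer in each test query. The rank list is made up for illustration, and Hits@k is returned as a percentage to match the scale of the table values.

```python
def hits_at_k(ranks, k):
    """Percentage of queries whose correct answer is ranked within the top k."""
    return 100.0 * sum(1 for r in ranks if r <= k) / len(ranks)

def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries (ranks start at 1)."""
    return sum(1.0 / r for r in ranks) / len(ranks)

ranks = [1, 3, 1, 12, 2]  # made-up ranks of the correct entity
print(hits_at_k(ranks, 1), hits_at_k(ranks, 10), round(mean_reciprocal_rank(ranks), 3))
```

Hits@1 rewards only exact top-ranked answers, while MRR also gives partial credit to answers ranked slightly lower.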

slide-12
SLIDE 12

Further analysis

- RSNs vs. RNNs, RRNs [recurrent residual networks]
  - Achieved better results with only 1/30 of the epochs

[Figure: Hits@1 vs. training epochs for RSNs, RRNs (SC-LSTM), and RNNs on (a) DBP-WD (normal) and (b) DBP-WD (dense)]

slide-13
SLIDE 13

Further analysis


- Random walk length
  - On all the datasets, Hits@1 increased steadily as the random walk length grew from 5 to 15

[Figure: Hits@1 vs. random walk length (5 to 25) on DBP-WD, DBP-YG, EN-FR, and EN-DE, with one panel for the normal datasets and one for the dense datasets]


slide-14
SLIDE 14

Conclusion

- We studied path-level KG embedding learning
  1. RSNs: sequence models to learn relational paths
  2. End-to-end framework: biased random walk sampling + RSNs
  3. Superior in entity alignment and competitive in KG completion
- Future work
  - Unified sequence model: relational paths & textual information

slide-15
SLIDE 15

Datasets & source code: https://github.com/nju-websoft/RSN

Acknowledgements:
- National Key R&D Program of China (No. 2018YFB1004300)
- National Natural Science Foundation of China (No. 61872172)
- Key R&D Program of Jiangsu Science and Technology Department (No. BE2018131)

Poster: Tonight, Pacific Ballroom #42

ICML’19, June 9–15, Long Beach, CA, USA