SLIDE 1

An Attention-based Model for Joint Extraction of Entities and Relations with Implicit Entity Features

ADVISOR: JIA-LING, KOH
SOURCE: WWW 2019
SPEAKER: SHAO-WEI, HUANG
DATE: 2019/09/20


SLIDE 2

OUTLINE

⚫ Introduction
⚫ Method
⚫ Experiment
⚫ Conclusion

SLIDE 3

INTRODUCTION

➢ Extract entities and their semantic relations from an unstructured input sentence.

(Ex): "Donald Trump is the 45th and current president of the United States"

(Donald Trump, President-Of, United States)
= (Entity 1, Semantic relation, Entity 2)

SLIDE 4

INTRODUCTION

Two categories of RE methods:

➢ Pipelined models:
⚫ Identify the entity pair first, and then predict the relation between them.
  (Donald Trump, United States) → (President-Of)
⚫ The entity extraction result has an impact on the relation prediction.

➢ Joint models (used in this paper):
⚫ Identify the entity pair and the relation at the same time.

SLIDE 5

INTRODUCTION

Problem definition

➢ Tag each word; words that are not part of any entity are tagged "Other". Entity words get a composite tag with three parts:
⚫ Entity position: B (Begin), I (Inside), E (End), or S (Single); four types.
⚫ Relation type: e.g., PR (President-Of); 24 predefined relation types.
⚫ Entity role: 1 (first entity) or 2 (second entity).
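
As an illustration (the exact tag format below, Position-Relation-Role, is an assumption and not taken from the slide), the running example could be tagged:

Donald/B-PR-1  Trump/E-PR-1  is/Other  the/Other  45th/Other  and/Other  current/Other  president/Other  of/Other  the/Other  United/B-PR-2  States/E-PR-2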

SLIDE 6

OUTLINE

Introduction Method Experiment Conclusion

6

SLIDE 7

FRAMEWORK


SLIDE 8

METHOD


Features (word embedding & character embedding)

➢ Word embedding:
⚫ Pre-train the embeddings for words.

➢ Character embedding:
⚫ Each word is broken up into individual characters.
⚫ Each character is mapped to its embedding (𝑑1, 𝑑2, …, 𝑑𝑀).
⚫ Adopt a Bi-LSTM to generate the character embedding for the word.

SLIDE 9

METHOD
Features (word embedding & character embedding)

➢ Character embedding:
[Figure: the word "Donald" is split into the characters d, o, n, a, l, d; each character is fed into a Bi-LSTM, and the resulting hidden states ℎ1 … ℎ6 form the character embedding for the word.]
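
A minimal PyTorch sketch of this character-level Bi-LSTM; the layer sizes and the choice of pooling the final forward/backward states into the word's character embedding are assumptions, since the slide does not specify them.

```python
import torch
import torch.nn as nn

class CharEmbedding(nn.Module):
    """Character-level Bi-LSTM that turns one word into a single vector."""
    def __init__(self, num_chars, char_dim=30, hidden_dim=25):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.char_emb = nn.Embedding(num_chars, char_dim)
        self.bilstm = nn.LSTM(char_dim, hidden_dim,
                              bidirectional=True, batch_first=True)

    def forward(self, char_ids):
        # char_ids: (1, M) indices of the M characters of the word, e.g. "Donald"
        d = self.char_emb(char_ids)        # (1, M, char_dim)  -> d1 ... dM
        h, _ = self.bilstm(d)              # (1, M, 2*hidden)  -> h1 ... hM
        # Assumed pooling: concatenate the final forward and final backward states.
        fwd_last = h[:, -1, :self.hidden_dim]
        bwd_last = h[:, 0, self.hidden_dim:]
        return torch.cat([fwd_last, bwd_last], dim=-1)   # character embedding
```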

SLIDE 10

METHOD


Features (implicit features)

➢ Implicit entity feature:
⚫ Pre-train a model on an existing NER dataset.
⚫ Feed the input sentence ("Donald Trump is the …") into this model.
⚫ The hidden vectors are the entity features.

SLIDE 11

METHOD
Features (implicit features)

➢ Implicit entity feature:
[Figure: the input sentence is run through the pre-trained NER model, and its hidden vectors are used as the implicit entity features.]
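
A minimal sketch of how such implicit features could be extracted, assuming the pre-trained NER tagger exposes a Bi-LSTM whose per-word hidden states are reused; whether its weights are frozen is also an assumption.

```python
import torch.nn as nn

class ImplicitEntityFeatures(nn.Module):
    """Reuse the hidden states of a pre-trained NER tagger as extra features."""
    def __init__(self, pretrained_ner_lstm, freeze=True):
        super().__init__()
        self.ner_lstm = pretrained_ner_lstm      # e.g. a Bi-LSTM trained on an NER dataset
        if freeze:
            for p in self.ner_lstm.parameters():
                p.requires_grad = False          # keep the pre-trained weights fixed

    def forward(self, word_embs):
        # word_embs: (batch, T, word_dim) embeddings of the input sentence
        hidden, _ = self.ner_lstm(word_embs)     # (batch, T, feat_dim)
        return hidden                            # implicit entity feature per word
```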

SLIDE 12

METHOD

Encoder layer

➢ LSTM:
[Figure: a standard LSTM cell computing the hidden state ℎ𝑢; source: https://www.itread01.com/content/1545027542.html]
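
For reference, the standard LSTM update that the linked figure illustrates (generic notation; the paper's own symbols may differ):

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$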

SLIDE 13

METHOD
Encoding layer

➢ Encoding layer:
⚫ This layer receives the three feature vectors (word embedding, character embedding, and implicit entity feature), concatenated, as input.
⚫ Use a Bi-LSTM to compute the hidden state ℎ𝑢 at each step.
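
A minimal sketch of such an encoding layer; the dimension names are placeholders.

```python
import torch
import torch.nn as nn

class EncodingLayer(nn.Module):
    """Bi-LSTM encoder over the concatenation of the three per-word feature vectors."""
    def __init__(self, word_dim, char_dim, implicit_dim, hidden_dim):
        super().__init__()
        self.bilstm = nn.LSTM(word_dim + char_dim + implicit_dim, hidden_dim,
                              bidirectional=True, batch_first=True)

    def forward(self, word_emb, char_emb, implicit_feat):
        # Each input: (batch, T, dim); concatenate along the feature axis.
        x = torch.cat([word_emb, char_emb, implicit_feat], dim=-1)
        h, _ = self.bilstm(x)      # (batch, T, 2*hidden_dim): hidden state per step
        return h
```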

SLIDE 14

METHOD
Attention layer

➢ The input of the attention layer.

SLIDE 15

METHOD
Attention layer

➢ Tag-aware attention:
⚫ Allows the model to select the relevant parts of the sentence for the prediction of the current tag.
⚫ Produces an attention vector (the relevant representation).
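
A sketch of tag-aware attention; the additive (Bahdanau-style) scoring function conditioned on the previous tag/decoder state is an assumption, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TagAwareAttention(nn.Module):
    """Attention over encoder states, conditioned on the tag/decoder state."""
    def __init__(self, enc_dim, tag_state_dim, attn_dim):
        super().__init__()
        self.W_h = nn.Linear(enc_dim, attn_dim, bias=False)
        self.W_s = nn.Linear(tag_state_dim, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, enc_states, tag_state):
        # enc_states: (batch, T, enc_dim), tag_state: (batch, tag_state_dim)
        scores = self.v(torch.tanh(self.W_h(enc_states) +
                                   self.W_s(tag_state).unsqueeze(1)))  # (batch, T, 1)
        alpha = F.softmax(scores, dim=1)                # attention weights over words
        context = (alpha * enc_states).sum(dim=1)       # attention vector, (batch, enc_dim)
        return context, alpha.squeeze(-1)
```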

SLIDE 16

METHOD
Attention layer

➢ Fusion gate:
⚫ When predicting the tag of a word, the gate trades off how much information is used from ℎ𝑏𝑢 and ℎ𝑢.
⚫ A learned weight combines the two into the output of the attention layer.
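
A sketch of a fusion gate of this kind; the sigmoid-gate form below is a common choice and an assumption, not the paper's exact equation.

```python
import torch
import torch.nn as nn

class FusionGate(nn.Module):
    """Gate that mixes the attention vector with the encoder hidden state."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, h_att, h_enc):
        # h_att: attention vector for the current word, h_enc: encoder hidden state
        g = torch.sigmoid(self.gate(torch.cat([h_att, h_enc], dim=-1)))  # weight in (0, 1)
        return g * h_att + (1.0 - g) * h_enc    # traded-off output of the attention layer
```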

SLIDE 17

METHOD
Decoding layer

➢ The input of the decoding layer.

SLIDE 18

METHOD
Decoding layer

➢ Decoding layer:
⚫ Adopt an LSTM to generate vectors representing the output states.
⚫ At step t, the LSTM input consists of the (t-1)-th hidden state of the LSTM, the (t-1)-th tag embedding, and the t-th output of the attention layer.
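
A minimal sketch of one decoding step with these three inputs; the module and dimension names are placeholders.

```python
import torch
import torch.nn as nn

class DecodingLayer(nn.Module):
    """LSTM decoder: each step consumes the previous tag embedding and the
    current attention-layer output, and emits a tag-state vector."""
    def __init__(self, num_tags, tag_dim, attn_dim, hidden_dim):
        super().__init__()
        self.tag_emb = nn.Embedding(num_tags, tag_dim)
        self.cell = nn.LSTMCell(tag_dim + attn_dim, hidden_dim)

    def step(self, prev_tag, attn_out, state=None):
        # prev_tag: (batch,) previously predicted tag ids  -> (t-1)-th tag embedding
        # attn_out: (batch, attn_dim) t-th output of the attention layer
        # state: (h_{t-1}, c_{t-1}) previous LSTM state (None = zeros)
        x = torch.cat([self.tag_emb(prev_tag), attn_out], dim=-1)
        h_t, c_t = self.cell(x, state)
        return h_t, (h_t, c_t)      # h_t is the t-th output state
```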

SLIDE 19

METHOD
Decoding layer

➢ Decoding layer (continued):
⚫ Adopt a softmax classifier to compute the entity tag probabilities.
⚫ Objective function (a standard formulation is sketched below).
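
The slide's equation is not reproduced in this transcript; a standard formulation, assuming a per-word softmax over the decoder output state T_t and a log-likelihood objective over the training set D, would be:

$$ p(y_t = k \mid x) = \frac{\exp(\mathbf{w}_k^{\top} T_t + b_k)}{\sum_{k'} \exp(\mathbf{w}_{k'}^{\top} T_t + b_{k'})} $$

$$ \mathcal{L} = \sum_{(x, y) \in \mathcal{D}} \; \sum_{t=1}^{|x|} \log p(y_t \mid x) $$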

SLIDE 20

OUTLINE

Introduction Method Experiment Conclusion

20

SLIDE 21

EXPERIMENT

Dataset

➢ NYT:
⚫ 353,000 triplets in the training data and 3,880 triplets in the testing data.
⚫ There are 24 relation types in the dataset.

SLIDE 22

EXPERIMENT

➢ Dimension sizes:

SLIDE 23

EXPERIMENT

➢ Comparison with baselines (pipelined, joint, and end-to-end models):
⚫ A triplet is regarded as correct only when all of its elements are correct.

SLIDE 24

EXPERIMENT

➢ Ablation results:
⚫ A triplet is regarded as correct only when all of its elements are correct.

SLIDE 25

EXPERIMENT

➢ Ablation results on triplet elements:

SLIDE 26

EXPERIMENT

➢ Visualization of attention weights:

SLIDE 27

OUTLINE

Introduction Method Experiment Conclusion

27

SLIDE 28

CONCLUSION

➢ Propose an attention-based model enhanced with implicit entity features for the joint extraction of entities and relations.

➢ The model can take advantage of entity features and does not require manually designing them.

➢ Design a tag-aware attention mechanism which enables the model to select the words that are informative for the prediction.