A Supervised Sequence 2 Sequence Problem Janos Borst July 26, 2019 - - PowerPoint PPT Presentation



slide-1
SLIDE 1

A Supervised Sequence 2 Sequence Problem

Janos Borst July 26, 2019

University of Leipzig - NLP Group

slide-2
SLIDE 2

Sequence to Sequence

( A  sentence  of    many  words  .     )
                     ↓
( l1   l2      l3    l4    l5     l6    )
( ART  NOUN    PREP  PRON  NOUN   PUNCT )

1

slide-3
SLIDE 3

Named Entity Tagging

Named Entities: "An instance of a unique object with specific properties. (Person, Location, Product, ...)"

2

slide-4
SLIDE 4

Example

"Introduction to Neural Networks" is a workshop by [Janos Borst → person].

3


slide-6
SLIDE 6

Tags

  • LOC: Location
  • PER: Person
  • ORG: Organisation
  • MISC: Mixed (Events, works,...)

4

slide-7
SLIDE 7

Tagging Schemes

This  is  not  New    York   .
O     O   O    B-LOC  E-LOC  O

How do we know whether this is "New York" or "New" and "York"? Introducing tag prefixes for spans:

  • B-: Beginning of a span
  • I-: Inside a span
  • E-: End of a span
  • S-: Single-token span

This is the BIOES scheme. (more info)

5
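The scheme above is easy to mechanize. A minimal sketch (function name and the (start, end, label) span format are illustrative, not from the slides) that converts entity spans into BIOES tags:

```python
def to_bioes(tokens, spans):
    """Convert entity spans (start, end_exclusive, label) to BIOES tags."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if end - start == 1:
            tags[start] = "S-" + label           # single-token span
        else:
            tags[start] = "B-" + label           # beginning of the span
            for i in range(start + 1, end - 1):
                tags[i] = "I-" + label           # inside the span
            tags[end - 1] = "E-" + label         # end of the span
    return tags

print(to_bioes(["This", "is", "not", "New", "York", "."], [(3, 5, "LOC")]))
# ['O', 'O', 'O', 'B-LOC', 'E-LOC', 'O']
```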


slide-9
SLIDE 9

The Requirements

slide-10
SLIDE 10

Data

Supervised training data: CoNLL-2003

  • A sequence-to-sequence task
  • Contains:
  • Named Entity tags
  • Part-of-Speech tags
  • Chunk tags

6

slide-11
SLIDE 11

Data

sentence_id  token_id  token       pos  chunks  ner
1            0         EU          NNP  I-NP    S-ORG
1            1         rejects     VBZ  I-VP    O
1            2         German      JJ   I-NP    S-LOC
1            3         call        NN   I-NP    O
1            4         to          TO   I-VP    O
1            5         boycott     VB   I-VP    O
1            6         British     JJ   I-NP    S-MISC
1            7         lamb        NN   I-NP    O
1            8         .           .    O       O
2            0         Peter       NNP  I-NP    B-PER
2            1         Blackburn   NNP  I-NP    E-PER
3            0         BRUSSELS    NNP  I-NP    S-LOC
3            1         1996-08-22  CD   I-NP    O
4            0         The         DT   I-NP    O
4            1         European    NNP  I-NP    B-ORG
4            2         Commission  NNP  I-NP    E-ORG
4            3         said        VBD  I-VP    O
4            4         on          IN   I-PP    O
4            5         Thursday    NNP  I-NP    O
....

7
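Data in this shape is simple to parse. A minimal sketch (function name is illustrative): one token with its tag columns per line, blank lines separating sentences.

```python
def read_conll(lines):
    """Group 'token pos chunk ner' lines into sentences at blank lines."""
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line:                      # blank line ends the sentence
            if current:
                sentences.append(current)
                current = []
        else:
            token, pos, chunk, ner = line.split()
            current.append((token, pos, chunk, ner))
    if current:                           # flush a trailing sentence
        sentences.append(current)
    return sentences

sample = ["EU NNP I-NP S-ORG", "rejects VBZ I-VP O", "", "Peter NNP I-NP B-PER"]
print(read_conll(sample))
```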

slide-12
SLIDE 12

Sequence to Sequence

We want to map a sequence to another sequence.
We have to keep the rank of the input tensor.
Use recurrent networks for sequences:

  input shape: (b, 140)
  Embedding:   (b, 140, 200)

We have to keep the 140 time steps and give a label to every word!

8
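In Keras this corresponds to setting return_sequences=True on the recurrent layer, so the 140 time steps survive into the output. A minimal sketch (vocabulary size and LSTM width are made-up values):

```python
import keras

# Hypothetical sizes: 10000-word vocabulary, 100 LSTM units.
model = keras.Sequential([
    keras.Input(shape=(140,)),                                # (b, 140) token ids
    keras.layers.Embedding(input_dim=10000, output_dim=200),  # (b, 140, 200)
    keras.layers.LSTM(100, return_sequences=True),            # (b, 140, 100)
])
print(model.output_shape)  # (None, 140, 100)
```

With return_sequences=False the LSTM would collapse the sequence to a single (b, 100) vector, and there would be nothing left to label per word.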


slide-14
SLIDE 14

Return Sequences

[Figure: a recurrent network unrolled over the input "a sentence of many words"; the same cell (weights Wh) is applied at every time step t]

9


slide-19
SLIDE 19

Return Sequences

[Figure: with return sequences, the recurrent layer emits an output at every time step, one for each input word, instead of only the final state]

10

slide-20
SLIDE 20

Left-Sided Context

[Figure: a forward (left-to-right) recurrent network; each output depends only on the current word and the words to its left]

11

slide-21
SLIDE 21

Bidirectional Recurrent Networks

[Figure: a bidirectional recurrent network; a forward layer reads "a sentence of many words" left-to-right, a backward layer reads it right-to-left, and their outputs are combined at every time step t]

12


slide-26
SLIDE 26

Bidirectional LSTM

keras.layers.Bidirectional

Advantages:

  • Captures long-range dependencies in sentences
  • Considers both left and right context
  • Creates context-dependent word representations

13
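A minimal sketch of the wrapper (sizes are made-up values): Bidirectional runs the LSTM over the sequence in both directions and, by default, concatenates the two 100-dimensional outputs at every time step.

```python
import keras

inputs = keras.Input(shape=(140, 200))              # embedded word sequence
outputs = keras.layers.Bidirectional(
    keras.layers.LSTM(100, return_sequences=True)   # keep all time steps
)(inputs)
print(outputs.shape)  # (None, 140, 200): 100 units per direction
```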

slide-27
SLIDE 27

Conditional Random Fields - CRF

A Conditional Random Field is a probabilistic model that takes neighbouring observations into account. The idea:

  • The labels are not independent of each other
  • B-PER cannot be followed by B-LOC
  • We model the transition probabilities between labels

14

slide-28
SLIDE 28

Conditional Random Fields - CRF

name  is  Janos  Borst
O     O   B-PER  ?

Emission probabilities for Borst:     ( ... S-LOC: 0.01  B-LOC: 0.4  E-PER: 0.3 ... )
Transition probabilities from B-PER:  ( ... S-LOC: 0.0   B-LOC: 0.0  E-PER: 0.6 ... )

15


slide-31
SLIDE 31

Conditional Random Fields - CRF

name  is  Janos  Borst
O     O   B-PER  E-PER

Emission probabilities for Borst:     ( ... S-LOC: 0.01  B-LOC: 0.4  E-PER: 0.3 ... )
Transition probabilities from B-PER:  ( ... S-LOC: 0.0   B-LOC: 0.0  E-PER: 0.6 ... )

15
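The table above can be turned into a toy calculation. This is only an illustration of the idea (a real CRF scores whole label sequences, e.g. with the Viterbi algorithm); the numbers are the assumed values from the slide:

```python
# Candidate tags for "Borst", given that "Janos" was tagged B-PER.
emission = {"S-LOC": 0.01, "B-LOC": 0.4, "E-PER": 0.3}               # from the word itself
transition_from_b_per = {"S-LOC": 0.0, "B-LOC": 0.0, "E-PER": 0.6}   # from the previous tag

# Combining both: impossible transitions (probability 0) rule out
# B-LOC, even though its emission score is the highest.
scores = {t: emission[t] * transition_from_b_per[t] for t in emission}
print(max(scores, key=scores.get))  # E-PER
```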

slide-32
SLIDE 32

Keras contrib

There is an extra library called keras_contrib:

  • Implements new layers, loss functions, and activations
  • Works seamlessly with the keras modules
  • Has a convenient CRF layer

16

slide-33
SLIDE 33

Code Example

import keras
import keras_contrib as kc

i = keras.layers.Input((140,))
...
lstm = ...
crf = kc.layers.CRF(num_of_labels)(lstm)
model = keras.models.Model(inputs=[i], outputs=[crf])
model.compile(
    optimizer="Adam",
    loss=kc.losses.crf_loss,
    metrics=[kc.metrics.crf_viterbi_accuracy],
)

17

slide-34
SLIDE 34

Metrics

Accuracy is not meaningful here: 90% of all the labels are "O".
We need: Recall, Precision, F-Measure

18

slide-35
SLIDE 35

Recall and Precision

How many of the entities have I found?

    Recall = true positives / (true positives + false negatives)

How many of the detected entities are correctly classified?

    Precision = true positives / (true positives + false positives)

19

slide-36
SLIDE 36

F1-Measure

The harmonic mean of recall and precision:

    F1 = 2 · (precision · recall) / (precision + recall)

(more details)

20
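A worked example with made-up counts: the tagger found 8 entities, 6 of them correct, and missed 4 entities in the gold standard.

```python
tp, fp, fn = 6, 2, 4                                # hypothetical counts

precision = tp / (tp + fp)                          # 6 / 8  = 0.75
recall = tp / (tp + fn)                             # 6 / 10 = 0.6
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(precision, recall, round(f1, 3))  # 0.75 0.6 0.667
```

Note that the harmonic mean punishes imbalance: a tagger that labels everything "O" gets 0 recall on entities and therefore an F1 of 0, no matter how high its token accuracy is.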

slide-37
SLIDE 37

The Architecture

word sequence → Embedding → BiLSTM → CRF → label sequence

Trained with the CRF loss.

21

slide-38
SLIDE 38

Showcase

Named Entity Tagger

22

slide-39
SLIDE 39

Let’s talk Flair again

  • Tag your entities.
  • Generally...

23

slide-40
SLIDE 40

Applications

slide-41
SLIDE 41

Similar Tasks

  • Part-of-Speech, Chunking
  • Machine Translation (old languages)
  • Speech Recognition (Sound sequences to word sequences)

24