

  1. A Supervised Sequence 2 Sequence Problem. Janos Borst, July 26, 2019. University of Leipzig - NLP Group.

  2. Sequence to Sequence. [Figure: the sentence "A sentence of many words ." with each token mapped to a label l1 ... l6, here part-of-speech tags: ART, NOUN, PREP, PRON, NOUN, PUNCT.]

  3. Named Entity Tagging. Named Entities: "An instance of a unique object with specific properties (Person, Location, Product, ...)."

  4. Example. "Introduction to Neural Networks" is a workshop by Janos Borst, where "Janos Borst" is tagged as a person.

  5. Tags • LOC: Location • PER: Person • ORG: Organisation • MISC: Mixed (events, works, ...)

  6. Tagging Schemes. How do we know this is "New York" and not "New" and "York"? Introduce tag prefixes for spans: • B-: beginning of a span • I-: inside a span • E-: end of a span • S-: single-token span. Example: This/O is/O not/O New/B-LOC York/E-LOC ./O This is the BIOES scheme.
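The slides do not include code for this, but a minimal Python sketch of the BIOES conversion may help; the function name and the (start, end, type) span format are my own illustration:

    # Sketch: convert labeled entity spans to BIOES tags.
    # Span format (start, end_exclusive, entity_type) is assumed for illustration.
    def spans_to_bioes(tokens, spans):
        tags = ["O"] * len(tokens)
        for start, end, etype in spans:
            if end - start == 1:
                tags[start] = "S-" + etype          # single-token span
            else:
                tags[start] = "B-" + etype          # beginning of a span
                for k in range(start + 1, end - 1):
                    tags[k] = "I-" + etype          # inside a span
                tags[end - 1] = "E-" + etype        # end of a span
        return tags

    tokens = ["This", "is", "not", "New", "York", "."]
    print(spans_to_bioes(tokens, [(3, 5, "LOC")]))
    # -> ['O', 'O', 'O', 'B-LOC', 'E-LOC', 'O']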

  7. The Requirements

  8. Data. Supervised training data: CoNLL-2003 • a sequence-to-sequence task • contains: • Named Entity tags • Part-of-Speech tags • Phrase (chunking) tags

  9. Data (CoNLL-2003 sample, reconstructed):

     sentence_id  token_id  token       pos  chunk  ner
     1            0         EU          NNP  I-NP   S-ORG
     1            1         rejects     VBZ  I-VP   O
     1            2         German      JJ   I-NP   S-MISC
     1            3         call        NN   I-NP   O
     1            4         to          TO   I-VP   O
     1            5         boycott     VB   I-VP   O
     1            6         British     JJ   I-NP   S-MISC
     1            7         lamb        NN   I-NP   O
     1            8         .           .    O      O
     2            0         Peter       NNP  I-NP   B-PER
     2            1         Blackburn   NNP  I-NP   E-PER
     3            0         BRUSSELS    NNP  I-NP   S-LOC
     3            1         1996-08-22  CD   I-NP   O
     4            0         The         DT   I-NP   O
     4            1         European    NNP  I-NP   B-ORG
     4            2         Commission  NNP  I-NP   E-ORG
     4            3         said        VBD  I-VP   O
     4            4         on          IN   I-PP   O
     4            5         Thursday    NNP  I-NP   O
     ....
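As a rough sketch (not from the slides), CoNLL-style data like the table above can be read with a few lines of Python; the file path and the exact column order are assumptions:

    # Sketch: read whitespace-separated CoNLL-style columns into sentences.
    # Blank lines separate sentences; column order (token, pos, chunk, ner) is assumed.
    def read_conll(path):
        sentences, current = [], []
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:                  # sentence boundary
                    if current:
                        sentences.append(current)
                        current = []
                else:
                    token, pos, chunk, ner = line.split()[:4]
                    current.append((token, pos, chunk, ner))
        if current:
            sentences.append(current)
        return sentences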

  10. Sequence to Sequence. We want to map a sequence to another sequence, so we have to keep the rank of the input tensor. Use recurrent networks for sequences. Input shape: (b, 140); after the Embedding layer: (b, 140, 200). We have to keep the 140 time steps and give a label to every word!
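To make these shapes concrete, here is a minimal Keras sketch; the vocabulary size of 10000 is a placeholder assumption:

    import keras

    i = keras.layers.Input((140,))                                     # (b, 140)
    emb = keras.layers.Embedding(input_dim=10000, output_dim=200)(i)  # (b, 140, 200)
    keras.models.Model(inputs=i, outputs=emb).summary()
    # the embedding output shape is (None, 140, 200): rank 3, 140 time steps kept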

  11. Return Sequences. [Figure: an unrolled recurrent network over the tokens "a sentence of many words"; the same weights W and hidden state h are applied at every time step t, and an output is returned for every token rather than only for the last one.]
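In Keras this behaviour corresponds to the return_sequences flag; the layer sizes below are illustrative, not from the slides:

    import keras

    i = keras.layers.Input((140,))
    emb = keras.layers.Embedding(input_dim=10000, output_dim=200)(i)

    # return_sequences=True returns one output per time step: (b, 140, 100)
    seq = keras.layers.LSTM(100, return_sequences=True)(emb)

    # the default return_sequences=False returns only the last output: (b, 100)
    last = keras.layers.LSTM(100)(emb)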

  12. Left-Sided Context. [Figure: in a unidirectional recurrent network reading "a sentence of many words", the representation of each token depends only on the tokens to its left.]

  13. Bidirectional Recurrent Networks. [Figure: two recurrent networks read "a sentence of many words", one left-to-right and one right-to-left; their per-token outputs are combined so that every position sees both its left and right context.]

  14. Bidirectional LSTM. keras.layers.Bidirectional. Advantages: • captures long-range dependencies in sentences • considers left- and right-side context • creates context-dependent word representations
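A minimal sketch of the wrapper mentioned above; the LSTM width of 100 and vocabulary size are assumptions:

    import keras

    i = keras.layers.Input((140,))
    emb = keras.layers.Embedding(input_dim=10000, output_dim=200)(i)

    # Bidirectional runs one LSTM left-to-right and one right-to-left and
    # concatenates their per-step outputs: (b, 140, 2 * 100)
    bilstm = keras.layers.Bidirectional(
        keras.layers.LSTM(100, return_sequences=True)
    )(emb)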

  15. Conditional Random Fields - CRF. A Conditional Random Field is a probabilistic model that takes neighbouring observations into account. The idea: • the labels are not independent of each other • B-PER cannot be followed by B-LOC • we try to take transition probabilities into account

  16. Conditional Random Fields - CRF. Example: the sequence "name is Janos Borst" is tagged O O B-PER E-PER. Transition probabilities from B-PER (which tag may follow): (... S-LOC: 0.0, B-LOC: 0.0, E-PER: 0.6 ...). Emission probabilities for "Borst": (... B-LOC: 0.01, E-PER: 0.3 ...).
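A toy numpy illustration of the scoring idea (all numbers made up): the score of a tag sequence adds per-token emission scores and tag-to-tag transition scores, so a sequence with an invalid transition such as B-PER -> B-PER scores low:

    import numpy as np

    labels = ["O", "B-PER", "E-PER"]
    # emissions[t, k]: score of label k for the t-th token of "name is Janos Borst"
    emissions = np.array([[2.0, 0.1, 0.1],    # name
                          [2.0, 0.2, 0.1],    # is
                          [0.3, 1.5, 0.4],    # Janos
                          [0.3, 0.5, 1.6]])   # Borst
    # transitions[j, k]: score of moving from label j to label k
    transitions = np.array([[ 1.0,  0.8, -5.0],
                            [-5.0, -5.0,  2.0],   # B-PER -> E-PER is favoured
                            [ 1.0,  0.5, -5.0]])

    def path_score(tag_ids):
        score = emissions[np.arange(len(tag_ids)), tag_ids].sum()
        return score + transitions[tag_ids[:-1], tag_ids[1:]].sum()

    good = np.array([0, 0, 1, 2])   # O O B-PER E-PER
    bad  = np.array([0, 0, 1, 1])   # O O B-PER B-PER (invalid transition)
    print(path_score(good) > path_score(bad))   # True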

  17. Keras contrib. There is an extra library called keras_contrib: • implements new layers, loss functions, and activations • works seamlessly with the Keras modules • has a convenient CRF layer

  18. Code Example:

      import keras
      import keras_contrib as kc

      i = keras.layers.Input((140,))
      ...
      lstm = ...
      crf = kc.layers.CRF(num_of_labels)(lstm)

      model = keras.models.Model(inputs=[i], outputs=[crf])
      model.compile(
          optimizer="Adam",
          loss=kc.losses.crf_loss,
          metrics=[kc.metrics.crf_viterbi_accuracy],
      )
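Filling in the elided embedding and LSTM lines with the architecture the deck shows later (Embedding -> BiLSTM -> CRF) gives the following sketch; the vocabulary size, embedding dimension, LSTM width, and label count are assumptions:

    import keras
    import keras_contrib as kc

    num_of_labels = 17   # assumed: BIOES prefixes for 4 entity types plus O

    i = keras.layers.Input((140,))
    emb = keras.layers.Embedding(input_dim=10000, output_dim=200)(i)
    lstm = keras.layers.Bidirectional(
        keras.layers.LSTM(100, return_sequences=True)
    )(emb)
    crf = kc.layers.CRF(num_of_labels)(lstm)

    model = keras.models.Model(inputs=[i], outputs=[crf])
    model.compile(
        optimizer="Adam",
        loss=kc.losses.crf_loss,
        metrics=[kc.metrics.crf_viterbi_accuracy],
    )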

  19. Metrics. Accuracy is not meaningful here: 90% of all the labels are "O". We need: recall, precision, F-measure.

  20. Recall and Precision. How many of the true entities have I found?

      Recall = true positives / (true positives + false negatives)

      How many of the detected entities are correctly classified?

      Precision = true positives / (true positives + false positives)

  21. F1-Measure. The harmonic mean of recall and precision:

      F1 = 2 · (precision · recall) / (precision + recall)
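One way to compute these at the entity level is the seqeval package (my suggestion; the slides do not name a tool). It understands BIOES-style tags:

    from seqeval.metrics import precision_score, recall_score, f1_score

    y_true = [["O", "B-PER", "E-PER", "O", "S-LOC"]]
    y_pred = [["O", "B-PER", "E-PER", "O", "O"]]

    print(precision_score(y_true, y_pred))  # 1.0: 1 of 1 predicted entities is correct
    print(recall_score(y_true, y_pred))     # 0.5: 1 of 2 true entities was found
    print(f1_score(y_true, y_pred))         # ~0.67: harmonic mean of the two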

  22. The Architecture: word sequence -> Embedding -> BiLSTM -> CRF (trained with the CRF loss) -> label sequence

  23. Showcase: Named Entity Tagger

  24. Let's talk about Flair again • tag your entities • generally...

  25. Applications

  26. Similar Tasks • Part-of-Speech tagging, chunking • Machine Translation (old languages) • Speech Recognition (sound sequences to word sequences)
