A Supervised Sequence 2 Sequence Problem Janos Borst July 26, 2019 - - PowerPoint PPT Presentation



slide-1
SLIDE 1

A Supervised Sequence 2 Sequence Problem

Janos Borst July 26, 2019

University of Leipzig - NLP Group

slide-2
SLIDE 2

Sequence to Sequence

( A  sentence  of    many  words  .     )
                     ↓
( l1   l2      l3    l4    l5     l6    )
( ART  NOUN    PREP  PRON  NOUN   PUNCT )

1

slide-3
SLIDE 3

Named Entity Tagging

Named Entities: "An instance of a unique object with specific properties. (Person, Location, Product, ...)"

2

slide-4
SLIDE 4

Example

"Introduction to Neural Networks" is a workshop by [Janos Borst → person].

3


slide-6
SLIDE 6

Tags

  • LOC: Location
  • PER: Person
  • ORG: Organisation
  • MISC: Mixed (Events, works,...)

4

slide-7
SLIDE 7

Tagging Schemes

This  is  not  New    York   .
O     O   O    B-LOC  E-LOC  O

How do we know whether this is "New York" or "New" and "York"? Introducing tag prefixes for spans:

  • B-: Beginning of a span
  • I-: Inside a span
  • E-: End of a span
  • S-: Single-token span

This is the BIOES scheme. (more info)

5
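The scheme above is easy to mechanize. A minimal sketch (function name and the (start, end, label) span format are illustrative, not from the slides) that converts entity spans into BIOES tags:

```python
def to_bioes(tokens, spans):
    """Convert entity spans (start, end_exclusive, label) to BIOES tags."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if end - start == 1:
            tags[start] = "S-" + label           # single-token span
        else:
            tags[start] = "B-" + label           # beginning of the span
            for i in range(start + 1, end - 1):
                tags[i] = "I-" + label           # inside the span
            tags[end - 1] = "E-" + label         # end of the span
    return tags

print(to_bioes(["This", "is", "not", "New", "York", "."], [(3, 5, "LOC")]))
# ['O', 'O', 'O', 'B-LOC', 'E-LOC', 'O']
```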


slide-9
SLIDE 9

The Requirements

slide-10
SLIDE 10

Data

Supervised training data: CoNLL-2003

  • A sequence-to-sequence task
  • Contains:
  • Named Entity tags
  • Part-of-Speech tags
  • Chunk tags

6

slide-11
SLIDE 11

Data

sentence_id  token_id  token       pos  chunks  ner
1            0         EU          NNP  I-NP    S-ORG
1            1         rejects     VBZ  I-VP    O
1            2         German      JJ   I-NP    S-LOC
1            3         call        NN   I-NP    O
1            4         to          TO   I-VP    O
1            5         boycott     VB   I-VP    O
1            6         British     JJ   I-NP    S-MISC
1            7         lamb        NN   I-NP    O
1            8         .           .    O       O
2            0         Peter       NNP  I-NP    B-PER
2            1         Blackburn   NNP  I-NP    E-PER
3            0         BRUSSELS    NNP  I-NP    S-LOC
3            1         1996-08-22  CD   I-NP    O
4            0         The         DT   I-NP    O
4            1         European    NNP  I-NP    B-ORG
4            2         Commission  NNP  I-NP    E-ORG
4            3         said        VBD  I-VP    O
4            4         on          IN   I-PP    O
4            5         Thursday    NNP  I-NP    O
....

7
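Data in this shape is simple to parse. A minimal sketch (function name is illustrative): one token with its tag columns per line, blank lines separating sentences.

```python
def read_conll(lines):
    """Group 'token pos chunk ner' lines into sentences at blank lines."""
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line:                      # blank line ends the sentence
            if current:
                sentences.append(current)
                current = []
        else:
            token, pos, chunk, ner = line.split()
            current.append((token, pos, chunk, ner))
    if current:                           # flush a trailing sentence
        sentences.append(current)
    return sentences

sample = ["EU NNP I-NP S-ORG", "rejects VBZ I-VP O", "", "Peter NNP I-NP B-PER"]
print(read_conll(sample))
```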

slide-12
SLIDE 12

Sequence to Sequence

We want to map a sequence to another sequence.
We have to keep the rank of the input tensor.
Use recurrent networks for sequences:

  input shape: (b, 140)
  Embedding:   (b, 140, 200)

We have to keep the 140 time steps and give a label to every word!

8
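In Keras this corresponds to setting return_sequences=True on the recurrent layer, so the 140 time steps survive into the output. A minimal sketch (vocabulary size and LSTM width are made-up values):

```python
import keras

# Hypothetical sizes: 10000-word vocabulary, 100 LSTM units.
model = keras.Sequential([
    keras.Input(shape=(140,)),                                # (b, 140) token ids
    keras.layers.Embedding(input_dim=10000, output_dim=200),  # (b, 140, 200)
    keras.layers.LSTM(100, return_sequences=True),            # (b, 140, 100)
])
print(model.output_shape)  # (None, 140, 100)
```

With return_sequences=False the LSTM would collapse the sequence to a single (b, 100) vector, and there would be nothing left to label per word.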


slide-14
SLIDE 14

Return Sequences

[Figure: a recurrent network unrolled over the input "a sentence of many words"; the same cell (weights Wh) is applied at every time step t]

9


slide-19
SLIDE 19

Return Sequences

[Figure: with return sequences, the recurrent layer emits an output at every time step, one for each input word, instead of only the final state]

10

slide-20
SLIDE 20

Left-Sided Context

[Figure: a forward (left-to-right) recurrent network; each output depends only on the current word and the words to its left]

11

slide-21
SLIDE 21

Bidirectional Recurrent Networks

[Figure: a bidirectional recurrent network; a forward layer reads "a sentence of many words" left-to-right, a backward layer reads it right-to-left, and their outputs are combined at every time step t]

12


slide-26
SLIDE 26

Bidirectional LSTM

keras.layers.Bidirectional

Advantages:

  • Captures long-range dependencies in sentences
  • Considers both left and right context
  • Creates context-dependent word representations

13
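A minimal sketch of the wrapper (sizes are made-up values): Bidirectional runs the LSTM over the sequence in both directions and, by default, concatenates the two 100-dimensional outputs at every time step.

```python
import keras

inputs = keras.Input(shape=(140, 200))              # embedded word sequence
outputs = keras.layers.Bidirectional(
    keras.layers.LSTM(100, return_sequences=True)   # keep all time steps
)(inputs)
print(outputs.shape)  # (None, 140, 200): 100 units per direction
```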

slide-27
SLIDE 27

Conditional Random Fields - CRF

A Conditional Random Field is a probabilistic model that takes neighbouring observations into account. The idea:

  • The labels are not independent of each other
  • B-PER cannot be followed by B-LOC
  • We model the transition probabilities between labels

14

slide-28
SLIDE 28

Conditional Random Fields - CRF

name  is  Janos  Borst
O     O   B-PER  ?

Emission probabilities for Borst:     ( ... S-LOC: 0.01  B-LOC: 0.4  E-PER: 0.3 ... )
Transition probabilities from B-PER:  ( ... S-LOC: 0.0   B-LOC: 0.0  E-PER: 0.6 ... )

15


slide-31
SLIDE 31

Conditional Random Fields - CRF

name  is  Janos  Borst
O     O   B-PER  E-PER

Emission probabilities for Borst:     ( ... S-LOC: 0.01  B-LOC: 0.4  E-PER: 0.3 ... )
Transition probabilities from B-PER:  ( ... S-LOC: 0.0   B-LOC: 0.0  E-PER: 0.6 ... )

15
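The table above can be turned into a toy calculation. This is only an illustration of the idea (a real CRF scores whole label sequences, e.g. with the Viterbi algorithm); the numbers are the assumed values from the slide:

```python
# Candidate tags for "Borst", given that "Janos" was tagged B-PER.
emission = {"S-LOC": 0.01, "B-LOC": 0.4, "E-PER": 0.3}               # from the word itself
transition_from_b_per = {"S-LOC": 0.0, "B-LOC": 0.0, "E-PER": 0.6}   # from the previous tag

# Combining both: impossible transitions (probability 0) rule out
# B-LOC, even though its emission score is the highest.
scores = {t: emission[t] * transition_from_b_per[t] for t in emission}
print(max(scores, key=scores.get))  # E-PER
```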

slide-32
SLIDE 32

Keras contrib

There is an extra library called keras_contrib:

  • Implements new layers, loss functions, and activations
  • Works seamlessly with the keras modules
  • Has a convenient CRF layer

16

slide-33
SLIDE 33

Code Example

import keras
import keras_contrib as kc

i = keras.layers.Input((140,))
...
lstm = ...
crf = kc.layers.CRF(num_of_labels)(lstm)
model = keras.models.Model(inputs=[i], outputs=[crf])
model.compile(
    optimizer="Adam",
    loss=kc.losses.crf_loss,
    metrics=[kc.metrics.crf_viterbi_accuracy],
)

17

slide-34
SLIDE 34

Metrics

Accuracy is not meaningful here: 90% of all the labels are "O".
We need: Recall, Precision, F-Measure

18

slide-35
SLIDE 35

Recall and Precision

How many of the entities have I found?

    Recall = true positives / (true positives + false negatives)

How many of the detected entities are correctly classified?

    Precision = true positives / (true positives + false positives)

19

slide-36
SLIDE 36

F1-Measure

The harmonic mean of recall and precision:

    F1 = 2 · (precision · recall) / (precision + recall)

(more details)

20
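A worked example with made-up counts: the tagger found 8 entities, 6 of them correct, and missed 4 entities in the gold standard.

```python
tp, fp, fn = 6, 2, 4                                # hypothetical counts

precision = tp / (tp + fp)                          # 6 / 8  = 0.75
recall = tp / (tp + fn)                             # 6 / 10 = 0.6
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(precision, recall, round(f1, 3))  # 0.75 0.6 0.667
```

Note that the harmonic mean punishes imbalance: a tagger that labels everything "O" gets 0 recall on entities and therefore an F1 of 0, no matter how high its token accuracy is.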

slide-37
SLIDE 37

The Architecture

word sequence → Embedding → BiLSTM → CRF → label sequence

Trained with the CRF loss.

21

slide-38
SLIDE 38

Showcase

Named Entity Tagger

22

slide-39
SLIDE 39

Let’s talk Flair again

  • Tag your entities.
  • Generally...

23

slide-40
SLIDE 40

Applications

slide-41
SLIDE 41

Similar Tasks

  • Part-of-Speech, Chunking
  • Machine Translation (old languages)
  • Speech Recognition (Sound sequences to word sequences)

24