

slide-1
SLIDE 1

IN4080 – 2020 FALL

NATURAL LANGUAGE PROCESSING

Jan Tore Lønning

1

slide-2
SLIDE 2

Lecture 7, 28 Sept

Tagging and sequence labeling

2

slide-3
SLIDE 3

Today

• Tagged text and tag sets
• Tagging as sequence labeling
• HMM-tagging
• Discriminative tagging
• Neural sequence labeling

3

slide-4
SLIDE 4

Tagged text and tagging

• In tagged text, each token is assigned a "part of speech" (POS) tag.
• A tagger is a program which automatically assigns tags to the words in a text.
• From the context we are (most often) able to determine the tag.
• But some sentences are genuinely ambiguous, and hence so are the tags.

4

[('They', 'PRP'), ('saw', 'VBD'), ('a', 'DT'), ('saw', 'NN'), ('.', '.')]
[('They', 'PRP'), ('like', 'VBP'), ('to', 'TO'), ('saw', 'VB'), ('.', '.')]
[('They', 'PRP'), ('saw', 'VBD'), ('a', 'DT'), ('log', 'NN')]
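A minimal sketch of producing such tagged output with NLTK's default tagger (assuming the `punkt` and `averaged_perceptron_tagger` data packages are installed; the exact tags may differ slightly from the slide):

```python
import nltk

for sent in ["They saw a saw.", "They like to saw.", "They saw a log"]:
    tokens = nltk.word_tokenize(sent)   # split the sentence into tokens
    print(nltk.pos_tag(tokens))         # assign a Penn Treebank tag to each token
```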

slide-5
SLIDE 5

Various POS tag sets

5

• A tagged text is tagged according to a fixed, small set of tags.
• There are various such tag sets.
• Brown tagset:
  • Original: 87 tags
  • Versions with extended tags <original>-<more>
  • Comes with the Brown corpus in NLTK (see the sketch below)
• Penn Treebank tags: 36 + 9 punctuation tags
• Universal POS Tagset: 12 tags
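A minimal sketch of inspecting the same corpus under two of these tag sets with NLTK (assuming the `brown` and `universal_tagset` data packages are installed):

```python
from nltk.corpus import brown

# The Brown corpus with its original tags ...
print(brown.tagged_words()[:5])
# ... and the same words mapped to the 12-tag Universal POS tagset.
print(brown.tagged_words(tagset="universal")[:5])
```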

slide-6
SLIDE 6

Universal POS tag set (NLTK)

Tag    Meaning              English examples
ADJ    adjective            new, good, high, special, big, local
ADP    adposition           on, of, at, with, by, into, under
ADV    adverb               really, already, still, early, now
CONJ   conjunction          and, or, but, if, while, although
DET    determiner, article  the, a, some, most, every, no, which
NOUN   noun                 year, home, costs, time, Africa
NUM    numeral              twenty-four, fourth, 1991, 14:24
PRT    particle             at, on, out, over, per, that, up, with
PRON   pronoun              he, their, her, its, my, I, us
VERB   verb                 is, say, told, given, playing, would
.      punctuation marks    . , ; !
X      other                ersatz, esprit, dunno, gr8, univeristy

6

slide-7
SLIDE 7

7

Penn treebank tags

slide-8
SLIDE 8

8

Original Brown tags, part 1

slide-9
SLIDE 9

9

Original Brown tags, part 2

slide-10
SLIDE 10

10

Original Brown tags, part 3

slide-11
SLIDE 11

Different tagsets - example

                 Brown   Penn Treebank ('wsj')   Universal
he, she          PPS     PRP                     PRON
I                PPSS    PRP                     PRON
me, him, her     PPO     PRP                     PRON
my, his, her     PP$     PRP$                    DET
mine, his, hers  PP$$    ?                       PRON

11

slide-12
SLIDE 12

Ambiguity rate

12

slide-13
SLIDE 13

How ambiguous are tags? (J&M, 2nd ed.)

13

BUT: Not directly comparable because of different tokenization

slide-14
SLIDE 14

Back

• earnings growth took a back/JJ seat
• a small building in the back/NN
• a clear majority of senators back/VBP the bill
• Dave began to back/VB toward the door
• enable the country to buy back/RP about debt
• I was twenty-one back/RB then

14

slide-15
SLIDE 15

Today

• Tagged text and tag sets
• Tagging as sequence labeling
• HMM-tagging
• Discriminative tagging
• Neural sequence labeling

15

slide-16
SLIDE 16

Tagging as Sequence Classification

• Classification (earlier):
  • a well-defined set of observations, O
  • a given set of classes, S = {s1, s2, …, sk}
  • Goal: a classifier, a mapping from O to S
• Sequence classification:
  • Goal: a classifier $\delta$, a mapping from sequences of elements from O to sequences of elements from S:
    $\delta(o_1, o_2, \ldots, o_n) = (s_{k_1}, s_{k_2}, \ldots, s_{k_n})$

16

slide-17
SLIDE 17

Baseline tagger

• In all classification tasks, establish a baseline classifier.
• Compare the performance of the other classifiers you make to the baseline.
• For tagging, a natural baseline is the Most Frequent Class Baseline (sketched below):
  • Assign each word the tag with which it occurred most frequently in the training set.
  • For words unseen in the training set, assign the most frequent tag in the training set.
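A minimal sketch of the Most Frequent Class Baseline (the function and variable names are illustrative, not from the lecture):

```python
from collections import Counter, defaultdict

def train_most_frequent_class(tagged_sents):
    """tagged_sents: a list of sentences, each a list of (word, tag) pairs."""
    word_tag_counts = defaultdict(Counter)
    tag_counts = Counter()
    for sent in tagged_sents:
        for word, tag in sent:
            word_tag_counts[word][tag] += 1
            tag_counts[tag] += 1
    word_to_tag = {w: c.most_common(1)[0][0] for w, c in word_tag_counts.items()}
    default_tag = tag_counts.most_common(1)[0][0]   # most frequent tag overall
    return word_to_tag, default_tag

def baseline_tag(words, word_to_tag, default_tag):
    # Unknown words get the most frequent tag in the training set.
    return [(w, word_to_tag.get(w, default_tag)) for w in words]
```

Trained on a tagged corpus such as Brown, this simple baseline is already hard to beat, which is why it is worth computing before anything fancier.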

17

slide-18
SLIDE 18

Today

• Tagged text and tag sets
• Tagging as sequence labeling
• HMM-tagging
• Discriminative tagging
• Neural sequence labeling

18

slide-19
SLIDE 19

Hidden Markov Model (HMM) tagger

• Two layers:
  • Observed: the sequence of words
  • Hidden: the tags/classes, where each word is assigned a class
• Naive Bayes (NB) assigns a class to each observation
• An HMM is a sequence classifier: it assigns a sequence of classes to a sequence of words
• Extension of the language model
• Extension of Naive Bayes

19

slide-20
SLIDE 20

HMM is a probabilistic tagger

• The goal is to decide:
  $\hat{t}_1^n = \operatorname{argmax}_{t_1^n} P(t_1^n \mid w_1^n)$
• Using Bayes' theorem:
  $\hat{t}_1^n = \operatorname{argmax}_{t_1^n} \dfrac{P(w_1^n \mid t_1^n)\, P(t_1^n)}{P(w_1^n)}$
• This simplifies to:
  $\hat{t}_1^n = \operatorname{argmax}_{t_1^n} P(w_1^n \mid t_1^n)\, P(t_1^n)$
  because the denominator is the same for all tag sequences.

20

Notation: $t_1^n = t_1, t_2, \ldots, t_n$

slide-21
SLIDE 21

Simplifying assumption 1

• For the tag sequence, we apply the chain rule:
  $P(t_1^n) = P(t_1)\, P(t_2 \mid t_1)\, P(t_3 \mid t_1 t_2) \cdots P(t_j \mid t_1^{j-1}) \cdots P(t_n \mid t_1^{n-1})$
• We then make the Markov (chain) assumption:
  $P(t_1^n) \approx P(t_1)\, P(t_2 \mid t_1)\, P(t_3 \mid t_2) \cdots P(t_j \mid t_{j-1}) \cdots P(t_n \mid t_{n-1})$
  $P(t_1^n) \approx P(t_1) \prod_{j=2}^{n} P(t_j \mid t_{j-1}) = \prod_{j=1}^{n} P(t_j \mid t_{j-1})$
• assuming a special start tag $t_0$ and $P(t_1) = P(t_1 \mid t_0)$.

21

slide-22
SLIDE 22

Simplifying assumption 2

• Applying the chain rule:
  $P(w_1^n \mid t_1^n) = \prod_{j=1}^{n} P(w_j \mid w_1^{j-1}, t_1^n)$
  i.e., a word depends on all the tags and on all the preceding words.
• We make the simplifying assumption:
  $P(w_j \mid w_1^{j-1}, t_1^n) \approx P(w_j \mid t_j)$
  i.e., a word depends only on the immediate tag, and hence
  $P(w_1^n \mid t_1^n) \approx \prod_{j=1}^{n} P(w_j \mid t_j)$

22

slide-23
SLIDE 23

23

slide-24
SLIDE 24

Training

• From a tagged training corpus, we can estimate the probabilities with Maximum Likelihood (as in language models and Naive Bayes):
  $\hat{P}(t_j \mid t_{j-1}) = \dfrac{C(t_{j-1}, t_j)}{C(t_{j-1})}$
  $\hat{P}(w_j \mid t_j) = \dfrac{C(w_j, t_j)}{C(t_j)}$
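A minimal sketch of these maximum-likelihood estimates over a tagged corpus in NLTK's (word, tag) format (the helper name and the start tag are illustrative; smoothing is ignored):

```python
from collections import Counter

def estimate_hmm(tagged_sents, start_tag="<s>"):
    """Relative-frequency estimates of the transition and emission probabilities."""
    transition, emission, tag_counts = Counter(), Counter(), Counter()
    for sent in tagged_sents:
        prev = start_tag
        tag_counts[start_tag] += 1
        for word, tag in sent:
            transition[(prev, tag)] += 1   # C(t_{j-1}, t_j)
            emission[(word, tag)] += 1     # C(w_j, t_j)
            tag_counts[tag] += 1           # C(t_j)
            prev = tag
    P_trans = {pair: c / tag_counts[pair[0]] for pair, c in transition.items()}
    P_emit = {pair: c / tag_counts[pair[1]] for pair, c in emission.items()}
    return P_trans, P_emit
```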

24

slide-25
SLIDE 25

Putting it all together

• From a trained model, it is straightforward to calculate the probability of a sentence together with a tag sequence:
  $P(w_1^n, t_1^n) = P(t_1^n)\, P(w_1^n \mid t_1^n) \approx \prod_{j=1}^{n} P(t_j \mid t_{j-1}) \prod_{j=1}^{n} P(w_j \mid t_j) = \prod_{j=1}^{n} P(t_j \mid t_{j-1})\, P(w_j \mid t_j)$
• To find the best tag sequence, we could – in principle – calculate this for all possible tag sequences and choose the one with the highest score:
  $\hat{t}_1^n = \operatorname{argmax}_{t_1^n} P(w_1^n \mid t_1^n)\, P(t_1^n)$
• Impossible in practice – there are too many possible tag sequences.
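Continuing the sketch from the training slide, the joint probability of one candidate tagged sentence under the bigram HMM (assumes the `P_trans` and `P_emit` dictionaries defined there):

```python
def joint_probability(tagged_sent, P_trans, P_emit, start_tag="<s>"):
    """P(w_1..n, t_1..n) as the product of P(t_j | t_{j-1}) * P(w_j | t_j)."""
    prob, prev = 1.0, start_tag
    for word, tag in tagged_sent:
        prob *= P_trans.get((prev, tag), 0.0) * P_emit.get((word, tag), 0.0)
        prev = tag
    return prob
```

Enumerating this quantity for every possible tag sequence is exactly what the next slides show to be infeasible.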

25

slide-26
SLIDE 26

[Trellis figure: the sentence "Janet will back the bill", with one column of the 12 universal tags (ADJ … X) per word]

Possible tag sequences

• The number of possible tag sequences = the number of paths through the trellis = $m^n$
  • $m$ is the number of tags in the set
  • $n$ is the number of tokens in the sentence
• Here: $12^5 \approx 250{,}000$.

26

slide-27
SLIDE 27

[Same trellis figure: "Janet will back the bill" with one column of tags per word]

Viterbi algorithm (dynamic programming)

• Walk through the word sequence (a sketch of the resulting algorithm follows below).
• For each word, keep track of all the possible tag sequences up to this word and the probability of each sequence.
• If two paths are equal from a point on, then:
  • the one scoring best at this point will also score best at the end;
  • discard the other one.
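A minimal Viterbi sketch for the bigram HMM, using log probabilities (illustrative only; it reuses the `P_trans`/`P_emit` dictionaries sketched earlier and does no smoothing of unseen events):

```python
import math

def viterbi(words, tags, P_trans, P_emit, start_tag="<s>"):
    """Return the most probable tag sequence for words under a bigram HMM."""
    def logp(p):
        return math.log(p) if p > 0 else float("-inf")

    # score[t]: best log-probability of any path ending in tag t at the current word
    score = {t: logp(P_trans.get((start_tag, t), 0)) + logp(P_emit.get((words[0], t), 0))
             for t in tags}
    backpointers = []
    for w in words[1:]:
        new_score, pointer = {}, {}
        for t in tags:
            best_prev = max(tags, key=lambda tp: score[tp] + logp(P_trans.get((tp, t), 0)))
            new_score[t] = (score[best_prev]
                            + logp(P_trans.get((best_prev, t), 0))
                            + logp(P_emit.get((w, t), 0)))
            pointer[t] = best_prev
        score, backpointers = new_score, backpointers + [pointer]
    # Follow the backpointers from the best final tag
    best = max(tags, key=lambda t: score[t])
    path = [best]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    return list(reversed(path))
```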

27

slide-28
SLIDE 28

Viterbi algorithm

• A nice example of dynamic programming.
• We skip the details:
  • Viterbi is covered in IN2110.
  • We will use preprogrammed tools in this course – not implement it ourselves.
  • HMMs are not state-of-the-art taggers.
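In NLTK, for example, a ready-made HMM tagger can be trained and used in a few lines (a rough sketch; without smoothing, unknown words are handled poorly):

```python
from nltk.corpus import brown
from nltk.tag import hmm

train = brown.tagged_sents(categories="news", tagset="universal")[:3000]
tagger = hmm.HiddenMarkovModelTrainer().train_supervised(train)
print(tagger.tag("The Fulton County Grand Jury said".split()))
```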

28

slide-29
SLIDE 29

HMM trigram tagger

• Take the two preceding tags into consideration:
  $P(t_1^n) \approx \prod_{j=1}^{n} P(t_j \mid t_{j-1}, t_{j-2})$
  $P(w_1^n, t_1^n) \approx \prod_{j=1}^{n} P(w_j \mid t_j)\, P(t_j \mid t_{j-1}, t_{j-2})$
• Add two initial special states and one special end state.

29

slide-30
SLIDE 30

Challenges for the trigram tagger

• More complex: $(n+2) \times m^3$
  • $n$ words in the sequence
  • $m$ tags in the model
• Example:
  • 12 tags and 6 words: 15,552
  • With 45 tags: 820,125
  • With 87 tags: 5,926,527
• We have probably not seen all tag trigrams during training.
• We must use back-off or interpolation to lower-order n-grams (this can also be necessary for a bigram tagger).

30

slide-31
SLIDE 31

Challenges for all (n-gram) taggers

• How do we tag words not seen during training?
  • We assign them all the most frequent tag (noun),
  • or use the tag frequencies: $P(w \mid t) = P(t)$.
  • Better: use morphological features.
    • These can be added as an extra module to an HMM tagger.
• We will later on consider discriminative taggers, where morphological features may be added without changing the model.

31

slide-32
SLIDE 32

Today

• Tagged text and tag sets
• Tagging as sequence labeling
• HMM-tagging
• Discriminative tagging
• Neural sequence labeling

32

slide-33
SLIDE 33

Discriminative tagging

• The goal of tagging is to decide:
  $\hat{t}_1^n = \operatorname{argmax}_{t_1^n} P(t_1^n \mid w_1^n)$
• HMM is generative:
  • it estimates $P(w_1^n \mid t_1^n)\, P(t_1^n) = P(w_1^n, t_1^n)$
• As for text classification, we could instead use a discriminative procedure and try to estimate the tag sequence directly:
  $P(t_1^n \mid w_1^n) = P(t_1 \mid w_1^n)\, P(t_2 \mid t_1, w_1^n) \cdots P(t_j \mid t_1^{j-1}, w_1^n) \cdots = \prod_{j=1}^{n} P(t_j \mid t_1^{j-1}, w_1^n)$

33

Notation: $t_1^n = t_1, t_2, \ldots, t_n$

slide-34
SLIDE 34

• $\operatorname{argmax}_{t_1^n} P(t_1^n \mid w_1^n) = \operatorname{argmax}_{t_1^n} \prod_{j=1}^{n} P(t_j \mid t_1^{j-1}, w_1^n)$
• Features: any properties of the words are possible features.
• History: how many previous tags should we consider?

34

slide-35
SLIDE 35

Feature templates

• The template is filled in for each observation.
• This results in very many features:
  $5mn + n^2 + n^3 + m^2 n$
  • $m$: the number of words
  • $n$: the number of tags
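A minimal sketch of what filling such a template might look like for one token position (these particular features are illustrative, not the exact templates on the slide):

```python
def features(words, j, prev_tag, prev_prev_tag):
    """Feature dict for predicting the tag of words[j], given the two previous tags."""
    w = words[j]
    return {
        "word": w,
        "lowercased": w.lower(),
        "suffix3": w[-3:],                  # crude morphological cues
        "prefix3": w[:3],
        "is_capitalized": w[0].isupper(),
        "has_digit": any(ch.isdigit() for ch in w),
        "prev_word": words[j - 1] if j > 0 else "<s>",
        "next_word": words[j + 1] if j + 1 < len(words) else "</s>",
        "prev_tag": prev_tag,
        "prev_two_tags": prev_prev_tag + "+" + prev_tag,
    }
```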

35

slide-36
SLIDE 36

Decoding

• Goal: $\operatorname{argmax}_{t_1^n} P(t_1^n \mid w_1^n) = \operatorname{argmax}_{t_1^n} \prod_{j=1}^{n} P(t_j \mid t_1^{j-1}, w_1^n)$
• Simplest alternative: greedy sequence decoding (sketched below):
  • Choose the best tag for the first word in the sentence: $\operatorname{argmax}_{t_1} P(t_1 \mid w_1^n)$
  • Then choose the best tag for the second word in the sentence, given the choice for the first word,
  • and so on, tagging one word at a time until we have finished the sentence: $\operatorname{argmax}_{t_j} P(t_j \mid t_1^{j-1}, w_1^n)$
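A minimal sketch of greedy decoding (the `predict_tag` and `feature_fn` arguments are illustrative stand-ins for a trained discriminative classifier and a feature extractor such as the one sketched above):

```python
def greedy_decode(words, predict_tag, feature_fn):
    """Pick the locally best tag for each word, left to right."""
    tags = []
    for j in range(len(words)):
        prev_tag = tags[-1] if tags else "<s>"
        prev_prev_tag = tags[-2] if len(tags) > 1 else "<s>"
        tags.append(predict_tag(feature_fn(words, j, prev_tag, prev_prev_tag)))
    return list(zip(words, tags))
```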

36

slide-37
SLIDE 37

Shortcomings

• Shortcomings of greedy decoding:
  • early decisions cannot be revised;
  • it considers only one tag at a time.
• Compare this to the HMM, which considers whole tag sequences and chooses the most probable sequence.

37

slide-38
SLIDE 38

Maximum Entropy Markov Models (MEMM)

• If the model uses a limited history,
  $\hat{t}_1^n = \operatorname{argmax}_{t_1^n} P(t_1^n \mid w_1^n) \approx \operatorname{argmax}_{t_1^n} \prod_{j=1}^{n} P(t_j \mid t_{j-k}^{j-1}, w_{j-m}^{j+m})$
• one may use a form of Viterbi and optimize the whole sequence.

38

slide-39
SLIDE 39

However

• The greedy sequence decoding does surprisingly well.
• And equally surprising: using preceding tags as features does not improve the tagger that much compared to not including them.
  • See mandatory assignment 2A.
• Beam search (sketched below):
  • At each stage in the trellis, keep the best hypotheses,
  • but reject the hypotheses with a small probability of succeeding later on.
  • It is also possible to produce the n-best hypotheses from the trellis, e.g. the 5 best.
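A minimal beam-search sketch (illustrative; `score_tag` stands for the log-probability from some trained discriminative model, and `feature_fn` for a feature extractor like the one sketched earlier):

```python
def beam_decode(words, tag_set, score_tag, feature_fn, beam_width=5):
    """Keep only the beam_width best partial tag sequences at each position."""
    beams = [([], 0.0)]                       # (partial tag sequence, log-probability)
    for j in range(len(words)):
        candidates = []
        for tags, logp in beams:
            prev_tag = tags[-1] if tags else "<s>"
            prev_prev_tag = tags[-2] if len(tags) > 1 else "<s>"
            feats = feature_fn(words, j, prev_tag, prev_prev_tag)
            for tag in tag_set:
                candidates.append((tags + [tag], logp + score_tag(feats, tag)))
        # Prune: keep only the beam_width most probable hypotheses
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    best_tags, _ = beams[0]
    return list(zip(words, best_tags))
```

With beam_width = 1 this reduces to greedy decoding; keeping the whole beam at the end yields the n-best hypotheses mentioned above.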

39

slide-40
SLIDE 40

More refinements

• J&M consider some finer details that may be a problem for the MEMM tagger; we will not go into them.
• Conditional Random Fields (CRFs) are a generalization of MEMMs:
  • They make it possible to optimize training over whole tag sequences.
  • They are slow to train.
  • They were considered the best tool for sequence labelling until a few years ago.
• Currently, neural networks ("deep learning") are considered the best tool.

40

slide-41
SLIDE 41

Today

• Tagged text and tag sets
• Tagging as sequence labeling
• HMM-tagging
• Discriminative tagging
• Neural sequence labeling

41

slide-42
SLIDE 42

Neural NLP

• (Multi-layered) neural networks
• Using embeddings as word representations
• Example: a neural language model (k-gram), $P(w_j \mid w_{j-k}^{j-1})$ (sketched below)
  • Use embeddings for representing the $w_j$'s
  • Use a neural network for estimating $P(w_j \mid w_{j-k}^{j-1})$
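A minimal sketch of such a feed-forward k-gram language model in PyTorch (the layer sizes and names are illustrative assumptions, not from the lecture):

```python
import torch
import torch.nn as nn

class KGramLM(nn.Module):
    """Predict word j from the k preceding words, via an embedding layer."""
    def __init__(self, vocab_size, k=3, emb_dim=50, hidden_dim=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.hidden = nn.Linear(k * emb_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, context):               # context: (batch, k) indices of the preceding words
        e = self.embed(context).flatten(1)     # look up and concatenate the k embeddings
        h = torch.tanh(self.hidden(e))
        return self.out(h)                     # scores over the vocabulary (train with CrossEntropyLoss)
```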

42

slide-43
SLIDE 43

43

slide-44
SLIDE 44

Pretrained embeddings

• The last slide uses pretrained embeddings:
  • trained with some method (SkipGram, CBOW, GloVe, …),
  • on some specific corpus,
  • downloadable from the web.
• Pretrained embeddings can also be the input to other tasks, e.g. text classification.
• The task of neural language modeling was also the basis for training the embeddings.
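For example, a minimal sketch of loading one set of pretrained embeddings with gensim's downloader (the particular model name is just one of those available):

```python
import gensim.downloader

# Downloads (once) and loads 100-dimensional GloVe vectors trained on Wikipedia + Gigaword
vectors = gensim.downloader.load("glove-wiki-gigaword-100")
print(vectors["tagger"][:5])                 # the first few dimensions of one word vector
print(vectors.most_similar("tagger", topn=3))
```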

44

slide-45
SLIDE 45

45

slide-46
SLIDE 46

Training the embeddings

• Alternatively, we may start with one-hot representations of words and train the embeddings as the first layer in our models (which is how the embeddings were trained in the first place).
• If the goal is a task different from language modeling, this may result in embeddings better suited to the specific task.
• We may even use two sets of embeddings for each word – one pretrained and one trained during the task.

46

slide-47
SLIDE 47

Recurrent neural nets

• Model sequences/temporal phenomena
• A cell may send a signal back to itself – at the next moment in time

[Figure from https://en.wikipedia.org/wiki/Recurrent_neural_network: the network itself, and its processing unrolled over time]

47

slide-48
SLIDE 48

Forward

• Each of U, V and W is a set of edges with weights.
• $x_1, x_2, \ldots, x_n$ is the input sequence.
• Forward pass:
  1. Calculate $h_1$ from $h_0$ and $x_1$, and $y_1$ from $h_1$.
  2. Calculate $h_2$ from $h_1$ and $x_2$, and $y_2$ from $h_2$, etc.
  3. Calculate $h_n$ from $h_{n-1}$ and $x_n$, and $y_n$ from $h_n$.
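A minimal NumPy sketch of this forward pass (the tanh nonlinearity and the raw output scores are illustrative assumptions; here W is read as the input-to-hidden weights, U as the recurrent hidden-to-hidden weights, and V as the hidden-to-output weights, which is one possible reading of the slide's U, V, W):

```python
import numpy as np

def rnn_forward(xs, U, W, V, h0):
    """Simple (Elman) RNN forward pass.
    xs: list of input vectors; W: input-to-hidden, U: hidden-to-hidden,
    V: hidden-to-output weight matrices; h0: initial hidden state."""
    h, hidden_states, outputs = h0, [], []
    for x in xs:
        h = np.tanh(W @ x + U @ h)   # new hidden state from the input and the previous state
        y = V @ h                    # output at this time step
        hidden_states.append(h)
        outputs.append(y)
    return hidden_states, outputs
```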

48

slide-49
SLIDE 49

Update

• At each output node:
  • calculate the loss and the $\delta$-term.
• Backpropagate the error, e.g.:
  • the $\delta$-term at $h_2$ is calculated from the $\delta$-term at $h_3$ (through U) and the $\delta$-term at $y_2$ (through V).
• Update V from the $\delta$-terms at the $y_j$'s, and U and W from the $\delta$-terms at the $h_j$'s.

49

slide-50
SLIDE 50

50

slide-51
SLIDE 51

Sequence labeling

• Actual models for sequence labeling, e.g. tagging, are more complex.
• For example, they may also take the words after the tag into consideration.

51