Natural Language Processing, Info 159/259, Lecture 10: Sequence Labeling 1 - PowerPoint PPT Presentation



slide-1
SLIDE 1

Natural Language Processing

Info 159/259
 Lecture 10: Sequence Labeling 1 (Sept 26, 2017) David Bamman, UC Berkeley

slide-2
SLIDE 2

POS tagging

Fruit flies like a banana
Time flies like an arrow

[Candidate Penn Treebank tags are shown above each word: NN, VBZ, VBP, VB, JJ, IN, DT, LS, SYM, FW, NNP]

Labeling the tag that’s correct for the context.

(Just tags in evidence within the Penn Treebank — more are possible!)

slide-3
SLIDE 3

Named entity recognition

tim cook is the ceo of apple

tim = PERS, cook = PERS, apple = ORG

3 or 4-class:
  • person
  • location
  • organization
  • (misc)

7-class:
  • person
  • location
  • organization
  • time
  • money
  • percent
  • date

slide-4
SLIDE 4

Supersense tagging

The station wagons arrived at noon, a long shining line that coursed through the west campus.

station = artifact, wagons = artifact, arrived = motion, noon = time, line = group, coursed = motion, west = location, campus = location

Noun supersenses (Ciaramita and Altun 2003)

slide-5
SLIDE 5

Book segmentation

slide-6
SLIDE 6

Sequence labeling

  • For a set of inputs x with n sequential time steps, one corresponding label yi for each xi

x = {x1, . . . , xn} y = {y1, . . . , yn}

slide-7
SLIDE 7

Majority class

  • Pick the label each word is seen most often with in the training data

fruit flies like a banana

Tag counts in training data:

fruit: NN 12
flies: VBZ 7, NNS 1
like: VB 74, VBP 31, JJ 28, IN 533
a: DT 25820, FW 8, SYM 13, LS 2, JJ 2, IN 1, NNP 2
banana: NN 3
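A minimal sketch of this baseline (not from the slides; the toy training data below is illustrative, not the actual Penn Treebank counts):

```python
from collections import Counter, defaultdict

def train_majority_class(tagged_sentences):
    """tagged_sentences: list of [(word, tag), ...]; returns word -> most frequent tag."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: tag_counts.most_common(1)[0][0] for word, tag_counts in counts.items()}

def tag_majority_class(model, words, default="NN"):
    # Unseen words fall back to a default tag (NN here is an assumption).
    return [model.get(w, default) for w in words]

# Toy usage with made-up training data:
train = [[("fruit", "NN"), ("flies", "NNS"), ("like", "IN"), ("a", "DT"), ("banana", "NN")],
         [("time", "NN"), ("flies", "VBZ"), ("like", "IN"), ("an", "DT"), ("arrow", "NN")],
         [("flies", "VBZ")]]
model = train_majority_class(train)
print(tag_majority_class(model, ["fruit", "flies", "like", "a", "banana"]))
```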

slide-8
SLIDE 8

Naive Bayes

  • Treat each prediction as independent of the others

P(VBZ | flies) = P(VBZ) P(flies | VBZ) / Σ_{y∈Y} P(y) P(flies | y)

P(y | x) = P(y) P(x | y) / Σ_{y∈Y} P(y) P(x | y)

Reminder: how do we learn P(y) and P(x|y) from training data?

slide-9
SLIDE 9

Logistic regression

  • Treat each prediction as independent of the others, but condition on a much more expressive set of features

P(VBZ | flies) = exp(xβVBZ) / Σ_{y∈Y} exp(xβy)

P(y | x; β) = exp(xβy) / Σ_{y′∈Y} exp(xβy′)
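A small sketch of this softmax over tags (the feature vector x and weights β below are made-up toy values, not learned parameters):

```python
import math

def softmax_tag_probs(x, beta):
    """x: dict of feature -> value; beta: dict of tag -> dict of feature -> weight.
    Returns P(y | x; beta) for every tag y."""
    scores = {tag: sum(x.get(f, 0.0) * w for f, w in weights.items())
              for tag, weights in beta.items()}
    z = sum(math.exp(s) for s in scores.values())
    return {tag: math.exp(s) / z for tag, s in scores.items()}

# Toy example: features observed around "flies"
x = {"xi=flies": 1.0, "xi-1=fruit": 1.0, "xi+1=like": 1.0}
beta = {"VBZ": {"xi=flies": 1.2, "xi-1=fruit": -0.3},
        "NNS": {"xi=flies": 0.8, "xi-1=fruit": 0.9},
        "NN":  {"xi+1=like": 0.1}}
print(softmax_tag_probs(x, beta))
```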
slide-10
SLIDE 10

Discriminative Features

Features are scoped over the entire observed input

feature | example
xi = flies | 1
xi = car | 0
xi-1 = fruit | 1
xi+1 = like | 1

Fruit flies like a banana
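A sketch of such context-scoped feature templates (the feature names and boundary symbols here are illustrative assumptions):

```python
def token_features(words, i):
    """Binary features for position i, scoped over the whole observed input."""
    feats = {f"xi={words[i]}": 1}
    feats[f"xi-1={words[i-1] if i > 0 else '<START>'}"] = 1
    feats[f"xi+1={words[i+1] if i < len(words) - 1 else '<END>'}"] = 1
    return feats

print(token_features(["Fruit", "flies", "like", "a", "banana"], 1))
# {'xi=flies': 1, 'xi-1=Fruit': 1, 'xi+1=like': 1}
```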

slide-11
SLIDE 11

Sequences

  • Models that make independent predictions for elements in a sequence can reason over expressive representations of the input x (including correlations among inputs at different time steps xi and xj).

  • But they don’t capture another important source of information: correlations in the labels y.

slide-12
SLIDE 12

Sequences

Time flies like an arrow

[Candidate tags shown per word: NN, VBP, VB, JJ, IN, VBZ]

slide-13
SLIDE 13

Sequences

Most common tag bigrams in Penn Treebank training:

DT NN 41909
NNP NNP 37696
NN IN 35458
IN DT 35006
JJ NN 29699
DT JJ 19166
NN NN 17484
NN , 16352
IN NNP 15940
NN . 15548
JJ NNS 15297
NNS IN 15146
TO VB 13797
NNP , 13683
IN NN 11565

slide-14
SLIDE 14

Sequences

P(y = NN VBZ IN DT NN | x = time flies like an arrow)

x = time flies like an arrow
y = NN VBZ IN DT NN

slide-15
SLIDE 15

Generative vs. Discriminative models

  • Generative models specify a joint distribution over the labels and the data. With this you could generate new data.

P(x, y) = P(y) P(x | y)

  • Discriminative models specify the conditional distribution of the label y given the data x. These models focus on how to discriminate between the classes.

P(y | x)

slide-16
SLIDE 16

Generative

max_y P(x | y) P(y)

P(y | x) = P(x | y) P(y) / Σ_{y∈Y} P(x | y) P(y)

P(y | x) ∝ P(x | y) P(y)

How do we parameterize these probabilities when x and y are sequences?

slide-17
SLIDE 17

Hidden Markov Model

Prior probability of the label sequence:

P(y) = P(y1, . . . , yn)

P(y1, . . . , yn) ≈ ∏_{i=1}^{n+1} P(yi | yi−1)

  • We’ll make a first-order Markov assumption and calculate the joint probability as the product of the individual factors, each conditioned only on the previous tag.

slide-18
SLIDE 18

Hidden Markov Model

P(y1, . . . , yn) = P(y1) × P(y2 | y1) × P(y3 | y1, y2) . . . × P(yn | y1, . . . , yn−1)

  • Remember: a Markov assumption is an approximation to this exact decomposition (the chain rule of probability).

slide-19
SLIDE 19

Hidden Markov Model

P(x | y) = P(x1, . . . , xn | y1, . . . , yn)

P(x1, . . . , xn | y1, . . . , yn) ≈ ∏_{i=1}^{n} P(xi | yi)

  • Here again we’ll make a strong assumption: the probability of the word we see at a given time step is only dependent on its label.

slide-20
SLIDE 20

Counts of words tagged VBZ, split by the preceding tag, illustrating P(xi | yi, yi−1):

NNP VBZ: is 1121, has 854, says 420, does 77, plans 50, expects 47, ‘s 40, wants 31, owns 30, makes 29, hopes 24, remains 24, claims 19, seems 19, estimates 17

NN VBZ: is 2893, has 1004, does 128, says 109, remains 56, ‘s 51, includes 44, continues 43, makes 40, seems 34, comes 33, reflects 31, calls 30, expects 29, goes 27

slide-21
SLIDE 21

HMM

P(x1, . . . , xn, y1, . . . , yn) ≈ ∏_{i=1}^{n+1} P(yi | yi−1) ∏_{i=1}^{n} P(xi | yi)
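A sketch of this joint probability under the HMM factorization (the transition and emission tables below are toy values, not estimated from data; an explicit END transition supplies the (n+1)-th factor):

```python
def hmm_joint_prob(words, tags, transitions, emissions):
    """P(x, y) ~= prod_i P(y_i | y_{i-1}) * prod_i P(x_i | y_i).
    transitions[(prev, cur)] and emissions[(tag, word)] hold probabilities."""
    prob = 1.0
    prev = "START"
    for word, tag in zip(words, tags):
        prob *= transitions.get((prev, tag), 0.0) * emissions.get((tag, word), 0.0)
        prev = tag
    prob *= transitions.get((prev, "END"), 1.0)  # the (n+1)-th transition factor
    return prob

# Toy numbers, for illustration only
transitions = {("START", "NN"): 0.3, ("NN", "VBZ"): 0.2, ("VBZ", "IN"): 0.3,
               ("IN", "DT"): 0.5, ("DT", "NN"): 0.6, ("NN", "END"): 0.2}
emissions = {("NN", "time"): 0.01, ("VBZ", "flies"): 0.005, ("IN", "like"): 0.05,
             ("DT", "an"): 0.2, ("NN", "arrow"): 0.002}
print(hmm_joint_prob("time flies like an arrow".split(),
                     ["NN", "VBZ", "IN", "DT", "NN"], transitions, emissions))
```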

slide-22
SLIDE 22

HMM

[HMM graphical model: label sequence y1 … y7 with observations x1 … x7; example factors P(y3 | y2) and P(x3 | y3)]

slide-23
SLIDE 23

HMM

Mr. = NNP, Collins = NNP, was = VB, not = RB, a = DT, sensible = JJ, man = NN

P(was | VB) P(VB | NNP)

slide-24
SLIDE 24

Parameter estimation

MLE for both is just counting (as in Naive Bayes):

P(yt | yt−1) = c(yt−1, yt) / c(yt−1)

P(xt | yt) = c(x, y) / c(y)
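A counting sketch of these MLE estimates (the toy data and START/END markers below are illustrative):

```python
from collections import Counter

def estimate_hmm(tagged_sentences):
    """MLE by counting: P(y_t | y_{t-1}) = c(y_{t-1}, y_t) / c(y_{t-1}),
    P(x_t | y_t) = c(x, y) / c(y)."""
    trans, emit = Counter(), Counter()
    prev_tag_count, tag_count = Counter(), Counter()
    for sentence in tagged_sentences:
        prev = "START"
        for word, tag in sentence:
            trans[(prev, tag)] += 1
            prev_tag_count[prev] += 1
            emit[(tag, word)] += 1
            tag_count[tag] += 1
            prev = tag
        trans[(prev, "END")] += 1
        prev_tag_count[prev] += 1
    transitions = {(u, v): c / prev_tag_count[u] for (u, v), c in trans.items()}
    emissions = {(t, w): c / tag_count[t] for (t, w), c in emit.items()}
    return transitions, emissions

# Toy usage
data = [[("time", "NN"), ("flies", "VBZ"), ("like", "IN"), ("an", "DT"), ("arrow", "NN")]]
transitions, emissions = estimate_hmm(data)
print(transitions[("START", "NN")], emissions[("NN", "time")])
```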

slide-25
SLIDE 25

Transition probabilities

slide-26
SLIDE 26

Emission probabilities

slide-27
SLIDE 27

Smoothing

  • One solution: add a little probability mass to every element.

Maximum likelihood estimate:

P(xi | y) = ni,y / ny

Smoothed estimates:

P(xi | y) = (ni,y + α) / (ny + Vα)   (same α for all xi)

P(xi | y) = (ni,y + αi) / (ny + Σ_{j=1}^{V} αj)   (possibly different α for each xi)

ni,y = count of word i in class y
ny = number of words in y
V = size of the vocabulary
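A small sketch of the add-α version (the counts and α below are toy values):

```python
def smoothed_emission(word, tag, emit_counts, tag_totals, vocab_size, alpha=0.1):
    """P(x_i | y) = (n_{i,y} + alpha) / (n_y + V * alpha): every word in the
    vocabulary gets a little probability mass, even if unseen with this tag."""
    n_iy = emit_counts.get((tag, word), 0)
    n_y = tag_totals.get(tag, 0)
    return (n_iy + alpha) / (n_y + vocab_size * alpha)

# Toy counts: "banana" was never seen as VBZ, but still gets nonzero probability.
emit_counts = {("NN", "banana"): 3, ("VBZ", "flies"): 7}
tag_totals = {"NN": 100, "VBZ": 50}
print(smoothed_emission("banana", "VBZ", emit_counts, tag_totals, vocab_size=10000))
```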

slide-28
SLIDE 28

Decoding

  • Greedy: proceed left to right, committing to the best tag for each time step (given the sequence seen so far).

Fruit = NN, flies = VB, like = IN, a = DT, banana = NN
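A greedy decoder sketch under the HMM factorization (the tag set and probability dictionaries below are toy values in the same style as the earlier sketches):

```python
def greedy_decode(words, tags, transitions, emissions, floor=1e-12):
    """Left to right: commit to the best tag at each step, given only the tag
    chosen at the previous step."""
    prev, out = "START", []
    for word in words:
        best = max(tags, key=lambda t: transitions.get((prev, t), floor) *
                                       emissions.get((t, word), floor))
        out.append(best)
        prev = best
    return out

# Toy usage
tags = ["NN", "VBZ", "IN", "DT"]
transitions = {("START", "NN"): 0.5, ("NN", "VBZ"): 0.3, ("VBZ", "IN"): 0.4,
               ("IN", "DT"): 0.6, ("DT", "NN"): 0.7}
emissions = {("NN", "Fruit"): 0.01, ("VBZ", "flies"): 0.02, ("IN", "like"): 0.1,
             ("DT", "a"): 0.3, ("NN", "banana"): 0.01}
print(greedy_decode("Fruit flies like a banana".split(), tags, transitions, emissions))
```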

slide-29
SLIDE 29

Decoding

The horse raced past the barn fell

DT NN VBD IN DT NN ???

slide-30
SLIDE 30

Decoding

The horse raced past the barn fell

Information later on in the sentence can influence the best tags earlier on.

DT NN VBD IN DT NN ???
DT NN VBN IN DT NN VBD

slide-31
SLIDE 31

All paths

[Trellis: tags DT, NNP, VB, NN, MD plus START (^) and END ($) over the sentence "Janet will back the bill"]

Ideally, what we want is to calculate the joint probability of each path and pick the one with the highest probability. But for N time steps and K labels, the number of possible paths = K^N.

slide-32
SLIDE 32

A 5-word sentence with 45 Penn Treebank tags: 45^5 = 184,528,125 different paths. A 20-word sentence: 45^20 ≈ 1.16e33 different paths.
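A quick check of those numbers (not from the slide itself):

```python
print(45 ** 5)            # 184528125 paths for a 5-word sentence
print(f"{45 ** 20:.2e}")  # ~1.16e+33 paths for a 20-word sentence
```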

slide-33
SLIDE 33

Viterbi algorithm

  • Basic idea: if an optimal path through a sequence uses label L at time T, then it must have used an optimal path to get to label L at time T.
  • We can discard all non-optimal paths up to label L at time T.

slide-34
SLIDE 34
  • At each time step t ending in label K, we find the max probability of any path that led to that state.

[Trellis: tags DT, NNP, VB, NN, MD plus START (^) and END ($) over "Janet will back the bill"]

slide-35
SLIDE 35

[Trellis for "Janet will back the bill": tags DT, NNP, VB, NN, MD plus START and vT(END); column v1(y) shown]

What’s the HMM probability of ending in Janet = NNP?

P(NNP | START) P(Janet | NNP)   (the general form P(yt | yt−1) P(xt | yt))

slide-36
SLIDE 36

[Trellis for "Janet will back the bill": column v1(y) shown]

v1(y) = max_{u∈Y} [P(yt = y | yt−1 = u) P(xt | yt = y)]

Best path through time step 1 ending in tag y (trivially, the best path for all is just START)

slide-37
SLIDE 37

[Trellis for "Janet will back the bill": columns v1(y) and v2(y) shown]

What’s the max HMM probability of ending in will = MD? First, what’s the HMM probability of a single path ending in will = MD?

slide-38
SLIDE 38

[Trellis for "Janet will back the bill": columns v1(y) and v2(y) shown]

P(y1 | START)P(x1 | y1) × P(y2 = MD | y1)P(x2 | y2 = MD)

slide-39
SLIDE 39

[Trellis for "Janet will back the bill": columns v1(y) and v2(y) shown]

Best path through time step 2 ending in tag MD

P(DT | START) × P(Janet | DT) × P(yt = MD | yt−1 = DT) × P(will | yt = MD)
P(NNP | START) × P(Janet | NNP) × P(yt = MD | yt−1 = NNP) × P(will | yt = MD)
P(VB | START) × P(Janet | VB) × P(yt = MD | yt−1 = VB) × P(will | yt = MD)
P(NN | START) × P(Janet | NN) × P(yt = MD | yt−1 = NN) × P(will | yt = MD)
P(MD | START) × P(Janet | MD) × P(yt = MD | yt−1 = MD) × P(will | yt = MD)

slide-40
SLIDE 40

[Trellis for "Janet will back the bill": columns v1(y) and v2(y) shown]

Best path through time step 2 ending in tag MD

Let’s say the best path ending in will = MD includes Janet = NNP. By definition, every other path has lower probability.

slide-41
SLIDE 41

[Trellis for "Janet will back the bill": columns v1(y) and v2(y) shown]

Best path through time step 2 ending in tag MD

P(DT | START) × P(Janet | DT) × P(yt = MD | yt−1 = DT) × P(will | yt = MD)
P(NNP | START) × P(Janet | NNP) × P(yt = MD | yt−1 = NNP) × P(will | yt = MD)
P(VB | START) × P(Janet | VB) × P(yt = MD | yt−1 = VB) × P(will | yt = MD)
P(NN | START) × P(Janet | NN) × P(yt = MD | yt−1 = NN) × P(will | yt = MD)
P(MD | START) × P(Janet | MD) × P(yt = MD | yt−1 = MD) × P(will | yt = MD)

slide-42
SLIDE 42

v1(y) = max_{u∈Y} [P(yt = y | yt−1 = u) P(xt | yt = y)]

v1(DT) × P(yt = MD | yt−1 = DT) × P(will | yt = MD)

P(DT | START) × P(Janet | DT) × P(yt = MD | yt−1 = DT) × P(will | yt = MD)
P(NNP | START) × P(Janet | NNP) × P(yt = MD | yt−1 = NNP) × P(will | yt = MD)
P(VB | START) × P(Janet | VB) × P(yt = MD | yt−1 = VB) × P(will | yt = MD)
P(NN | START) × P(Janet | NN) × P(yt = MD | yt−1 = NN) × P(will | yt = MD)
P(MD | START) × P(Janet | MD) × P(yt = MD | yt−1 = MD) × P(will | yt = MD)

slide-43
SLIDE 43

[Trellis for "Janet will back the bill": columns v1(y) and v2(y) shown]

vt(y) = max_{u∈Y} [vt−1(u) × P(yt = y | yt−1 = u) P(xt | yt = y)]

slide-44
SLIDE 44

[Trellis for "Janet will back the bill": columns v1(y), v2(y), v3(y) shown]

25 paths ending in back = VB

slide-45
SLIDE 45

[Trellis for "Janet will back the bill": columns v1(y), v2(y), v3(y) shown]

Let’s say the best path ending in back = VB includes will = MD.

slide-46
SLIDE 46

[Trellis for "Janet will back the bill": columns v1(y), v2(y), v3(y) shown]

If the best path ending in will = MD includes Janet=NNP, we can forget all paths with Janet != NNP for any path including will = MD because we know they are less likely.

slide-47
SLIDE 47

[Trellis for "Janet will back the bill": columns v1(y) through v4(y) shown]

125 possible paths ending in the = DT, but we only need to consider 5 (best path ending in back = DT, back = NNP, back = VB, back = NN, back = MD)

slide-48
SLIDE 48

[Trellis for "Janet will back the bill": columns v1(y) through v5(y) shown]

slide-49
SLIDE 49

[Trellis for "Janet will back the bill": columns v1(y) through v5(y) and vT(END) shown]

vT(END) encodes the best path through the entire sequence

slide-50
SLIDE 50

[Trellis for "Janet will back the bill": tags DT, NNP, VB, NN, MD plus START and vT(END)]

For each time step t and label, keep track of the max element from t−1 to reconstruct the best path.
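Putting the recurrence and the backpointers together, a minimal Viterbi sketch over toy-style probability dictionaries like those in the earlier sketches (tag inventories and numbers are illustrative):

```python
def viterbi(words, tags, transitions, emissions, floor=1e-12):
    """v_t(y) = max_u v_{t-1}(u) * P(y_t=y | y_{t-1}=u) * P(x_t | y_t=y),
    keeping a backpointer to the u that achieved the max."""
    V = [{}]      # V[t][tag] = best score of any path ending in tag at time t
    back = [{}]   # back[t][tag] = previous tag on that best path
    for tag in tags:
        V[0][tag] = transitions.get(("START", tag), floor) * emissions.get((tag, words[0]), floor)
        back[0][tag] = "START"
    for t in range(1, len(words)):
        V.append({})
        back.append({})
        for tag in tags:
            best_prev = max(tags, key=lambda u: V[t-1][u] * transitions.get((u, tag), floor))
            V[t][tag] = (V[t-1][best_prev] * transitions.get((best_prev, tag), floor)
                         * emissions.get((tag, words[t]), floor))
            back[t][tag] = best_prev
    # Final transition to END, then follow backpointers to reconstruct the best path.
    last = max(tags, key=lambda u: V[-1][u] * transitions.get((u, "END"), floor))
    path = [last]
    for t in range(len(words) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

# Usage mirrors greedy_decode above: viterbi(words, tags, transitions, emissions)
```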

slide-51
SLIDE 51
slide-52
SLIDE 52

[Trellis for "Janet will back the bill": column v1(y) shown]

v1(y) = max_{u∈Y} [P(yt = y | yt−1 = u) P(xt | yt = y)]

Can Viterbi decoding help with independent predictions (e.g., Naive Bayes or logistic regression)?

When making independent predictions:

P(yt = y | yt−1 = u) = P(yt = y)

slide-53
SLIDE 53

Generative vs. Discriminative models

  • Generative models specify a joint distribution over the labels and the data. With this you could generate new data.

P(x, y) = P(y) P(x | y)

  • Discriminative models specify the conditional distribution of the label y given the data x. These models focus on how to discriminate between the classes.

P(y | x)

slide-54
SLIDE 54

MEMM

General maxent form:

arg max_y P(y | x, β)

Maxent with first-order Markov assumption (Maximum Entropy Markov Model):

arg max_y ∏_{i=1}^{n} P(yi | yi−1, x)
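A sketch of the MEMM factor as multiclass logistic regression over features of (yi−1, x, i). The feature names and any weights passed in are illustrative assumptions; a real model would learn β:

```python
import math

def memm_tag_probs(words, i, prev_tag, beta):
    """P(y_i | y_{i-1}, x; beta): softmax over tags, with features that can
    look at the previous tag and the entire observed input."""
    feats = {f"xi={words[i]}": 1.0,
             f"prev_tag={prev_tag}": 1.0,
             f"xi+1={words[i + 1] if i + 1 < len(words) else '<END>'}": 1.0}
    scores = {tag: sum(feats.get(f, 0.0) * w for f, w in weights.items())
              for tag, weights in beta.items()}
    z = sum(math.exp(s) for s in scores.values())
    return {tag: math.exp(s) / z for tag, s in scores.items()}
```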

slide-55
SLIDE 55

MEMM

Mr. = NNP, Collins = NNP, was = VB, not = RB, a = DT, sensible = JJ, man = NN

slide-56
SLIDE 56

MEMM

Mr. = NNP, Collins = NNP, was = VB, not = RB, a = DT, sensible = JJ, man = NN

MEMMs condition on the entire input

slide-57
SLIDE 57

MEMM

Mr. = NNP, Collins = NNP, was = VB, not = RB, a = DT, sensible = JJ, man = NN

slide-58
SLIDE 58

Features

Features are scoped over the previous predicted tag and the entire observed input.

feature | example
xi = man | 1
ti-1 = JJ | 1
i = n (last word of sentence) | 1
xi ends in -ly | 0

slide-59
SLIDE 59

Training

∏_{i=1}^{n} P(yi | yi−1, x, β)

For all training data, we want the probability of the true label yi conditioned on the previous true label yi−1 to be high. This is simply multiclass logistic regression.

slide-60
SLIDE 60

Decoding

  • With logistic regression, our prediction is simply the argmax over y of P(y | x, β).

  • With an MEMM, we know the true yi−1 during training, but of course we never know it at test time:

P(yi | yi−1, x, β)

slide-61
SLIDE 61

Greedy decoding

  • At i = 1, predict the argmax given START:

P(y1 | START, x, β)

  • For each subsequent time step, condition on the y just predicted during the step before:

P(yi | yi−1, x, β)
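A greedy MEMM decoding sketch; tag_prob_fn is any function returning P(yi | yi−1, x; β) over the tag set, for instance the memm_tag_probs sketched after slide 54 (an assumption, not part of the slides):

```python
def greedy_memm_decode(words, beta, tag_prob_fn):
    """At i=1 condition on START; afterwards, condition on the tag just predicted."""
    prev, out = "START", []
    for i in range(len(words)):
        probs = tag_prob_fn(words, i, prev, beta)  # P(y_i | y_{i-1}, x; beta)
        prev = max(probs, key=probs.get)
        out.append(prev)
    return out
```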

slide-62
SLIDE 62

Viterbi decoding

Viterbi for HMM: max joint probability P(x, y) = P(y) P(x | y)

vt(y) = max_{u∈Y} [vt−1(u) × P(yt = y | yt−1 = u) P(xt | yt = y)]

Viterbi for MEMM: max conditional probability P(y | x)

vt(y) = max_{u∈Y} [vt−1(u) × P(yt = y | yt−1 = u, x, β)]

slide-63
SLIDE 63

Project proposals

  • Due today by 11:59pm!