SLIDE 1

NLP Programming Tutorial 5 - Part of Speech Tagging with Hidden Markov Models

Graham Neubig Nara Institute of Science and Technology (NAIST)

SLIDE 2

Part of Speech (POS) Tagging

  • Given a sentence X, predict its part of speech sequence Y

  • A type of “structured” prediction, from two weeks ago
  • How can we do this? Any ideas?

Natural language processing ( NLP ) is a field of computer science

JJ NN NN -LRB- NN -RRB- VBZ DT NN IN NN NN

SLIDE 3

Many Answers!

  • Pointwise prediction: predict each word individually with a classifier (e.g. perceptron, tool: KyTea)

  • Generative sequence models: today's topic! (e.g. Hidden Markov Model, tool: ChaSen)

  • Discriminative sequence models: predict the whole sequence with a classifier (e.g. CRF, structured perceptron, tools: MeCab, Stanford Tagger)

Natural language processing ( NLP ) is a field of computer science

classifier → “processing” = NN? VBG? JJ?
classifier → “computer” = NN? VBG? JJ?

SLIDE 4

Probabilistic Model for Tagging

  • “Find the most probable tag sequence, given the sentence”

  • Any ideas?

Natural language processing ( NLP ) is a field of computer science

JJ NN NN LRB NN RRB VBZ DT NN IN NN NN

argmax_Y P(Y∣X)

SLIDE 5

Generative Sequence Model

  • First decompose probability using Bayes' law
  • Also sometimes called the “noisy-channel model”

argmax_Y P(Y∣X) = argmax_Y P(X∣Y) P(Y) / P(X)
                = argmax_Y P(X∣Y) P(Y)

P(X∣Y): model of word/POS interactions (“natural” is probably a JJ)
P(Y): model of POS/POS interactions (NN comes after DET)

SLIDE 6

Hidden Markov Models

SLIDE 7

Hidden Markov Models (HMMs) for POS Tagging

  • POS→POS transition probabilities
  • Like a bigram model!
  • POS→Word emission probabilities

natural language processing ( nlp ) …
<s> JJ NN NN LRB NN RRB … </s>

Transition probabilities: PT(JJ|<s>) PT(NN|JJ) PT(NN|NN) …
Emission probabilities: PE(natural|JJ) PE(language|NN) PE(processing|NN) …

P(Y) ≈ ∏_{i=1}^{I+1} PT(yi ∣ yi−1)
P(X∣Y) ≈ ∏_{i=1}^{I} PE(xi ∣ yi)

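To make the decomposition concrete, here is a small Python sketch (not part of the tutorial's own scripts) that scores the example tagging using hypothetical PT and PE dictionaries; the probability values are made up purely for illustration:

from math import prod   # Python 3.8+

# Hypothetical probabilities, keyed "next|previous" and "word|tag" (values are invented)
PT = {"JJ|<s>": 0.1, "NN|JJ": 0.3, "NN|NN": 0.2, "</s>|NN": 0.1}
PE = {"natural|JJ": 0.01, "language|NN": 0.02, "processing|NN": 0.01}

def joint_prob(words, tags):
    tag_seq = ["<s>"] + tags + ["</s>"]
    # P(Y) = product of PT(y_i | y_{i-1}), including <s> and </s>
    p_y = prod(PT["%s|%s" % (nxt, prev)] for prev, nxt in zip(tag_seq, tag_seq[1:]))
    # P(X|Y) = product of PE(x_i | y_i)
    p_x_given_y = prod(PE["%s|%s" % (w, t)] for w, t in zip(words, tags))
    return p_y * p_x_given_y

print(joint_prob(["natural", "language", "processing"], ["JJ", "NN", "NN"]))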

SLIDE 8

Learning Markov Models (with tags)

  • Count the number of occurrences in the corpus:

natural language processing ( nlp ) is … <s> JJ NN NN LRB NN RRB VB … </s>

c(JJ→natural)++ c(NN→language)++ c(<s> JJ)++ c(JJ NN)++ …

  • Divide by the context count to get the probability:

PT(LRB|NN) = c(NN LRB)/c(NN) = 1/3
PE(language|NN) = c(NN → language)/c(NN) = 1/3

SLIDE 9

Training Algorithm

# Input data format is “natural_JJ language_NN …”
make a map emit, transition, context
for each line in file
    previous = “<s>”                        # Make the sentence start
    context[previous]++
    split line into wordtags with “ ”
    for each wordtag in wordtags
        split wordtag into word, tag with “_”
        transition[previous+“ ”+tag]++      # Count the transition
        context[tag]++                      # Count the context
        emit[tag+“ ”+word]++                # Count the emission
        previous = tag
    transition[previous+“ </s>”]++
# Print the transition probabilities
for each key, value in transition
    split key into previous, word with “ ”
    print “T”, key, value/context[previous]
# Do the same thing for emission probabilities with “E”
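
As a reference point, a minimal runnable Python version of this pseudocode might look as follows (the variable names follow the pseudocode; reading the training file from the command line and everything else here is my own assumption):

import sys
from collections import defaultdict

emit = defaultdict(int)        # emission counts, key "tag word"
transition = defaultdict(int)  # transition counts, key "previous tag"
context = defaultdict(int)     # context counts, key "tag"

for line in open(sys.argv[1]):
    previous = "<s>"                            # make the sentence start
    context[previous] += 1
    for wordtag in line.split():
        word, tag = wordtag.rsplit("_", 1)      # split off the tag after the last "_"
        transition[previous + " " + tag] += 1   # count the transition
        context[tag] += 1                       # count the context
        emit[tag + " " + word] += 1             # count the emission
        previous = tag
    transition[previous + " </s>"] += 1

for key, value in transition.items():           # transition probabilities
    previous = key.split(" ")[0]
    print("T", key, value / context[previous])
for key, value in emit.items():                 # emission probabilities
    tag = key.split(" ")[0]
    print("E", key, value / context[tag])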

SLIDE 10

Note: Smoothing

  • In the bigram model, we smoothed the probabilities
  • HMM transition prob.: there are not many tags, so smoothing is not necessary

  • HMM emission prob.: smooth for unknown words

Bigram LM: PLM(wi|wi-1) = λ PML(wi|wi-1) + (1−λ) PLM(wi)
HMM transition: PT(yi|yi-1) = PML(yi|yi-1)
HMM emission: PE(xi|yi) = λ PML(xi|yi) + (1−λ) 1/N
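
For example, the smoothed emission probability could be computed as in the sketch below; λ = 0.95 and N = 1,000,000 are assumed values (any reasonable interpolation weight and vocabulary-size guess will do), and the emission map is keyed “tag word” as in the model file:

from math import log

LAMBDA = 0.95   # assumed interpolation weight
N = 1000000     # assumed vocabulary size for the unknown-word term

def emission_neg_log_prob(emission, tag, word):
    # PE(x|y) = lambda * PML(x|y) + (1 - lambda) * 1/N, returned as a negative log probability
    p_ml = emission.get(tag + " " + word, 0.0)
    return -log(LAMBDA * p_ml + (1 - LAMBDA) / N)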

SLIDE 11

Finding POS Tags

SLIDE 12

Finding POS Tags with Markov Models

  • Use the Viterbi algorithm again!!
  • What does our graph look like?

I told you I was important!!

SLIDE 13

Finding POS Tags with Markov Models

  • What does our graph look like? Answer:

natural language processing ( nlp )

[Lattice figure: for each word position i = 1 … 6 there is one node per candidate tag (i:NN, i:JJ, i:VB, i:LRB, i:RRB, …), plus the start node 0:<S>]

SLIDE 14

Finding POS Tags with Markov Models

  • The best path is our POS sequence

natural language processing ( nlp )

[Same lattice as on the previous slide, with the best path highlighted: 0:<S> → 1:JJ → 2:NN → 3:NN → 4:LRB → 5:NN → 6:RRB]

<s> JJ NN NN LRB NN RRB

SLIDE 15

Remember: Viterbi Algorithm Steps

  • Forward step, calculate the best path to a node
  • Find the path to each node with the lowest negative log probability
  • Backward step, reproduce the path
  • This is easy, almost the same as word segmentation

SLIDE 16

Forward Step: Part 1

  • First, calculate the transition from <S> and the emission of the first word for every POS

[Figure: the start node 0:<S> connects to nodes 1:NN, 1:JJ, 1:VB, 1:LRB, 1:RRB, … for the first word “natural”]

best_score[“1 NN”] = -log PT(NN|<S>) + -log PE(natural | NN)
best_score[“1 JJ”] = -log PT(JJ|<S>) + -log PE(natural | JJ)
best_score[“1 VB”] = -log PT(VB|<S>) + -log PE(natural | VB)
best_score[“1 LRB”] = -log PT(LRB|<S>) + -log PE(natural | LRB)
best_score[“1 RRB”] = -log PT(RRB|<S>) + -log PE(natural | RRB)

SLIDE 17

Forward Step: Middle Parts

  • For middle words, calculate the minimum score for all possible previous POS tags

[Figure: nodes 1:NN, 1:JJ, 1:VB, 1:LRB, 1:RRB, … for “natural” connect to nodes 2:NN, 2:JJ, 2:VB, 2:LRB, 2:RRB, … for “language”]

best_score[“2 NN”] = min(
    best_score[“1 NN”] + -log PT(NN|NN) + -log PE(language | NN),
    best_score[“1 JJ”] + -log PT(NN|JJ) + -log PE(language | NN),
    best_score[“1 VB”] + -log PT(NN|VB) + -log PE(language | NN),
    best_score[“1 LRB”] + -log PT(NN|LRB) + -log PE(language | NN),
    best_score[“1 RRB”] + -log PT(NN|RRB) + -log PE(language | NN),
    ... )


best_score[“2 JJ”] = min(
    best_score[“1 NN”] + -log PT(JJ|NN) + -log PE(language | JJ),
    best_score[“1 JJ”] + -log PT(JJ|JJ) + -log PE(language | JJ),
    best_score[“1 VB”] + -log PT(JJ|VB) + -log PE(language | JJ),
    ... )

SLIDE 18

Forward Step: Final Part

  • Finish up the sentence with the sentence final symbol </S>

[Figure: nodes I:NN, I:JJ, I:VB, I:LRB, I:RRB, … for the last word “science” connect to the final node I+1:</S>]

best_score[“I+1 </S>”] = min(
    best_score[“I NN”] + -log PT(</S>|NN),
    best_score[“I JJ”] + -log PT(</S>|JJ),
    best_score[“I VB”] + -log PT(</S>|VB),
    best_score[“I LRB”] + -log PT(</S>|LRB),
    best_score[“I RRB”] + -log PT(</S>|RRB),
    ... )


SLIDE 19

Implementation: Model Loading

make a map for transition, emission, possible_tags
for each line in model_file
    split line into type, context, word, prob
    possible_tags[context] = 1              # We use this to enumerate all tags
    if type = “T”
        transition[context+“ ”+word] = prob
    else
        emission[context+“ ”+word] = prob
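
In Python, loading this model file might look like the sketch below (the file name hmm_model.txt is an assumption; the “T”/“E” line format matches what the training algorithm prints):

transition = {}      # "previous tag" -> probability
emission = {}        # "tag word"     -> probability
possible_tags = {}   # used to enumerate all tags

for line in open("hmm_model.txt"):          # assumed model file name
    kind, context, word, prob = line.split()
    possible_tags[context] = 1
    if kind == "T":
        transition[context + " " + word] = float(prob)
    else:
        emission[context + " " + word] = float(prob)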

SLIDE 20

Implementation: Forward Step

split line into words
I = length(words)
make maps best_score, best_edge
best_score[“0 <s>”] = 0        # Start with <s>
best_edge[“0 <s>”] = NULL
for i in 0 … I-1:
    for each prev in keys of possible_tags
        for each next in keys of possible_tags
            if best_score[“i prev”] and transition[“prev next”] exist
                score = best_score[“i prev”] +
                        -log PT(next|prev) + -log PE(word[i]|next)
                if best_score[“i+1 next”] is new or > score
                    best_score[“i+1 next”] = score
                    best_edge[“i+1 next”] = “i prev”
# Finally, do the same for </s>
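
A runnable Python sketch of the same forward step, written as a function over the dictionaries loaded above (the smoothing constants are the assumed values from the smoothing slide, and the emission term is interpolated inline):

from math import log

LAMBDA, N = 0.95, 1000000   # assumed smoothing constants for unknown words

def forward(words, possible_tags, transition, emission):
    I = len(words)
    best_score = {"0 <s>": 0.0}   # start with <s>
    best_edge = {"0 <s>": None}
    for i in range(I):
        for prev in possible_tags:
            for nxt in possible_tags:
                prev_key, trans_key = "%d %s" % (i, prev), prev + " " + nxt
                if prev_key in best_score and trans_key in transition:
                    # smoothed emission probability, as in the earlier sketch
                    p_e = LAMBDA * emission.get(nxt + " " + words[i], 0.0) + (1 - LAMBDA) / N
                    score = best_score[prev_key] - log(transition[trans_key]) - log(p_e)
                    next_key = "%d %s" % (i + 1, nxt)
                    if next_key not in best_score or score < best_score[next_key]:
                        best_score[next_key] = score
                        best_edge[next_key] = prev_key
    # finally, do the same for </s> (transition only, no emission)
    for prev in possible_tags:
        prev_key = "%d %s" % (I, prev)
        if prev_key in best_score and (prev + " </s>") in transition:
            score = best_score[prev_key] - log(transition[prev + " </s>"])
            end_key = "%d </s>" % (I + 1)
            if end_key not in best_score or score < best_score[end_key]:
                best_score[end_key] = score
                best_edge[end_key] = prev_key
    return best_score, best_edge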

SLIDE 21

Implementation: Backward Step

tags = [ ]
next_edge = best_edge[“I+1 </s>”]
while next_edge != “0 <s>”
    # Add the tag for this edge to the tag sequence
    split next_edge into position, tag
    append tag to tags
    next_edge = best_edge[next_edge]
tags.reverse()
join tags into a string and print
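
Continuing the Python sketch, the backward step follows the best_edge pointers from the final </s> entry (stored at position I+1 in the forward() function above) back to the start:

def backward(words, best_edge):
    I = len(words)
    tags = []
    next_edge = best_edge["%d </s>" % (I + 1)]
    while next_edge != "0 <s>":
        # add the tag for this edge to the output
        position, tag = next_edge.split(" ")
        tags.append(tag)
        next_edge = best_edge[next_edge]
    tags.reverse()
    return " ".join(tags)

A test-hmm script could then, for each line of the test file, split the line into words, call forward(), and print the result of backward().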

SLIDE 22

Exercise

SLIDE 23

Exercise

  • Write train-hmm and test-hmm
  • Test the program
  • Input: test/05-{train,test}-input.txt
  • Answer: test/05-{train,test}-answer.txt
  • Train an HMM model on data/wiki-en-train.norm_pos and run the program on data/wiki-en-test.norm

  • Measure the accuracy of your tagging with script/gradepos.pl data/wiki-en-test.pos my_answer.pos

  • Report the accuracy
  • Challenge: think of a way to improve accuracy
SLIDE 24

Thank You!