A Maximum Entropy Model for Part-of-Speech Tagging - PowerPoint PPT Presentation




SLIDE 1

Mawulolo Ameko and Sonia Baee Introduction The probability model Features for POS tagging Testing the Model Error Analysis Comparison with previous work Conclusion Question

A Maximum Entropy Model for Part-of-Speech Tagging

Adwait Ratnaparkhi, 1996 Mawulolo Ameko and Sonia Baee CS 6501-004 - Text Mining Paper Presentation April 12th, 2018

CS6501-004: A Maximum Entropy Model for Part-of-Speech Tagging Mawulolo Ameko and Sonia Baee 1

SLIDE 2

Table of Contents

1. Introduction
2. The probability model
3. Features for POS tagging
4. Testing the Model
5. Error Analysis
6. Comparison with previous work
7. Conclusion
8. Question



SLIDE 4

Background

Many natural language tasks require the accurate assignment of part-of-speech (POS) tags to previously unseen text. Previous use cases for Maximum Entropy (MaxEnt) models include:

  • Language modeling (Lau et al., 1993)
  • Machine translation (Berger et al., 1996)
  • Prepositional phrase attachment (Ratnaparkhi et al., 1995)
  • Word morphology (Della Pietra et al., 1995)



SLIDE 6

Model Formulation

Given a set of histories H and tag contexts T, the probability model is defined over the Cartesian product space H × T as:

Probability Model

$$p(h, t) = \pi \mu \prod_{j=1}^{k} \alpha_j^{f_j(h, t)}$$

where $\pi$ is a normalization constant, $\{\mu, \alpha_1, \dots, \alpha_k\}$ are positive model parameters, and $\{f_1, \dots, f_k\}$ are binary features, i.e. $f_j(h, t) \in \{0, 1\}$.

Likelihood Function

$$L(p) = \prod_{i=1}^{n} p(h_i, t_i) = \prod_{i=1}^{n} \pi \mu \prod_{j=1}^{k} \alpha_j^{f_j(h_i, t_i)}$$
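The product form above is easy to evaluate once the parameters are known: each active binary feature contributes its factor α_j, and inactive features contribute 1. A minimal sketch in Python (not the paper's implementation; `joint_prob` and its arguments are hypothetical names):

```python
def joint_prob(history, tag, alphas, features, pi=1.0, mu=1.0):
    """Evaluate p(h, t) = pi * mu * prod_j alpha_j ** f_j(h, t)
    for binary features f_j(h, t) in {0, 1}."""
    p = pi * mu
    for alpha, f in zip(alphas, features):
        if f(history, tag):  # alpha_j ** 0 == 1 when the feature is inactive
            p *= alpha
    return p

# toy example: a single feature that fires for "-ing" words tagged VBG
f1 = lambda h, t: h["word"].endswith("ing") and t == "VBG"
print(joint_prob({"word": "running"}, "VBG", [2.5], [f1]))  # 2.5
```

For a word where no feature fires, the product collapses to π·μ, which is why only the activated parameters matter.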


SLIDE 7

Equivalent Formulation

Maximum Entropy Formalism

$$H(p) = -\sum_{h \in H,\, t \in T} p(h, t) \log p(h, t) \quad \text{subject to} \quad E f_j = \tilde{E} f_j, \quad 1 \le j \le k$$

where $E f_j$ and $\tilde{E} f_j$ denote the model's feature expectation and the observed expectation from the training data, respectively. Generalized Iterative Scaling (Darroch and Ratcliff, 1972) is used to find the unique combination of parameters that maximizes the log-likelihood.
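Each Generalized Iterative Scaling step rescales α_j by the ratio of observed to model feature expectations, raised to the power 1/C, where C bounds the total feature count per (h, t) pair. A hedged sketch of one update (function and argument names are illustrative, not from the paper):

```python
def gis_update(alphas, observed_exp, model_exp, C):
    """One Generalized Iterative Scaling step:
    alpha_j <- alpha_j * (E~[f_j] / E[f_j]) ** (1 / C)."""
    return [a * (obs / mod) ** (1.0 / C)
            for a, obs, mod in zip(alphas, observed_exp, model_exp)]

# a parameter is left unchanged once the expectations match
print(gis_update([2.0], [0.3], [0.3], C=5))  # [2.0]
```

At convergence every ratio is 1, so the parameters stop moving, which is exactly the constraint $E f_j = \tilde{E} f_j$.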



SLIDE 9

Basic Definition

Feature Definition

For a given word and tag context available in a history:

$$h_i = \{w_i, w_{i+1}, w_{i+2}, w_{i-1}, w_{i-2}, t_{i-1}, t_{i-2}\}$$

$$f_j(h_i, t_i) = \begin{cases} 1 & \text{if } \mathrm{suffix}(w_i) = \text{``ing''} \text{ and } t_i = \text{VBG} \\ 0 & \text{otherwise} \end{cases}$$

The joint distribution of a history h and tag t is determined by the parameters activated by the feature definitions.
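The history and the example feature can be sketched in plain Python; `make_history` and `f_ing_vbg` are hypothetical names and the dictionary keys are an arbitrary encoding, not the paper's code:

```python
def make_history(words, tags, i):
    """h_i = {w_i, w_i+1, w_i+2, w_i-1, w_i-2, t_i-1, t_i-2},
    with None standing in past sentence boundaries."""
    def w(j): return words[j] if 0 <= j < len(words) else None
    def t(j): return tags[j] if 0 <= j < len(tags) else None
    return {"w0": w(i), "w+1": w(i + 1), "w+2": w(i + 2),
            "w-1": w(i - 1), "w-2": w(i - 2),
            "t-1": t(i - 1), "t-2": t(i - 2)}

def f_ing_vbg(h, tag):
    # fires iff suffix(w_i) == "ing" and t_i == VBG
    return 1 if (h["w0"] or "").endswith("ing") and tag == "VBG" else 0

h = make_history(["the", "dog", "is", "running"], ["DT", "NN", "VBZ"], 3)
print(f_ing_vbg(h, "VBG"))  # 1
```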


SLIDE 10

Specific Feature Definitions

Table: Feature Templates

Condition         Features
w_i is not rare   w_i = X & t_i = T
w_i is rare       X is a prefix of w_i, |X| <= 4 & t_i = T
                  X is a suffix of w_i, |X| <= 4 & t_i = T
                  w_i contains a number & t_i = T
                  w_i contains an uppercase character & t_i = T
                  w_i contains a hyphen & t_i = T
for all w_i       t_{i-1} = X & t_i = T
                  t_{i-2} t_{i-1} = X & t_i = T
                  w_{i-1} = X & t_i = T
                  w_{i-2} = X & t_i = T
                  w_{i+1} = X & t_i = T
                  w_{i+2} = X & t_i = T
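The templates above can be instantiated as feature strings for one (history, tag) pair. The sketch below is one possible encoding, not the paper's code; the paper treats a word as rare if it appears fewer than 5 times in training, and all function and key names here are illustrative:

```python
def extract_features(h, tag, rare):
    """Instantiate the feature templates for one (history, tag) pair.
    `rare` marks words seen fewer than 5 times in the training data."""
    feats = []
    w = h["w0"]
    if not rare:
        feats.append(f"w0={w}&t={tag}")
    else:
        for n in range(1, min(4, len(w)) + 1):  # prefixes/suffixes up to length 4
            feats.append(f"prefix={w[:n]}&t={tag}")
            feats.append(f"suffix={w[-n:]}&t={tag}")
        if any(c.isdigit() for c in w):
            feats.append(f"hasnum&t={tag}")
        if any(c.isupper() for c in w):
            feats.append(f"hasupper&t={tag}")
        if "-" in w:
            feats.append(f"hashyphen&t={tag}")
    feats.append(f"t-1={h['t-1']}&t={tag}")
    feats.append(f"t-2t-1={h['t-2']},{h['t-1']}&t={tag}")
    for k in ("w-1", "w-2", "w+1", "w+2"):
        feats.append(f"{k}={h[k]}&t={tag}")
    return feats
```

Rare words fall back to spelling features (affixes, digits, case, hyphens), since their identity alone is too sparse to estimate a reliable parameter.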



SLIDE 12

Testing the Model

  • Uses beam search as the search algorithm, with beam size N = 5
  • Uses a tag dictionary to restrict candidate tags for words seen in the training set
  • Assigns equal probability to all tags for unseen words
  • The test corpus is tagged one sentence at a time
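The search procedure above can be sketched as follows; `tags_for` stands in for the tag-dictionary lookup and `prob` for the conditional model, both hypothetical interfaces rather than the paper's implementation:

```python
import math

def beam_tag(words, tags_for, prob, beam=5):
    """Beam search over tag sequences: keep only the `beam` highest-scoring
    partial sequences at each position instead of all |T|**n sequences.
    tags_for(word) returns candidate tags (the tag dictionary for seen
    words, or every tag for unseen ones); prob(i, partial, t) returns the
    model's conditional probability of tag t at position i."""
    hyps = [((), 0.0)]  # (partial tag sequence, log-probability)
    for i, word in enumerate(words):
        expanded = [(seq + (t,), lp + math.log(prob(i, seq, t)))
                    for seq, lp in hyps for t in tags_for(word)]
        hyps = sorted(expanded, key=lambda x: x[1], reverse=True)[:beam]
    return list(hyps[0][0])
```

Because each history only looks two tags back, a beam of N = 5 loses little compared to exhaustive search while keeping tagging fast.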


SLIDE 13

Experiments

To conduct the tagging experiments, the Wall Street Journal data was split into three contiguous sections.

Table: WSJ Data Sizes

Dataset      Sentences  Words    Unknown Words
Training     40000      962687   —
Development  8000       192826   6107
Test         5485       133805   —


SLIDE 14

Experiments-Results

Table: Baseline Performance on the Development Set

                   Total Word Accuracy  Unknown Word Accuracy  Sentence Accuracy
Tag Dictionary     96.43%               86.32%                 47.55%
No Tag Dictionary  96.31%               86.28%                 47.38%

Error analysis reveals some "difficult words".

Table: Top Tagging Mistakes on the Training Set for the Baseline Model

Word   Correct Tag  Model Tag  Frequency
about  RB           IN         393
that   DT           IN         389
more   RBR          IN         389
up     IN           RB         187



SLIDE 16

Specialized Features and Consistency


SLIDE 17

Specialized Features

Thankfully, the Maximum Entropy model allows arbitrary binary-valued features on the context.

Specialized Feature Definition

For a given word and tag context available in a history:

$$h_i = \{w_i, w_{i+1}, w_{i+2}, w_{i-1}, w_{i-2}, t_{i-1}, t_{i-2}\}$$

$$f_j(h_i, t_i) = \begin{cases} 1 & \text{if } w_i = \text{``about''}, \; t_{i-2} t_{i-1} = \text{DT NNS}, \text{ and } t_i = \text{IN} \\ 0 & \text{otherwise} \end{cases}$$
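As a sketch, a specialized feature is just another binary predicate on the history, only with a much more specific firing condition (names and dictionary keys are hypothetical):

```python
def f_about_special(h, tag):
    # fires only when w_i == "about", t_{i-2} t_{i-1} == DT NNS, and t_i == IN
    return 1 if (h["w0"] == "about"
                 and (h["t-2"], h["t-1"]) == ("DT", "NNS")
                 and tag == "IN") else 0

print(f_about_special({"w0": "about", "t-2": "DT", "t-1": "NNS"}, "IN"))  # 1
```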


SLIDE 18

Specialized Features

Specialized features are constructed for ”difficult” words:

Table: Performance of the Baseline Model with Specialized Features

Number of "Difficult" Words  Development Set Performance
29                           96.49%

Table: Errors on "Difficult" Words, Baseline vs. Specialized Model (Training Set)

Word   Baseline Model Errors  Specialized Model Errors
that   246                    207
up     186                    169
about  110                    120



SLIDE 20

Specialized Features and Consistency

Consistency test:



SLIDE 22

Specialized Features and Consistency

Performance of the baseline and specialized models when tested on the consistent subset of the development set:

Table: Results on the Consistent Subset

Training Size (words)  Test Size (words)  Baseline  Specialized
571190                 44478              97.04%    97.13%

The marginal improvement of +0.1% may imply that the features remain impoverished, or that intra-annotator inconsistency persists.



SLIDE 24

Comparison with previous work

All previous models achieve an accuracy of about 96.5% overall and about 85% on unseen words. The proposed model, however, combines the merits of each.

Table: Comparison

Model         Probabilistic  Rich Representation  Independence  Data Fragmentation
SDT           +1             +1                   +1            -1
Markov Model  +1             +1                   -1            +1
TBL           -1             +1                   +1            +1
MaxEnt        +1             +1                   +1            +1



SLIDE 26

Conclusion

  • The Maximum Entropy model is an extremely flexible technique for linguistic modeling.
  • The implementation in this paper is a state-of-the-art POS tagger, as evidenced by its 96.6% accuracy on the test set.
  • This performance is close to the current state-of-the-art of roughly 97%.



SLIDE 28

Question

Questions?
