Feature-Based Tagging
The Task, Again
- Recall:
– tagging ~ morphological disambiguation
– tagset VT ⊆ (C1, C2, ..., Cn)
- Ci - morphological categories, such as POS, NUMBER, CASE, PERSON, TENSE, GENDER, ...
– mapping w → {t ∈ VT} exists
- restriction of Morphological Analysis: A+ → 2^(L, C1, C2, ..., Cn),
where A is the language alphabet, L is the set of lemmas
– extension to punctuation, sentence boundaries (treated as words)
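A minimal sketch of this setup (the word forms and analyses are toy data, not from the slides): morphological analysis maps a word form to a set of (lemma, tag) pairs, and tagging works with its restriction to the tag component.

```python
# toy analyzer: word form -> set of (lemma, tag) analyses (hypothetical data)
MA = {
    "books": {("book", "NNS"), ("book", "VBZ")},
    "the":   {("the", "DT")},
    ".":     {(".", ".")},          # punctuation treated as a word
}

def possible_tags(word):
    """Restriction of MA to the tag component: w -> {t in VT}."""
    return {tag for _lemma, tag in MA.get(word, set())}

print(possible_tags("books"))  # {'NNS', 'VBZ'} -- tagging picks one of these
```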
Feature Selection Problems
- Main problem with Maximum Entropy [tagging]:
– Feature Selection (if the number of possible features is in the hundreds of thousands or millions)
– No good general way known:
- best so far: Berger & Della Pietras’ greedy algorithm
- heuristics (cutoff based: ignore low-count features)
- Goal:
– few but “good” features (“good” ~ high predictive power ~ leading to low final cross entropy)
Feature-based Tagging
- Idea:
– save on computing the weights (λi)
- are they really so important?
– concentrate on feature selection
- Criterion (training):
– error rate (~ accuracy; borrows from Brill’s tagger)
- Model form (probabilistic - same as for Maximum Entropy):
p(y|x) = (1/Z(x)) e^(Σi=1..N λi fi(y,x))   ... Exponential (or Loglinear) Model
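A minimal sketch of this exponential model (the features, weights, and tagset below are illustrative, not the lecture's):

```python
import math

def p(y, x, features, lambdas, tagset):
    """p(y|x) = (1/Z(x)) * exp(sum_i lambda_i * f_i(y, x))."""
    def score(y_):
        return math.exp(sum(l * f(y_, x) for l, f in zip(lambdas, features)))
    Z = sum(score(y_) for y_ in tagset)          # normalization over all tags
    return score(y) / Z

# toy example: two binary features over the context x = (previous tag,)
features = [
    lambda y, x: 1 if y == "NN" and x[0] == "DT" else 0,
    lambda y, x: 1 if y == "VB" and x[0] == "DT" else 0,
]
lambdas = [1.5, -0.5]
print(p("NN", ("DT",), features, lambdas, tagset={"NN", "VB", "JJ"}))
```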
Feature Weight (Lambda) Approximation
- Let Y be the sample space from which we predict (tags in our case), and fi(y,x) a binary-valued feature
- Define a “batch of features” and a “context feature”:
B(x) = {fi; all fi's share the same context x}
fB(x)(x') = 1 ⇔df x ⊆ x' (x is part of x')
- in other words, holds wherever a context x is found
- Example:
f1(y,x) = 1 ⇔df y = JJ, left tag = JJ
f2(y,x) = 1 ⇔df y = NN, left tag = JJ
B(left tag = JJ) = {f1, f2} (but not, say, [y = JJ, left tag = DT])
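A minimal sketch of the grouping into batches (the feature representation as (predicted tag, context) tuples is an illustrative choice, not the lecture's):

```python
from collections import defaultdict

# a feature as (predicted tag y, context x); x = tuple of (test, value) pairs
features = [
    ("JJ", (("left tag", "JJ"),)),
    ("NN", (("left tag", "JJ"),)),
    ("JJ", (("left tag", "DT"),)),
]

batches = defaultdict(list)          # context x -> batch B(x)
for y, x in features:
    batches[x].append((y, x))

for x, batch in batches.items():
    print(x, "->", [y for y, _ in batch])
# (('left tag', 'JJ'),) -> ['JJ', 'NN']   i.e. B(left tag = JJ) = {f1, f2}
# (('left tag', 'DT'),) -> ['JJ']
```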
Estimation
- Compute:
p(y|B(x)) = (1/Z(B(x))) Σd=1..|T| δ(yd, y) fB(x)(xd)
- frequency of y relative to all places where any of the B(x) features holds for some y; Z(B(x)) is the natural normalization factor:
Z(B(x)) = Σd=1..|T| fB(x)(xd)
- "compare" to the uniform distribution:
α(y,B(x)) = p(y|B(x)) / (1 / |Y|)
α(y,B(x)) > 1 for p(y|B(x)) better than uniform; and vice versa
- If fi(y,x) holds for exactly one y (in a given context x), then we have a 1:1 relation between α(y,B(x)) and fi(y,x) from B(x), and λi = log(α(y,B(x)))
NB: this works in constant time, independent of λj, j ≠ i
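A minimal sketch of this estimate on toy data (the training pairs and tag set below are illustrative only):

```python
import math
from collections import Counter

# toy training data: (tag y_d, context x_d); the context here is the left tag
data = [("JJ", "JJ"), ("NN", "JJ"), ("NN", "JJ"), ("NN", "DT"), ("VB", "TO")]
Y = {"JJ", "NN", "VB", "DT", "TO"}          # sample space of predicted tags

def estimate(batch_context):
    """p(y|B(x)) and lambda = log(alpha) for every y seen with this context."""
    hits = [y for y, x in data if x == batch_context]   # places where B(x) holds
    Z = len(hits)                                        # Z(B(x))
    for y, c in Counter(hits).items():
        p = c / Z                                        # p(y|B(x))
        alpha = p / (1.0 / len(Y))                       # compare to uniform
        print(y, "p =", p, "lambda =", math.log(alpha))

estimate("JJ")   # batch B(left tag = JJ)
```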
What we got
- Substitute:
p(y|x) = (1/Z(x)) e^(Σi=1..N λi fi(y,x))
       = (1/Z(x)) Πi=1..N α(y,B(x))^fi(y,x)
       = (1/Z(x)) Πi=1..N (|Y| p(y|B(x)))^fi(y,x)
       = (1/Z'(x)) Πi=1..N p(y|B(x))^fi(y,x)
       = (1/Z'(x)) ΠB(x'); x' ⊆ x p(y|B(x'))
... Naive Bayes (independence assumption)
The Reality
- take advantage of the exponential form of the model
(do not reduce it completely to naive Bayes):
– vary α(y,B(x)) up and down a bit (quickly)
- captures dependence among features
– recompute using “true” Maximum Entropy
- the ultimate solution
– combine feature batches into one, with a new α(y,B(x'))
- getting very specific features
Search for Features
- Essentially, a way to get rid of unimportant features:
– start with a pool of features extracted from the full data
– remove infrequent features (small threshold, < 2)
– organize the pool into batches of features
- Selection from the pool P:
– start with empty S (set of selected features)
– try all features from the pool, compute α(y,B(x)), compute the error rate over the training data
– add the best feature batch permanently; stop when no improvement is made [complexity: |P| x |S| x |T|]
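A minimal sketch of this greedy selection (the toy rules, training data, and the simple rule-based predict function are illustrative assumptions, not the lecture's tagger):

```python
def error_rate(selected, train_data, predict):
    """Fraction of training positions tagged incorrectly with `selected`."""
    wrong = sum(1 for y, x in train_data if predict(selected, x) != y)
    return wrong / len(train_data)

def greedy_select(pool, train_data, predict):
    """Repeatedly add the error-minimizing feature batch from the pool."""
    selected = []
    best_err = error_rate(selected, train_data, predict)
    while pool:
        candidate = min(pool, key=lambda b: error_rate(selected + [b], train_data, predict))
        err = error_rate(selected + [candidate], train_data, predict)
        if err >= best_err:            # stop when no improvement is made
            break
        selected.append(candidate)     # add the best batch permanently
        pool.remove(candidate)
        best_err = err
    return selected

# toy demo: a "batch" here is a single rule (left tag -> predicted tag)
train = [("NN", "DT"), ("NN", "DT"), ("JJ", "DT"), ("VB", "TO")]
pool = [("DT", "NN"), ("DT", "JJ"), ("TO", "VB")]

def predict(selected, left_tag):
    for ctx, tag in selected:
        if ctx == left_tag:
            return tag
    return "NN"                        # default tag when no rule fires

print(greedy_select(pool, train, predict))   # [('TO', 'VB')]
```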
Adding Features in Blocks, Avoiding the Search for the Best
- Still slow; solution: add the ten (or 5, or 20) best features at a time, assuming they are independent (i.e., the next best feature would change the error rate the same way as if no intervening feature had been added).
- Still slow [(|P| x |S| x |T|) / 10, or 5, or 20]; solution:
- Add all features improving the error rate by at least a certain threshold; then gradually lower the threshold down to the desired value; complexity [|P| x log|S| x |T|] if threshold(n+1) = threshold(n) / k, k > 1 (e.g. k = 2)
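A minimal sketch of this threshold schedule, reusing the hypothetical error_rate and predict helpers from the previous sketch (passed in as arguments; the threshold values are arbitrary examples):

```python
def add_in_blocks(pool, train_data, predict, error_rate,
                  start_threshold=0.08, final_threshold=0.01, k=2.0):
    """Add every batch whose improvement meets the threshold, then lower it."""
    selected = []
    err = error_rate(selected, train_data, predict)
    threshold = start_threshold
    while threshold >= final_threshold:
        for batch in list(pool):
            new_err = error_rate(selected + [batch], train_data, predict)
            if err - new_err >= threshold:      # improvement meets the threshold
                selected.append(batch)
                pool.remove(batch)
                err = new_err
        threshold /= k                          # threshold(n+1) = threshold(n) / k
    return selected
```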
Types of Features
- Position:
– current
– previous, next
– defined by the closest word with certain major POS
- Content:
– word (w)
– tag (t) - left only
– “Ambiguity Class” (AC) of a subtag (POS, NUMBER, GENDER, CASE, ...)
- Any combination of position and content
- Up to three combinations of (position,content)
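A minimal sketch of enumerating such templates (the position and content names below are hypothetical labels, not the lecture's inventory):

```python
from itertools import combinations

positions = ["current", "previous", "next", "closest-major-POS-word"]
contents  = ["word", "tag(left only)", "AC(POS)", "AC(CASE)"]

# single (position, content) tests
atomic = [(p, c) for p in positions for c in contents]

# feature templates: any combination of 1 to 3 atomic tests
templates = [combo for r in (1, 2, 3) for combo in combinations(atomic, r)]
print(len(atomic), "atomic tests,", len(templates), "templates of size <= 3")
```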
Ambiguity Classes (AC)
- Also called “pseudowords” (MS, for the word sense disambiguation task), here: “pseudotags”
- AC (for tagging) is a set of tags (used as an indivisible
token).
– Typically, these are the tags assigned by a morphology to a given word:
- MA(books) [restricted to tags] = { NNS, VBZ }:
AC = NNS_VBZ
- Advantage: deterministic → looking at the ACs (and words, as before) to the right is allowed
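A minimal sketch of building the AC pseudotag (the toy morphology below is illustrative only):

```python
MA_TAGS = {                      # hypothetical morphology, restricted to tags
    "books": {"NNS", "VBZ"},
    "the":   {"DT"},
    "her":   {"PRP$", "PRP"},
}

def ambiguity_class(word):
    """Deterministic: join the sorted tag set into one indivisible token."""
    return "_".join(sorted(MA_TAGS.get(word, {"UNK"})))

print(ambiguity_class("books"))   # NNS_VBZ
print(ambiguity_class("her"))     # PRP_PRP$
```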
Subtags
- Inflective languages: too many tags → data sparseness
- Make use of separate categories (remember morphology):
– tagset VT ⊆ (C1, C2, ..., Cn)
- Ci - morphological categories, such as POS, NUMBER, CASE,
PERSON, TENSE, GENDER, ...
- Predict (and use for context) the individual categories
- Example feature:
– previous word is a noun, and current CASE subtag is genitive
- Use separate ACs for subtags, too (ACPOS = N_V)
Combining Subtags
- Apply the separate prediction (POS, NUMBER) to
– MA(books) = { (Noun, Pl), (VerbPres, Sg)}
- Now what if the best subtags are
– Noun for POS – Sg for NUMBER
- (Noun, Sg) is not possible for books
- Allow only possible combinations (based on MA)
- Use independence assumption (Tag = (C1, C2, ..., Cn)):
(best) Tag = argmaxTag ∈ MA(w) Πi=1..|Categories| p(Ci | w, x)
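A minimal sketch of this combination (the per-category probabilities are toy numbers): restricting the argmax to MA(w) can pick a tag whose subtags are not individually the best.

```python
MA_books = [("Noun", "Pl"), ("VerbPres", "Sg")]     # possible (POS, NUMBER) pairs

# hypothetical per-category distributions p(C_i | w, x)
p_pos    = {"Noun": 0.7, "VerbPres": 0.3}
p_number = {"Sg": 0.6, "Pl": 0.4}

def best_tag(candidates, p_pos, p_number):
    """argmax over MA(w) of the product of per-category probabilities."""
    return max(candidates, key=lambda t: p_pos[t[0]] * p_number[t[1]])

print(best_tag(MA_books, p_pos, p_number))
# ('Noun', 'Pl'): 0.7 * 0.4 = 0.28 beats ('VerbPres', 'Sg'): 0.3 * 0.6 = 0.18,
# even though Noun and Sg are the best subtags individually.
```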
Smoothing
- Not needed in general (as usual for exponential
models)
– however, some basic smoothing has the advantage of not learning unnecessary features at the beginning
– very coarse: based on ambiguity classes
- assign the most probable tag for each AC, using MLE
- e.g. NNS for AC = NNS_VBZ
– last resort smoothing: unigram tag probability
– can even be parametrized from the outside
– also needed during training
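A minimal sketch of this coarse smoothing (toy counts, illustrative only): MLE of the most probable tag per AC, with the unigram tag distribution as a last resort.

```python
from collections import Counter, defaultdict

# toy training counts of (AC, gold tag)
ac_tag_counts = defaultdict(Counter)
for ac, tag in [("NNS_VBZ", "NNS"), ("NNS_VBZ", "NNS"),
                ("NNS_VBZ", "VBZ"), ("DT", "DT")]:
    ac_tag_counts[ac][tag] += 1

unigram = Counter()                    # unigram tag counts (last resort)
for c in ac_tag_counts.values():
    unigram.update(c)

def smooth_tag(ac):
    if ac in ac_tag_counts:
        return ac_tag_counts[ac].most_common(1)[0][0]   # MLE tag for this AC
    return unigram.most_common(1)[0][0]                 # unigram fallback

print(smooth_tag("NNS_VBZ"))   # NNS
print(smooth_tag("XX_YY"))     # unseen AC: overall most frequent tag
```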
Overtraining
- Does not appear in general