Part-of-Speech Tagging for Twitter:
Annotation, Features, and Experiments
presented by:
Pragati Shah Sally Gao Kennan Grant
Overview
1. Introduction
2. Problem
3. Methodology
4. Results
5. Extensions
1. Introduction
Goals:
○ Enable richer text analysis of Twitter and similar social media datasets
○ Provide a case study in how to rapidly engineer a core NLP system for a new and idiosyncratic dataset
Results:
○ ~90% accuracy on the test corpus
○ An openly accessible annotated corpus and trained POS tagger
2. Problem
Twitter has 328 million monthly active users and is a fruitful source of user-generated content. However, POS tagging for Twitter is challenging: tweets are conversational, full of nonstandard spelling, and contain Twitter-specific tokens (hashtags, at-mentions, URLs, emoticons) that standard taggers were not designed to handle.
3. Methodology
◎ Develop tag set and manually annotate corpus
◎ Create additional features to incorporate into the model
◎ Train a Conditional Random Field (CRF) tagger
◎ 1,827 manually tagged tweets
◎ Cross-validate and compare tagging accuracy against the Stanford tagger
Aim: Develop an intuitive tag set to maximize tagging consistency
Steps:
1. Design a coarse tag set: {standard tags} + {Twitter-specific tags}.
2. Tokenize with a Twitter tokenizer, and tag with the Stanford POS tagger.
3. Correct the automatic predictions of Step 2 with manual annotation.
4. Revise tokenization and tagging guidelines.
5. Correct the annotations from Step 3 under the revised guidelines.
6. Calculate inter-annotator agreement.
7. Make a final sweep to correct errors.
Cohen’s Kappa (κ)
◎ Measures inter-rater reliability
◎ i.e., the agreement between two raters who each classify N items into C mutually exclusive categories
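As a reminder (the standard definition, not specific to this paper), κ corrects raw agreement for the agreement expected by chance:

```latex
\kappa = \frac{p_o - p_e}{1 - p_e}
```

where $p_o$ is the observed proportion of agreement and $p_e$ is the agreement expected by chance. For example, $p_o = 0.92$ with $p_e = 0.10$ gives $\kappa \approx 0.91$.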
In the paper, κ = 0.914
Final Tagging Scheme: 25 tags
◎ Standard POS tags: nouns, pronouns, verbs, adjectives, etc.
◎ Combined POS tags: {nominal, proper noun} × {verbal, possessive}
◎ Twitter/online-specific tags: hashtags (#), at-mentions (@), URLs & email addresses, emoticons, and discourse markers
◎ Miscellaneous category tag (G): multiword abbreviations, partial words, artifacts of tokenization errors, miscellaneous symbols, possessive endings
Tag  Description                                                                Example
S    Nominal + possessive                                                       someone’s
^    Proper noun                                                                usa
M    Proper noun + verbal                                                       Mark’ll
!    Interjection                                                               lol, haha, yea
#    Hashtag*                                                                   #acl
@    At-mention                                                                 @BarackObama
E    Emoticon                                                                   :-)
G    Other: abbreviations, foreign words, possessive endings, symbols, garbage  ily [I love you], ♫
*35% of hashtags were tagged with something other than #
◎ Discriminative undirected probabilistic graphical model
○ Models global dependencies across the whole tag sequence
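For reference, a linear-chain CRF (the standard form for sequence tagging, and the model family used here) defines the probability of a tag sequence y given the token sequence x as:

```latex
p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \prod_{t=1}^{T} \exp\!\Big(\sum_{k} \lambda_k\, f_k(y_{t-1}, y_t, \mathbf{x}, t)\Big)
```

where the $f_k$ are feature functions (such as those listed below), $\lambda_k$ are their learned weights, and $Z(\mathbf{x})$ is the normalizing partition function.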
CRFs enable the incorporation of arbitrary local features. Base features (sketched below):
◎ A feature for each word type
◎ Features checking whether the word contains digits or hyphens
◎ Suffix features
◎ Features looking at capitalization patterns
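A minimal sketch of what token-level base features of this kind could look like (the function and feature names are illustrative, not the authors' implementation):

```python
def base_features(tokens, t):
    """Illustrative base features for the token at position t."""
    word = tokens[t]
    feats = {
        f"word={word.lower()}": 1.0,                         # word-type feature
        "has_digit": float(any(c.isdigit() for c in word)),  # contains a digit
        "has_hyphen": float("-" in word),                    # contains a hyphen
        "init_cap": float(word[:1].isupper()),               # capitalization patterns
        "all_caps": float(word.isupper()),
    }
    for n in (1, 2, 3):                                      # suffix features
        if len(word) >= n:
            feats[f"suffix{n}={word[-n:].lower()}"] = 1.0
    return feats

print(base_features(["Yess", "its", "official", "Nintendo", "announced"], 3))
```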
◎ TwOrth: Twitter orthography.
○ Regex-style rules to detect @-mentions, hashtags, and URLs.
◎ Names: Frequently capitalized tokens.
○ Twitter users are inconsistent in their use of capitalization.
○ Likelihood of capitalization = (capitalized occurrences of a token) / (total occurrences of that token). (See the sketch after this list.)
◎ TagDict: Traditional tag dictionary.
○ Features for POS tags from a traditional tag dictionary (PTB).
◎ DistSim: Distributional similarity.
○ Representation of term similarity via distributional features.
○ Used 1.9 million tokens from 134,000 unlabeled tweets for the 10,000 most common terms.
◎ Metaph: Phonetic normalization.
○ Used the Metaphone algorithm (Philips, 1990) to create a coarse phonetic normalization, e.g. “lmao,” “lmaoo,” and “lmaooo” all map to LM.
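A minimal sketch of how the capitalization-likelihood statistic behind the Names feature could be estimated from an unlabeled corpus (the function name and toy data are illustrative, not the paper's code):

```python
from collections import Counter

def capitalization_likelihood(corpus_tokens):
    """Estimate, per word type, the fraction of occurrences that are capitalized."""
    cap, total = Counter(), Counter()
    for tok in corpus_tokens:
        key = tok.lower()
        total[key] += 1
        if tok[:1].isupper():
            cap[key] += 1
    return {w: cap[w] / total[w] for w in total}

toy = ["Obama", "said", "obama", "will", "visit", "Austin", "austin", "Austin"]
likelihood = capitalization_likelihood(toy)
print(likelihood["obama"])   # 0.5  -> likely a name, inconsistently capitalized
print(likelihood["said"])    # 0.0  -> ordinary word
```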
4. Results
◎ Training set: 1,000 tweets (14,542 tokens)
◎ Development set: 327 tweets (4,770 tokens)
◎ Test set: 500 tweets (7,124 tokens)
◎ Trained the Stanford tagger on the labeled data as a baseline
◎ Tuned the Gaussian prior on the development data
◎ In addition to the tagger with the full feature set, performed feature ablation experiments (removing one feature category at a time; see the sketch below)
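A compact sketch of such an ablation loop (the train_tagger and accuracy functions are hypothetical placeholders standing in for whatever training and evaluation code is available):

```python
# Feature groups named in the paper; train_tagger/accuracy are hypothetical.
FEATURE_GROUPS = ["TwOrth", "Names", "TagDict", "DistSim", "Metaph"]

def ablation_study(train_data, dev_data, train_tagger, accuracy):
    """Retrain with each feature group removed and report the accuracy change."""
    full = accuracy(train_tagger(train_data, features=FEATURE_GROUPS), dev_data)
    results = {"full": full}
    for group in FEATURE_GROUPS:
        kept = [g for g in FEATURE_GROUPS if g != group]
        acc = accuracy(train_tagger(train_data, features=kept), dev_data)
        results[f"-{group}"] = acc
        print(f"without {group}: {acc:.3f} (delta vs. full: {acc - full:+.3f})")
    return results
```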
Relative error reduction of 25% compared to the Stanford tagger
[Results figure: tagging accuracy of the CRF tagger with the full feature set vs. the Stanford tagger, plus feature ablation experiments]
◎ Despite the NAMES feature, the system struggles to identify proper nouns with nonstandard capitalization
◎ The recall of proper nouns is only 71%
◎ The system also struggles with the miscellaneous category, G: accuracy of only 26%
5. Extensions
◎ Cited 739 times according to Google Scholar
◎ Owoputi et al. (2013):
○ Developed improved annotation guidelines
○ Improved the annotations in the Gimpel et al. corpus
○ Raised Twitter tagging accuracy from 90% to 93% (state-of-the-art results) using large-scale unsupervised word clustering and new lexical features
◎ Mohammad et al. (2013):
○ Used the Gimpel et al. POS tagger to build a state-of-the-art Twitter sentiment classifier
◎ Lamb et al. (2013):
○ Used the Gimpel et al. POS tagger to track the spread of flu infections on Twitter