Part-of-Speech Tagging for Historical English. Yi Yang and Jacob Eisenstein. PowerPoint PPT presentation.



SLIDE 1

Part-of-Speech Tagging for Historical English

Yi Yang and Jacob Eisenstein

Georgia Tech

SLIDE 2

[Muralidharan and Hearst, 2011 & 2012]

  • Digital humanities research
  • How does the portrayal of men and women differ in Shakespeare's plays?
  • What are the language use patterns in North American slave narratives?

SLIDE 3

[Muralidharan and Hearst, 2011 & 2012]

  • NLP can help!
  • Digital humanities research
  • How does the portrayal of men and women differ in Shakespeare's plays?
  • What are the language use patterns in North American slave narratives?

SLIDE 4

[Muralidharan and Hearst, 2011 & 2012]

  • NLP can help!
  • Digital humanities research
  • How does the portrayal of men and women differ in Shakespeare's plays?
  • What are the language use patterns in North American slave narratives?
  • Only if NLP works for historical texts …
SLIDE 5

Early Modern English

Hee said nobody had said anything agt mee .

[Henry Oxinden, 1660]

SLIDE 6

Early Modern English

Hee said nobody had said anything agt mee .

  • Spelling variation: Hee → He, agt → against, mee → me

[Henry Oxinden, 1660]

SLIDE 7

Stanford POS Tagger

Hee said nobody had said anything agt mee .

  • Spelling variation

Stanford: (predicted tags shown on slide)

SLIDE 8

Stanford POS Tagger

Hee said nobody had said anything agt mee .

  • Spelling variation

Stanford vs. Gold: tag rows shown on slide, with three mismatches marked X.

SLIDE 9

Transfer Loss for POS Tagging

(Bar chart of error rate: Modern English 3.0%)

[Rayson et al., 2007]

SLIDE 10

Transfer Loss for POS Tagging

(Bar chart of error rate: Modern English 3.0%, Early Modern English 18.0%)

[Rayson et al., 2007]

SLIDE 11

Approaches

  • Spelling normalization: map from historical spellings to contemporary forms.
    Rayson et al. (2007); Scheible et al. (2011); Bollmann (2011)

SLIDE 12

Approaches

  • Domain adaptation (this work): build robust NLP systems with representation learning.
    Yang & Eisenstein (2014); Yang & Eisenstein (2015)
  • Spelling normalization: map from historical spellings to contemporary forms.
    Rayson et al. (2007); Scheible et al. (2011); Bollmann (2011)

SLIDE 13

Spelling Normalization

[VARD; Baron and Rayson, 2008]

Original:   Hee said nobody had said anything agt mee .
Normalized: Hee said nobody had said anything aged me .

SLIDE 14

Spelling Normalization

  • Correct normalization: mee → me

[VARD; Baron and Rayson, 2008]

Original:   Hee said nobody had said anything agt mee .
Normalized: Hee said nobody had said anything aged me .

SLIDE 15

Spelling Normalization

  • Correct normalization: mee → me
  • Incorrect normalization: agt → aged (should be against)

[VARD; Baron and Rayson, 2008]

Original:   Hee said nobody had said anything agt mee .
Normalized: Hee said nobody had said anything aged me .

SLIDE 16

Spelling Normalization

  • Correct normalization: mee → me
  • Incorrect normalization: agt → aged (should be against)
  • False negative: Hee left unchanged (should be He)

[VARD; Baron and Rayson, 2008]

Original:   Hee said nobody had said anything agt mee .
Normalized: Hee said nobody had said anything aged me .

SLIDE 17

Spelling Normalization

[VARD; Baron and Rayson, 2008]

Normalized: Hee said nobody had said anything aged me .

Stanford vs. Gold: tag rows shown on slide, with three mismatches marked X.

SLIDE 18

Spelling Normalization

[VARD; Baron and Rayson, 2008]

Normalized: Hee said nobody had said anything aged me .

Stanford vs. Gold: tag rows shown on slide; mismatches (X) remain even after normalization.

SLIDE 19

Representation Learning

Hee said nobody had said anything agt mee .


SLIDE 22

Representation Learning

Hee said nobody had said anything agt mee .

  OOV: Hee           contexts: said, was, came, told, …
  IV:  He, I, We, …  contexts: said, was, came, told, …
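The slide's intuition is that the out-of-vocabulary spelling "hee" occurs in the same contexts (said, was, came, told) as the in-vocabulary "he", so distributional statistics can link the two. A minimal sketch of that idea follows; the mini-corpus and function names are illustrative assumptions, not the paper's implementation:

```python
from collections import Counter

def context_counts(sentences, window=1):
    """Count the words appearing within `window` tokens of each word."""
    ctx = {}
    for sent in sentences:
        for i, w in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            ctx.setdefault(w, Counter()).update(
                sent[j] for j in range(lo, hi) if j != i)
    return ctx

def cosine(c1, c2):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(c1[k] * c2[k] for k in c1 if k in c2)
    norm = lambda c: sum(v * v for v in c.values()) ** 0.5
    return dot / (norm(c1) * norm(c2)) if c1 and c2 else 0.0

# Toy corpus: the OOV spelling "hee" shares its contexts with "he".
sents = [["hee", "said", "nothing"], ["he", "said", "nothing"],
         ["hee", "came", "home"], ["he", "came", "home"],
         ["we", "told", "him"]]
ctx = context_counts(sents)
print(cosine(ctx["hee"], ctx["he"]) > cosine(ctx["hee"], ctx["we"]))  # True
```

Embedding methods learn dense vectors with the same effect: words with shared contexts end up close together, so "hee" inherits useful information from "he" even though it never appears in modern training data.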

SLIDE 23

Model

SLIDE 24

Feature Embeddings

[FEMA; Yang and Eisenstein, 2015]

Hee said nobody had said anything agt mee .


SLIDE 26

Feature Embeddings

[FEMA; Yang and Eisenstein, 2015]

Hee said nobody had said anything agt mee .

Features (1-4) of the first token:
  1. CurrWord = hee
  2. NextWord = said
  3. Prefix1 = h
  4. Suffix1 = e
  …


SLIDE 30

Feature Embeddings

[FEMA; Yang and Eisenstein, 2015]

Features (1-4): CurrWord = hee, NextWord = said, Prefix1 = h, Suffix1 = e, …

Each feature has an input embedding (u) and an output embedding (v); one feature predicts the token's other features:

  p(f_t | f_2) ∝ exp(u_2ᵀ v_t)

SLIDE 31

Feature Embeddings

[FEMA; Yang and Eisenstein, 2015]

Features (1-4): CurrWord = hee, NextWord = said, Prefix1 = h, Suffix1 = e, …

  p(f_t | f_2) ∝ exp(u_2ᵀ v_t)

  ℓ = Σ_{t=1..T, t≠2} log p(f_t | f_2)
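The skip-gram-style objective above can be sketched in a few lines of NumPy: score every feature against the input embedding of feature 2, softmax over the feature vocabulary, and sum the log-probabilities of the token's other active features. The vocabulary, dimensions, and function names below are toy assumptions for illustration, not the released FEMA code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy feature vocabulary; list positions double as feature ids.
feats = ["CurrWord=hee", "NextWord=said", "Prefix1=h", "Suffix1=e"]
V, d = len(feats), 8
U = rng.normal(scale=0.1, size=(V, d))  # input embeddings u
W = rng.normal(scale=0.1, size=(V, d))  # output embeddings v

def log_prob(t, s):
    """log p(f_t | f_s): softmax over the whole feature vocabulary."""
    scores = W @ U[s]                      # u_s . v_k for every feature k
    return scores[t] - np.log(np.exp(scores).sum())

def token_loglik(active, s=1):
    """l = sum over t != s of log p(f_t | f_s) for one token's features."""
    return sum(log_prob(t, s) for t in active if t != s)

print(token_loglik([0, 1, 2, 3]))  # a negative log-likelihood
```

Training would follow the usual word2vec recipe (stochastic gradients on U and W, with negative sampling replacing the full softmax at scale); the key difference from word2vec is that the "contexts" are the other feature templates of the same token.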

SLIDE 32

Word Embeddings

[word2vec; Mikolov et al., 2013]

Features (1-4): CurrWord = hee, NextWord = said, Prefix1 = h, Suffix1 = e, …
Words (1-4):    hee said nobody had …

  • Word embeddings
  • Feature embeddings
SLIDE 36

Word Embeddings

[word2vec; Mikolov et al., 2013]

Features (1-4): CurrWord = hee, NextWord = said, Prefix1 = h, Suffix1 = e, …
Words (1-4):    hee said nobody had …

  • Word embeddings: generic representations, learned from word co-occurrences
  • Feature embeddings: task-specific representations, learned from feature co-occurrences
SLIDE 37

Learning from Multiple Domains

[FEMA; Yang and Eisenstein, 2015]

  • Previous work on unsupervised domain adaptation involves two domains.

SLIDE 38

Learning from Multiple Domains

[FEMA; Yang and Eisenstein, 2015]

  • Previous work on unsupervised domain adaptation involves two domains.
  • Unsupervised multi-domain adaptation
SLIDE 40

Multiple Feature Embeddings

[FEMA; Yang and Eisenstein, 2015]

Hee said nobody had said anything agt mee .

SLIDE 41

Multiple Feature Embeddings

[FEMA; Yang and Eisenstein, 2015]

Hee said nobody had said anything agt mee .

Domain attributes: Genre, Epoch

SLIDE 42

Multiple Feature Embeddings

[FEMA; Yang and Eisenstein, 2015]

Hee said nobody had said anything agt mee .

Domain attributes: Genre = letters, Epoch = 1600+

SLIDE 43

Multiple Feature Embeddings

[FEMA; Yang and Eisenstein, 2015]

Hee said nobody had said anything agt mee .

Domain attributes: Genre = letters, Epoch = 1600+

Features (1-4): CurrWord = hee, NextWord = said, Prefix1 = h, Suffix1 = e, …

SLIDE 44

Multiple Feature Embeddings

[FEMA; Yang and Eisenstein, 2015]

Hee said nobody had said anything agt mee .

Domain attributes: Genre = letters, Epoch = 1600+

Features (1-4): CurrWord = hee, NextWord = said, Prefix1 = h, Suffix1 = e, …

Each feature embedding decomposes into a shared part plus one part per domain attribute:

  u = u(shared) + u(letters) + u(1600+)



SLIDE 47

Multiple Feature Embeddings

[FEMA; Yang and Eisenstein, 2015]

Hee said nobody had said anything agt mee .

Features (1-4): CurrWord = hee, NextWord = said, Prefix1 = h, Suffix1 = e, …

The input embedding of feature 2 sums a shared component and one component per domain attribute:

  u_2 = h_2^(shared) + h_2^(letters) + h_2^(1600+)

SLIDE 48

Multiple Feature Embeddings

[FEMA; Yang and Eisenstein, 2015]

Hee said nobody had said anything agt mee .

Features (1-4): CurrWord = hee, NextWord = said, Prefix1 = h, Suffix1 = e, …

  u_2 = h_2^(shared) + h_2^(letters) + h_2^(1600+)

  p(f_t | f_2) ∝ exp(u_2ᵀ v_t)
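The decomposition u_2 = h^(shared) + h^(letters) + h^(1600+) can be sketched as a sum of component vectors, one shared and one per active domain attribute. The dimensions and names below are toy assumptions, not the paper's code; the useful property shown is that in a new domain with unseen attributes, the shared component alone still yields an embedding:

```python
import numpy as np

d = 8
rng = np.random.default_rng(1)

# One component vector per part of the decomposition: a shared part
# plus one part per domain attribute (genre "letters", epoch "1600+").
h = {comp: rng.normal(scale=0.1, size=d)
     for comp in ["shared", "letters", "1600+"]}

def embed(active_domains):
    """u = h(shared) + the sum of the active domain components."""
    return h["shared"] + sum(h[dom] for dom in active_domains)

u2 = embed(["letters", "1600+"])   # a 1600s letter: all three components
u2_new = embed([])                 # unseen domain: shared component only
assert u2.shape == u2_new.shape == (d,)
```

This is why multi-attribute decomposition helps: data from every genre and epoch updates the shared component, while each attribute component captures only what is specific to, say, letters or the 1600s.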

SLIDE 49

Experiments

SLIDE 50

Penn Corpora of Historical English

Modern British English (MBE), # of tokens:
  1700-1769: 322,255
  1770-1839: 427,424
  1840-1914: 343,024

Early Modern English (EME), # of tokens:
  1500-1569: 614,315
  1570-1639: 706,587
  1640-1710: 640,255

[Kroch and Taylor, 2000; Kroch et al., 2004]

SLIDE 51

Tagset Mappings

  • Penn Corpora of Historical English (PCHE) tagset: 83 tags
  • Penn Treebank (PTB) tagset: 45 tags

[Moon and Baldridge, 2007]

SLIDE 52

Tagset Mappings

  • Penn Corpora of Historical English (PCHE) tagset: 83 tags
  • Penn Treebank (PTB) tagset: 45 tags

[Moon and Baldridge, 2007]

  PCHE        PTB
  ADJ         JJ
  ADV, ALSO   RB
  VB, VBI     VB
  …           …
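Such a many-to-one projection is just a lookup table. The sketch below uses only the pairs visible on the slide; the full Moon and Baldridge (2007) mapping covers all 83 PCHE tags, and the dict and function names here are illustrative:

```python
# Many-to-one mapping from PCHE tags onto the 45-tag PTB tagset
# (only the pairs shown on the slide; the full table has 83 entries).
PCHE_TO_PTB = {
    "ADJ": "JJ",
    "ADV": "RB",
    "ALSO": "RB",
    "VB": "VB",
    "VBI": "VB",
}

def map_tags(pche_tags):
    """Project a PCHE tag sequence onto the PTB tagset."""
    return [PCHE_TO_PTB[t] for t in pche_tags]

print(map_tags(["ADJ", "ALSO", "VBI"]))  # ['JJ', 'RB', 'VB']
```

Because the mapping is many-to-one, evaluation in the PTB tagset is lossy: distinctions PCHE draws (e.g. infinitive VBI vs. base VB) disappear, which matters for the error analysis later.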

SLIDE 53

Systems

  • Support vector machine (SVM) tagger
    • Sixteen basic feature templates from Ratnaparkhi (1996)
SLIDE 54

Systems

  • Support vector machine (SVM) tagger
    • Sixteen basic feature templates from Ratnaparkhi (1996)
  • Representation learning methods
    • Structural correspondence learning (SCL)
    • Brown clustering
    • word2vec embeddings
    • Multiple feature embeddings (FEMA)

[Blitzer et al., 2006; Brown et al., 1992; Mikolov et al., 2013]
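To make the feature templates concrete, here is a sketch of what extracting a handful of Ratnaparkhi-style templates for one token might look like. Only a few of the sixteen templates are shown, and the template names are hypothetical, not taken from any released tagger:

```python
def token_features(sent, i):
    """A few Ratnaparkhi (1996)-style templates for token i of sent
    (a real tagger uses sixteen such templates, including longer
    prefixes/suffixes and tag-history features)."""
    w = sent[i]
    feats = {
        "CurrWord": w,
        "PrevWord": sent[i - 1] if i > 0 else "<s>",
        "NextWord": sent[i + 1] if i + 1 < len(sent) else "</s>",
        "Prefix1": w[:1],
        "Suffix1": w[-1:],
        "Suffix2": w[-2:],
    }
    return {f"{k}={v}" for k, v in feats.items()}

sent = "Hee said nobody had said anything agt mee .".split()
print(sorted(token_features(sent, 0)))
```

Each template fires as a sparse binary feature for the SVM; the representation learning methods above then replace or augment these sparse features with dense, domain-robust vectors.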

SLIDE 55

Temporal Adaptation

Modern British English (MBE), # of tokens:
  1700-1769: 322,255
  1770-1839: 427,424
  1840-1914: 343,024

Early Modern English (EME), # of tokens:
  1500-1569: 614,315
  1570-1639: 706,587
  1640-1710: 640,255

Within each corpus, one period is used for training (Train) and the other two for testing (Test 1, Test 2).


SLIDE 58

Results: Modern British English

Average error rate (%):
  Baseline 4.6, SCL 4.4, Brown 4.2, word2vec 4.3, FEMA (our method) 3.7 (-0.9)

SLIDE 61

Results: Early Modern English

Average error rate (%):
  Baseline 9.4, SCL 8.3, Brown 8.0, word2vec 8.2, FEMA (our method) 6.6 (-2.8)

SLIDE 62

Adaptation from PTB

# of tokens:
  Penn Treebank: 969,905 (Train)
  Modern British English: 1,092,703 (Test 1)
  Early Modern English: 1,961,157 (Test 2)

SLIDE 63

Adaptation from PTB

Standard evaluation scenario for English POS tagging.

SLIDE 64

Adaptation from PTB

Standard evaluation scenario for English POS tagging.

Data annotation is insufficient for historical texts:
  • Low-resource languages
  • Specific genres, styles, or epochs


SLIDE 67

Results: Modern British English

Error rate (%):
  Baseline 18.9, SCL 18.3, Brown 18.4, word2vec 18.4, FEMA (our method) 17.5 (-1.4)


SLIDE 70

Results: Early Modern English

Error rate (%):
  Baseline 25.9, SCL 24.2, Brown 24.0, word2vec 24.1, FEMA (our method) 22.1 (-3.8)


SLIDE 73

Normalization vs. Representation Learning

Error rate (%):
  Baseline 25.9
  Spelling normalization (VARD) 23.3 (-2.6)
  Representation learning (FEMA) 22.1 (-3.8)
  Representation learning + normalization (FEMA + VARD) 21.0 (-4.9)

SLIDE 77

Error Analysis

  • Annotation inconsistencies and tagset mismatches

  token           annotations in PCHE                    annotations in PTB
  , (comma)       , (comma; 83.4%), . (period; 16.6%)    , (comma)
  . (period)      , (comma; 12.3%), . (period; 87.7%)    . (period)
  to              TO (54.6%), IN (44.3%)                 TO
  all/any/every   JJ (quantifier)                        DT
SLIDE 78

Conclusions

SLIDE 79

Conclusions

  • Feature embeddings outperform word embeddings by exploiting task-specific information in feature templates.

SLIDE 80

Conclusions

  • Representation learning and spelling normalization are complementary for improving tagging performance.
  • Feature embeddings outperform word embeddings by exploiting task-specific information in feature templates.

SLIDE 81

Conclusions

  • Representation learning and spelling normalization are complementary for improving tagging performance.
  • Tagset mismatches make it hard to evaluate modern POS taggers for historical English.
  • Feature embeddings outperform word embeddings by exploiting task-specific information in feature templates.