
ANLP Lecture 8 Part-of-speech tagging

Sharon Goldwater (based on slides by Philipp Koehn) 1 October 2019

Sharon Goldwater ANLP Lecture 8 1 October 2019

Orientation

Lectures 5-6
  – Task: Language modelling
  – Model: Sequence model, all variables directly observed

Lecture 7
  – Task: Text classification
  – Model: Bag-of-words model, includes hidden variables (categories of documents)

Lectures 8-9
  – Task: Part-of-speech tagging
  – Model: Sequence model, includes hidden variables (categories of words in sequence)

Today’s lecture

  • What are parts of speech and POS tagging?
  • What linguistic information should we consider?
  • What are some different tagsets and cross-linguistic issues?
  • What is a Hidden Markov Model?
  • (Next time: what algorithms do we need for HMMs?)


What is part of speech tagging?

  • Given a string:

This is a simple sentence

  • Identify parts of speech (syntactic categories):

This/DET is/VERB a/DET simple/ADJ sentence/NOUN

  • First step towards syntactic analysis
  • Illustrates use of hidden Markov models to label sequences


Other tagging tasks

Other problems can also be framed as tagging (sequence labelling):

  • Case restoration: if we only get lowercased text, we may want to restore proper casing, e.g. the river Thames
  • Named entity recognition: it may also be useful to find names of persons, organizations, etc. in the text, e.g. Barack Obama
  • Information field segmentation: given a specific type of text (classified advert, bibliography entry), identify which words belong to which “fields” (price/size/#bedrooms, author/title/year)
  • Prosodic marking: in speech synthesis, which words/syllables have stress/intonation changes, e.g. He’s going. vs He’s going?


Parts of Speech

  • Open class words (or content words)
    – nouns, verbs, adjectives, adverbs
    – mostly content-bearing: they refer to objects, actions, and features in the world
    – open class, since there is no limit to what these words are; new ones are added all the time (email, website)
  • Closed class words (or function words)
    – pronouns, determiners, prepositions, connectives, ...
    – there is a limited number of these
    – mostly functional: to tie the concepts of a sentence together


How many parts of speech?

  • Both linguistic and practical considerations
  • Corpus annotators decide. Distinguish between

– proper nouns (names) and common nouns?
– singular and plural nouns?
– past and present tense verbs?
– auxiliary and main verbs?
– etc.


English POS tag sets

Usually have 40-100 tags. For example,

  • Brown corpus (87 tags)
    – One of the earliest large corpora collected for computational linguistics (1960s)
    – A balanced corpus: different genres (fiction, news, academic, editorial, etc.)
  • Penn Treebank corpus (45 tags)
    – First large corpus annotated with POS and full syntactic trees (1992)
    – Possibly the most-used corpus in NLP
    – Originally, just text from the Wall Street Journal (WSJ)


J&M Fig 5.6: Penn Treebank POS tags

POS tags in other languages

  • Morphologically rich languages often have compound morphosyntactic tags, e.g. Noun+A3sg+P2sg+Nom (J&M, p.196)
  • Hundreds or thousands of possible combinations
  • Predicting these requires more complex methods than what we will discuss (e.g., may combine an FST with a probabilistic disambiguation system)


Universal POS tags (Petrov et al., 2011)

  • A move in the other direction
  • Simplify the set of tags to the lowest common denominator across languages
  • Map existing annotations onto universal tags, e.g.

    {VB, VBD, VBG, VBN, VBP, VBZ, MD} ⇒ VERB

  • Allows interoperability of systems across languages
  • Promoted by Google and others


Universal POS tags (Petrov et al., 2011)

NOUN (nouns)
VERB (verbs)
ADJ (adjectives)
ADV (adverbs)
PRON (pronouns)
DET (determiners and articles)
ADP (prepositions and postpositions)
NUM (numerals)
CONJ (conjunctions)
PRT (particles)
’.’ (punctuation marks)
X (anything else, such as abbreviations or foreign words)


Why is POS tagging hard?

The usual reasons!

  • Ambiguity (homographs):
    – glass of water/NOUN vs. water/VERB the plants
    – lie/VERB down vs. tell a lie/NOUN
    – wind/VERB down vs. a mighty wind/NOUN

    How about time flies like an arrow?

  • Sparse data:
    – Words we haven’t seen before (at all, or in this context)
    – Word–tag pairs we haven’t seen before


Relevant knowledge for POS tagging

  • The word itself
    – Some words may only be nouns, e.g. arrow
    – Some words are ambiguous, e.g. like, flies
    – Probabilities may help, if one tag is more likely than another
  • Local context
    – Two determiners rarely follow each other
    – Two base form verbs rarely follow each other
    – A determiner is almost always followed by an adjective or noun


A probabilistic model for tagging

Let’s define a new generative process for sentences.

  • To generate a sentence of length n:

    Let t0 = <s>
    For i = 1 to n:
      Choose a tag conditioned on the previous tag: P(ti|ti−1)
      Choose a word conditioned on its tag: P(wi|ti)

  • So, the model assumes:
    – Each tag depends only on the previous tag: a bigram model over tags.
    – Words are conditionally independent given tags.

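The generative story above is easy to sketch in code. This is a minimal illustration only: the tagset, vocabulary, and all probability values below are made up, not taken from any corpus or from the lecture.

```python
import random

# Toy parameters (made-up numbers, for illustration only).
transitions = {                      # P(t_i | t_{i-1})
    "<s>": {"DT": 0.7, "NN": 0.3},
    "DT":  {"NN": 0.8, "JJ": 0.2},
    "JJ":  {"NN": 1.0},
    "NN":  {"VB": 0.6, "</s>": 0.4},
    "VB":  {"DT": 0.5, "</s>": 0.5},
}
emissions = {                        # P(w_i | t_i)
    "DT": {"the": 0.6, "a": 0.4},
    "JJ": {"simple": 1.0},
    "NN": {"cat": 0.5, "sentence": 0.5},
    "VB": {"saw": 1.0},
}

def sample(dist):
    """Draw one outcome from a {outcome: probability} dict."""
    r, total = random.random(), 0.0
    for outcome, p in dist.items():
        total += p
        if r < total:
            return outcome
    return outcome  # guard against floating-point rounding

def generate():
    """Walk the tag chain from <s> until </s>, emitting one word per tag."""
    tag, tags, words = "<s>", [], []
    while True:
        tag = sample(transitions[tag])
        if tag == "</s>":
            return tags, words
        tags.append(tag)
        words.append(sample(emissions[tag]))

tags, words = generate()
print(list(zip(words, tags)))
```

Each run first chooses a tag given the previous tag, then a word given that tag, exactly mirroring the two choices in the loop above.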


Generative process example

  • Arrows indicate probabilistic dependencies:

    [Diagram: tag sequence <s> DT NN VBD DT NNS VBG </s>, each tag emitting one word of “a cat saw the rats jumping”]

Probabilistic finite-state machine

  • One way to view the model: sentences are generated by walking through states in a graph. Each state represents a tag.

    [Diagram: states START, DET, NN, VB, IN, and END, connected by transition arcs]

  • Prob. of moving from state s to s′ (transition probability):

    P(ti = s′ | ti−1 = s)


Probabilistic finite-state machine

  • When passing through a state, emit a word.

    [Diagram: state VB emitting the words “like” and “flies”]

  • Prob. of emitting w from state s (emission probability):

    P(wi = w | ti = s)


What can we do with this model?

  • Simplest thing: if we know the parameters (tag transition and word emission probabilities), we can compute the probability of a tagged sentence.
  • Let S = w1 . . . wn be the sentence and T = t1 . . . tn be the corresponding tag sequence. Then

    p(S, T) = ∏_{i=1}^{n} P(ti|ti−1) P(wi|ti)


Example: computing joint prob. P(S, T)

What’s the probability of this tagged sentence?

  This/DT is/VB a/DT simple/JJ sentence/NN

  • First, add begin- and end-of-sentence markers <s> and </s>. Then:

    p(S, T) = ∏_{i=1}^{n} P(ti|ti−1) P(wi|ti)
            = P(DT|<s>) P(VB|DT) P(DT|VB) P(JJ|DT) P(NN|JJ) P(</s>|NN)
              · P(This|DT) P(is|VB) P(a|DT) P(simple|JJ) P(sentence|NN)

  • But now we need to plug in probabilities... from where?

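The product above can be mirrored directly in code. A minimal sketch: the probability values below are invented purely so the computation runs (the slide deliberately leaves them unspecified until the next section on training).

```python
# Made-up parameters for illustration only.
transitions = {("<s>", "DT"): 0.5, ("DT", "VB"): 0.1, ("VB", "DT"): 0.4,
               ("DT", "JJ"): 0.2, ("JJ", "NN"): 0.7, ("NN", "</s>"): 0.3}
emissions = {("DT", "This"): 0.2, ("VB", "is"): 0.3, ("DT", "a"): 0.4,
             ("JJ", "simple"): 0.05, ("NN", "sentence"): 0.01}

def joint_prob(words, tags):
    """p(S, T) = prod_i P(t_i|t_{i-1}) P(w_i|t_i), with <s>/</s> added."""
    padded = ["<s>"] + tags + ["</s>"]
    p = 1.0
    for prev, tag in zip(padded, padded[1:]):
        p *= transitions.get((prev, tag), 0.0)   # transition factor
    for tag, word in zip(tags, words):
        p *= emissions.get((tag, word), 0.0)     # emission factor
    return p

S = ["This", "is", "a", "simple", "sentence"]
T = ["DT", "VB", "DT", "JJ", "NN"]
print(joint_prob(S, T))
```

Any transition or emission not listed gets probability 0, so unseen word–tag pairs zero out the whole product; this is exactly the sparse-data problem that smoothing addresses.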

Training the model

Given a corpus annotated with tags (e.g., Penn Treebank), we estimate P(wi|ti) and P(ti|ti−1) using familiar methods (MLE/smoothing)

(Fig from J&M draft 3rd edition)

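The MLE estimates are just relative frequencies of tag bigrams and word–tag pairs. A sketch over a tiny invented corpus (a real system would train on something like the Penn Treebank and add smoothing):

```python
from collections import Counter

# Toy tagged corpus, invented for illustration.
corpus = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VB")],
    [("a", "DT"), ("cat", "NN"), ("sleeps", "VB")],
    [("the", "DT"), ("cat", "NN"), ("saw", "VB"), ("a", "DT"), ("dog", "NN")],
]

trans_counts, emit_counts = Counter(), Counter()
prev_counts, tag_counts = Counter(), Counter()

for sent in corpus:
    tags = ["<s>"] + [t for _, t in sent] + ["</s>"]
    for prev, tag in zip(tags, tags[1:]):
        trans_counts[(prev, tag)] += 1
        prev_counts[prev] += 1
    for word, tag in sent:
        emit_counts[(tag, word)] += 1
        tag_counts[tag] += 1

def p_trans(tag, prev):
    """MLE estimate P(t_i | t_{i-1}) = C(t_{i-1}, t_i) / C(t_{i-1})."""
    return trans_counts[(prev, tag)] / prev_counts[prev]

def p_emit(word, tag):
    """MLE estimate P(w_i | t_i) = C(t_i, w_i) / C(t_i)."""
    return emit_counts[(tag, word)] / tag_counts[tag]

print(p_trans("NN", "DT"))   # DT is always followed by NN in this toy corpus
print(p_emit("cat", "NN"))
```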


But... tagging?

Normally, we want to use the model to find the best tag sequence for an untagged sentence.

  • Thus, the name of the model: hidden Markov model
    – Markov: because of the Markov assumption (tag/state only depends on the immediately previous tag/state).
    – hidden: because we only observe the words/emissions; the tags/states are hidden (or latent) variables.
  • FSM view: given a sequence of words, what is the most probable state path that generated them?


Hidden Markov Model (HMM)

HMM is actually a very general model for sequences. Elements of an HMM:

  • a set of states (here: the tags)
  • an output alphabet (here: words)
  • initial state (here: beginning of sentence)
  • state transition probabilities (here: p(ti|ti−1))
  • symbol emission probabilities (here: p(wi|ti))



Formalizing the tagging problem

Normally, we want to use the model to find the best tag sequence T for an untagged sentence S: argmaxT p(T|S)

  • Bayes’ rule gives us:

    p(T|S) = p(S|T) p(T) / p(S)

  • We can drop p(S) if we are only interested in argmaxT:

    argmaxT p(T|S) = argmaxT p(S|T) p(T)


Decomposing the model

Now we need to compute P(S|T) and P(T) (actually, their product P(S|T)P(T) = P(S, T)).

  • We already defined how!
  • P(T) is the product of the tag transition probabilities:

    P(T) = ∏_i P(ti|ti−1)

  • P(S|T) is the product of the emission probabilities:

    P(S|T) = ∏_i P(wi|ti)


Search for the best tag sequence

  • We have defined a model, but how do we use it?
    – given: word sequence S
    – wanted: best tag sequence T∗
  • For any specific tag sequence T, it is easy to compute

    P(S, T) = P(S|T) P(T) = ∏_i P(wi|ti) P(ti|ti−1)

  • So, can’t we just enumerate all possible T, compute their probabilities, and choose the best one?



Enumeration won’t work

  • Suppose we have c possible tags for each of the n words in the sentence.
  • How many possible tag sequences?
  • There are c^n possible tag sequences: the number grows exponentially in the length n.
  • For all but small n, there are too many sequences to enumerate efficiently.

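A quick sanity check of the c^n growth, using brute-force enumeration for a tiny toy tagset (feasible only because c and n are tiny here):

```python
from itertools import product

def all_sequences(tags, n):
    """Brute-force: every possible tag sequence of length n."""
    return list(product(tags, repeat=n))

tags = ["DT", "NN", "VB"]        # c = 3 toy tags
seqs = all_sequences(tags, 4)    # n = 4 words
print(len(seqs))                 # 3**4 = 81

# With a realistic tagset this explodes: 45 Penn Treebank tags and a
# 20-word sentence give 45**20 ≈ 1.16e33 candidate sequences.
print(45 ** 20)
```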

Finding the best path

  • The Viterbi algorithm finds this path without explicitly enumerating all paths.
  • Our second example of a dynamic programming (or memoization) algorithm.
  • Like min. edit distance, the algorithm stores partial results in a chart to avoid recomputing them.



Tagging example

Words:           <s>   one   dog   bit   </s>
Possible tags:   <s>   CD    NN    NN    </s>
(ordered by            NN    VB    VBD
frequency for          PRP
each word)

  • Choosing the best tag for each word independently gives the wrong answer (<s> CD NN NN </s>).
  • P(VBD|bit) < P(NN|bit), but VBD may yield a better sequence (<s> CD NN VBD </s>)
    – because P(VBD|NN) and P(</s>|VBD) are high.


Viterbi: intuition

(Words and possible tags as in the tagging example above.)

  • Suppose we have already computed:
    a) The best tag sequence for <s> … bit that ends in NN.
    b) The best tag sequence for <s> … bit that ends in VBD.
  • Then, the best full sequence would be either
    – sequence (a) extended to include </s>, or
    – sequence (b) extended to include </s>.

Viterbi: intuition

(Words and possible tags as above.)

  • But similarly, to get
    a) The best tag sequence for <s> … bit that ends in NN,
  • we could extend one of:
    – The best tag sequence for <s> … dog that ends in NN.
    – The best tag sequence for <s> … dog that ends in VB.

  • And so on…


Viterbi: high-level picture

  • Intuition: the best path of length t ending in state q must include the best path of length t−1 to the previous state. (t is now a time step, not a tag.) So,
    – Find the best path of length t−1 to each state.
    – Consider extending each of those by 1 step, to state q.
    – Take the best of those options as the best path to state q.
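The three steps above can be sketched as follows. The lecture defers the details to next time, so this is only one standard formulation; the transition and emission numbers are invented for the "one dog bit" example.

```python
def viterbi(words, tags, p_trans, p_emit):
    """Return the most probable tag sequence for `words`.

    p_trans(t, prev) = P(t | prev); p_emit(w, t) = P(w | t).
    """
    # best[t] = (prob of best path ending in state t, that path)
    best = {"<s>": (1.0, [])}
    for w in words:
        new_best = {}
        for t in tags:
            # Best way to reach t: extend the best path to some previous state.
            prob, path = max(
                (p * p_trans(t, prev), path) for prev, (p, path) in best.items()
            )
            new_best[t] = (prob * p_emit(w, t), path + [t])
        best = new_best
    # Finally, extend to the end-of-sentence state.
    prob, path = max(
        (p * p_trans("</s>", prev), path) for prev, (p, path) in best.items()
    )
    return path

# Invented parameters for the "one dog bit" example (illustration only).
trans = {("<s>", "CD"): 0.4, ("<s>", "NN"): 0.3, ("<s>", "PRP"): 0.3,
         ("CD", "NN"): 0.9, ("CD", "VB"): 0.1,
         ("NN", "NN"): 0.3, ("NN", "VB"): 0.2, ("NN", "VBD"): 0.4,
         ("NN", "</s>"): 0.1, ("VB", "</s>"): 0.5, ("VBD", "</s>"): 0.9,
         ("PRP", "NN"): 0.5, ("PRP", "VB"): 0.5, ("VB", "NN"): 0.5}
emit = {("CD", "one"): 0.5, ("NN", "one"): 0.01, ("PRP", "one"): 0.1,
        ("NN", "dog"): 0.1, ("VB", "dog"): 0.01,
        ("NN", "bit"): 0.05, ("VBD", "bit"): 0.1}

tagset = ["CD", "NN", "PRP", "VB", "VBD"]
result = viterbi(["one", "dog", "bit"], tagset,
                 lambda t, p: trans.get((p, t), 0.0),
                 lambda w, t: emit.get((t, w), 0.0))
print(result)
```

With these numbers the high P(VBD|NN) and P(</s>|VBD) transitions pull the answer to CD NN VBD, illustrating how the sequence model can override the per-word best tag.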


Summary

  • Parts of speech (syntactic categories) provide the beginning of syntactic analysis, categorizing words by their behaviour.
  • Hidden Markov models are a probabilistic model for POS tagging (and other sequence labelling tasks).
  • An HMM defines the joint probability of (tags, words).
  • To find the best tag sequence, use the Viterbi algorithm (details next time).


Questions and exercises

  1. Do JM3 Exercise 8.1.
  2. POS taggers are normally evaluated using accuracy: the percentage of the tags assigned by the tagger that agree with the gold standard. If we are using an HMM for Named Entity Recognition, does this measure still make sense, or is there some other evaluation measure that could make more sense? Why?
  3. I motivated the HMM by saying that local context should affect the model’s POS prediction. Which term in the model is responsible for incorporating context information? Is the POS prediction affected by words on both sides of the current word, or only on one side? Explain your answer.


References

Petrov, S., Das, D., and McDonald, R. (2011). A universal part-of-speech tagset. arXiv preprint arXiv:1104.2086.
