
CSEP 517 Natural Language Processing, Autumn 2015
Parsing (Trees)
Yejin Choi, University of Washington
[Slides from Dan Klein, Michael Collins, Luke Zettlemoyer and Ray Mooney]

Topics
§ Parse Trees
§ (Probabilistic) Context Free Grammars
  § Supervised learning
  § Parsing: most likely tree, marginal distributions
§ Treebank Parsing (English, edited text)

Parse Trees
Example sentence: The move followed a round of similar increases by other lenders, reflecting a continuing decline in that market

Penn Treebank Non-terminals
Table 1.2. The Penn Treebank syntactic tagset
ADJP     Adjective phrase
ADVP     Adverb phrase
NP       Noun phrase
PP       Prepositional phrase
S        Simple declarative clause
SBAR     Subordinate clause
SBARQ    Direct question introduced by wh-element
SINV     Declarative sentence with subject-aux inversion
SQ       Yes/no questions and subconstituent of SBARQ excluding wh-element
VP       Verb phrase
WHADVP   Wh-adverb phrase
WHNP     Wh-noun phrase
WHPP     Wh-prepositional phrase
X        Constituent of unknown or uncertain category
*        “Understood” subject of infinitive or imperative
0        Zero variant of that in subordinate clauses
T        Trace of wh-constituent

The Penn Treebank: Size
§ Penn WSJ Treebank = 50,000 sentences with associated trees
§ Usual set-up: 40,000 training sentences, 2400 test sentences
An example tree [figure: full parse tree rooted at TOP, not reproduced here] for the sentence:
Canadian Utilities had 1988 revenue of C$ 1.16 billion , mainly from its natural gas and electric utility businesses in Alberta , where the company serves about 800,000 customers .

Phrase Structure Parsing
§ Phrase structure parsing organizes syntax into constituents or brackets
§ In general, this involves nested trees
§ Linguists can, and do, argue about details
§ Lots of ambiguity
§ Not the only kind of syntax…
[Figure: phrase structure tree with nodes S, VP, NP, PP, N’ over the sentence: new art critics write reviews with computers]

Constituency Tests
§ How do we know what nodes go in the tree?
§ Classic constituency tests:
  § Substitution by proform
    § he, she, it, they, ...
  § Question / answer
  § Deletion
  § Movement / dislocation
  § Conjunction / coordination
§ Cross-linguistic arguments, too

Conflicting Tests
§ Constituency isn’t always clear
  § Units of transfer:
    § think about ~ penser à
    § talk about ~ hablar de
  § Phonological reduction:
    § I will go → I’ll go
    § I want to go → I wanna go
    § à le centre → au centre
  § Coordination
    § He went to and came from the store.
[Figure: tree for "La vélocité des ondes sismiques"]

Non-Local Phenomena
§ Dislocation / gapping
  § Which book should Peter buy?
  § A debate arose which continued until the election.
§ Binding
  § Reference
    § The IRS audits itself
  § Control
    § I want to go
    § I want you to go

Classical NLP: Parsing
§ Write symbolic or logical rules:

  Grammar (CFG)                      Lexicon
  ROOT → S       NP → NP PP          NN → interest
  S → NP VP      VP → VBP NP         NNS → raises
  NP → DT NN     VP → VBP NP PP      VBP → interest
  NP → NN NNS    PP → IN NP          VBZ → raises
  …

§ Use deduction systems to prove parses from words
  § Minimal grammar on the “Fed raises” sentence: 36 parses
  § Simple 10-rule grammar: 592 parses
  § Real-size grammar: many millions of parses
§ This scaled very badly and didn’t yield broad-coverage tools

Attachment Ambiguity
§ I cleaned the dishes from dinner
§ I cleaned the dishes with detergent
§ I cleaned the dishes in my pajamas
§ I cleaned the dishes in the sink

I shot an elephant in my pajamas Examples from J&M

Syntactic Ambiguities I
§ Prepositional phrases:
  They cooked the beans in the pot on the stove with handles.
§ Particle vs. preposition:
  The puppy tore up the staircase.
§ Complement structures:
  The tourists objected to the guide that they couldn’t hear.
  She knows you like the back of her hand.
§ Gerund vs. participial adjective:
  Visiting relatives can be boring.
  Changing schedules frequently confused passengers.

Syntactic Ambiguities II
§ Modifier scope within NPs:
  impractical design requirements
  plastic cup holder
§ Multiple gap constructions:
  The chicken is ready to eat.
  The contractors are rich enough to sue.
§ Coordination scope:
  Small rats and mice can squeeze into holes or cracks in the wall.

Dark Ambiguities
§ Dark ambiguities: most analyses are shockingly bad (meaning, they don’t have an interpretation you can get your mind around)
  [Figure: an implausible parse tree. This analysis corresponds to the correct parse of “This will panic buyers!”]
§ Unknown words and new usages
§ Solution: we need mechanisms to focus attention on the best analyses; probabilistic techniques do this

Context-Free Grammars
§ A context-free grammar is a tuple <N, Σ, S, R>
  § N: the set of non-terminals
    § Phrasal categories: S, NP, VP, ADJP, etc.
    § Parts-of-speech (pre-terminals): NN, JJ, DT, VB, etc.
  § Σ: the set of terminals (the words)
  § S: the start symbol
    § Often written as ROOT or TOP
    § Not usually the sentence non-terminal S
  § R: the set of rules
    § Of the form X → Y_1 Y_2 … Y_n, with X ∈ N, n ≥ 0, Y_i ∈ (N ∪ Σ)
    § Examples: S → NP VP, VP → VP CC VP
    § Also called rewrites, productions, or local trees

Example Grammar
N = { S, NP, VP, PP, DT, Vi, Vt, NN, IN }
S = S
Σ = { sleeps, saw, man, woman, telescope, the, with, in }
R =
  S → NP VP        Vi → sleeps
  VP → Vi          Vt → saw
  VP → Vt NP       NN → man
  VP → VP PP       NN → woman
  NP → DT NN       NN → telescope
  NP → NP PP       DT → the
  PP → IN NP       IN → with
                   IN → in
S=sentence, VP=verb phrase, NP=noun phrase, PP=prepositional phrase, DT=determiner, Vi=intransitive verb, Vt=transitive verb, NN=noun, IN=preposition
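The tuple <N, Σ, S, R> maps directly onto plain data structures. The following is a minimal sketch, not part of the original slides: the names `N`, `SIGMA`, `START`, `R`, `is_well_formed`, `local_trees`, `licensed` and `leaves` are illustrative assumptions, and trees are encoded as nested tuples `(label, child, ...)` with words as plain strings.

```python
# Hypothetical encoding of the example grammar; all names are illustrative.
N = {"S", "NP", "VP", "PP", "DT", "Vi", "Vt", "NN", "IN"}
SIGMA = {"sleeps", "saw", "man", "woman", "telescope", "the", "with", "in"}
START = "S"
R = {
    ("S", ("NP", "VP")),
    ("VP", ("Vi",)), ("VP", ("Vt", "NP")), ("VP", ("VP", "PP")),
    ("NP", ("DT", "NN")), ("NP", ("NP", "PP")),
    ("PP", ("IN", "NP")),
    ("Vi", ("sleeps",)), ("Vt", ("saw",)),
    ("NN", ("man",)), ("NN", ("woman",)), ("NN", ("telescope",)),
    ("DT", ("the",)), ("IN", ("with",)), ("IN", ("in",)),
}


def is_well_formed(n, sigma, start, rules):
    """Check X in N and every Y_i in N ∪ Σ for each rule X → Y_1 … Y_n."""
    if start not in n:
        return False
    symbols = n | sigma
    return all(lhs in n and all(y in symbols for y in rhs) for lhs, rhs in rules)


def local_trees(tree):
    """Yield one (parent, child labels) pair per internal node of the tree."""
    label, *children = tree
    yield (label, tuple(c if isinstance(c, str) else c[0] for c in children))
    for child in children:
        if not isinstance(child, str):
            yield from local_trees(child)


def licensed(tree, rules):
    """A tree is licensed by the grammar if every local tree is a rule in R."""
    return all(lt in rules for lt in local_trees(tree))


def leaves(tree):
    """Return the yield of the tree: its words, read left to right."""
    _, *children = tree
    words = []
    for child in children:
        words.extend([child] if isinstance(child, str) else leaves(child))
    return words


T1 = ("S", ("NP", ("DT", "the"), ("NN", "man")), ("VP", ("Vi", "sleeps")))
print(is_well_formed(N, SIGMA, START, R), licensed(T1, R), leaves(T1))
```

Storing each rule as a `(lhs, rhs)` pair makes the "local tree" view of a rule literal: checking a tree against the grammar is a set-membership test per internal node.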

Example Parses
Using the rule set R of the example grammar:

The man sleeps
  (S (NP (DT The) (NN man)) (VP (Vi sleeps)))

The man saw the woman with the telescope
  (S (NP (DT The) (NN man))
     (VP (VP (Vt saw) (NP (DT the) (NN woman)))
         (PP (IN with) (NP (DT the) (NN telescope)))))

S=sentence, VP=verb phrase, NP=noun phrase, PP=prepositional phrase, DT=determiner, Vi=intransitive verb, Vt=transitive verb, NN=noun, IN=preposition

Probabilistic Context-Free Grammars
§ A context-free grammar is a tuple <N, Σ, S, R>
  § N: the set of non-terminals
    § Phrasal categories: S, NP, VP, ADJP, etc.
    § Parts-of-speech (pre-terminals): NN, JJ, DT, VB, etc.
  § Σ: the set of terminals (the words)
  § S: the start symbol
    § Often written as ROOT or TOP
    § Not usually the sentence non-terminal S
  § R: the set of rules
    § Of the form X → Y_1 Y_2 … Y_n, with X ∈ N, n ≥ 0, Y_i ∈ (N ∪ Σ)
    § Examples: S → NP VP, VP → VP CC VP
§ A PCFG adds a distribution q:
  § Probability q(r) for each r ∈ R, such that for any X ∈ N:
$$ \sum_{\alpha \to \beta \in R \,:\, \alpha = X} q(\alpha \to \beta) = 1 $$

PCFG Example
  S → NP VP     1.0      Vi → sleeps      1.0
  VP → Vi       0.4      Vt → saw         1.0
  VP → Vt NP    0.4      NN → man         0.7
  VP → VP PP    0.2      NN → woman       0.2
  NP → DT NN    0.3      NN → telescope   0.1
  NP → NP PP    0.7      DT → the         1.0
  PP → IN NP    1.0      IN → with        0.5
                         IN → in          0.5
• Probability of a tree t with rules α_1 → β_1, α_2 → β_2, ..., α_n → β_n is
$$ p(t) = \prod_{i=1}^{n} q(\alpha_i \to \beta_i) $$
where q(α → β) is the probability for rule α → β.
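The constraint that the rule probabilities for each left-hand side sum to one can be checked mechanically. A minimal sketch under stated assumptions: the dictionary `Q` and the helper `is_normalized` are hypothetical names introduced here, and the preposition pre-terminal is written `IN` to match the example grammar's non-terminal set.

```python
import math
from collections import defaultdict

# Rule probabilities q from the PCFG example; keys are (lhs, rhs) pairs.
Q = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("Vi",)): 0.4, ("VP", ("Vt", "NP")): 0.4, ("VP", ("VP", "PP")): 0.2,
    ("NP", ("DT", "NN")): 0.3, ("NP", ("NP", "PP")): 0.7,
    ("PP", ("IN", "NP")): 1.0,
    ("Vi", ("sleeps",)): 1.0, ("Vt", ("saw",)): 1.0,
    ("NN", ("man",)): 0.7, ("NN", ("woman",)): 0.2, ("NN", ("telescope",)): 0.1,
    ("DT", ("the",)): 1.0,
    ("IN", ("with",)): 0.5, ("IN", ("in",)): 0.5,
}


def is_normalized(q, tol=1e-9):
    """True if, for every left-hand side X, the q values of its rules sum to 1."""
    totals = defaultdict(float)
    for (lhs, _rhs), prob in q.items():
        totals[lhs] += prob
    # Compare with a tolerance: sums such as 0.7 + 0.2 + 0.1 are inexact in floats.
    return all(math.isclose(total, 1.0, abs_tol=tol) for total in totals.values())


print(is_normalized(Q))
```

The tolerance is a design choice for floating-point sums; exact arithmetic (for example `fractions.Fraction`) would avoid it at the cost of less readable literals.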

PCFG Example
Rule probabilities as in the PCFG example grammar. Each tree's probability is the product of the probabilities of its rules, listed here top-down, left to right.

t_1 = (S (NP (DT The) (NN man)) (VP (Vi sleeps)))
  Rules: S → NP VP, NP → DT NN, DT → the, NN → man, VP → Vi, Vi → sleeps
  p(t_1) = 1.0 * 0.3 * 1.0 * 0.7 * 0.4 * 1.0

t_2 = (S (NP (DT The) (NN man))
         (VP (VP (Vt saw) (NP (DT the) (NN woman)))
             (PP (IN with) (NP (DT the) (NN telescope)))))
  Rules: S → NP VP, NP → DT NN, DT → the, NN → man, VP → VP PP, VP → Vt NP, Vt → saw, NP → DT NN, DT → the, NN → woman, PP → IN NP, IN → with, NP → DT NN, DT → the, NN → telescope
  p(t_2) = 1.0 * 0.3 * 1.0 * 0.7 * 0.2 * 0.4 * 1.0 * 0.3 * 1.0 * 0.2 * 1.0 * 0.5 * 0.3 * 1.0 * 0.1
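The tree probability is a product over local trees, so it can be computed with one recursive pass. This is a sketch rather than anything from the original slides: `Q`, `tree_prob`, `T1` and `T2` are illustrative names, trees are nested tuples with words as strings, words are lowercased to match the lexicon, and q(PP → IN NP) is taken as 1.0 as listed in the rule table.

```python
import math

# Rule probabilities q from the PCFG example (PP → IN NP taken as 1.0).
Q = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("Vi",)): 0.4, ("VP", ("Vt", "NP")): 0.4, ("VP", ("VP", "PP")): 0.2,
    ("NP", ("DT", "NN")): 0.3, ("NP", ("NP", "PP")): 0.7,
    ("PP", ("IN", "NP")): 1.0,
    ("Vi", ("sleeps",)): 1.0, ("Vt", ("saw",)): 1.0,
    ("NN", ("man",)): 0.7, ("NN", ("woman",)): 0.2, ("NN", ("telescope",)): 0.1,
    ("DT", ("the",)): 1.0,
    ("IN", ("with",)): 0.5, ("IN", ("in",)): 0.5,
}


def tree_prob(tree, q):
    """p(t): multiply q(rule) for the root's local tree and for every subtree."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    prob = q[(label, rhs)]  # raises KeyError if the grammar lacks this rule
    for child in children:
        if not isinstance(child, str):
            prob *= tree_prob(child, q)
    return prob


T1 = ("S", ("NP", ("DT", "the"), ("NN", "man")), ("VP", ("Vi", "sleeps")))
T2 = ("S",
      ("NP", ("DT", "the"), ("NN", "man")),
      ("VP",
       ("VP", ("Vt", "saw"), ("NP", ("DT", "the"), ("NN", "woman"))),
       ("PP", ("IN", "with"), ("NP", ("DT", "the"), ("NN", "telescope")))))

print(tree_prob(T1, Q), tree_prob(T2, Q))
```

Under these probabilities t_1 evaluates to 0.084 (up to floating-point rounding). For longer sentences the product underflows quickly, so practical implementations usually sum log probabilities instead.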

PCFGs: Learning and Inference
§ Model
  § The probability of a tree t with n rules α_i → β_i, i = 1..n:
$$ p(t) = \prod_{i=1}^{n} q(\alpha_i \to \beta_i) $$
§ Learning
  § Read the rules off of labeled sentences, use ML estimates for probabilities:
$$ q_{ML}(\alpha \to \beta) = \frac{\mathrm{Count}(\alpha \to \beta)}{\mathrm{Count}(\alpha)} $$
  § and use all of our standard smoothing tricks!
§ Inference
  § For input sentence s, define T(s) to be the set of trees whose yield is s (whose leaves, read left to right, match the words in s):
$$ t^{*}(s) = \arg\max_{t \in T(s)} p(t) $$
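The maximum-likelihood estimate amounts to two counters, one for rules and one for left-hand sides. The sketch below is illustrative only: `estimate_q`, `local_trees` and the two-tree `TREEBANK` are hypothetical, the treebank is a toy built from the two example parses, and no smoothing is applied.

```python
from collections import Counter


def local_trees(tree):
    """Yield one (parent, child labels) pair per internal node of the tree."""
    label, *children = tree
    yield (label, tuple(c if isinstance(c, str) else c[0] for c in children))
    for child in children:
        if not isinstance(child, str):
            yield from local_trees(child)


def estimate_q(treebank):
    """Unsmoothed ML estimate: q(α → β) = Count(α → β) / Count(α)."""
    rule_counts = Counter()
    lhs_counts = Counter()
    for tree in treebank:
        for lhs, rhs in local_trees(tree):
            rule_counts[(lhs, rhs)] += 1
            lhs_counts[lhs] += 1
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


# Toy treebank (an assumption for illustration): the two example parses.
TREEBANK = [
    ("S", ("NP", ("DT", "the"), ("NN", "man")), ("VP", ("Vi", "sleeps"))),
    ("S",
     ("NP", ("DT", "the"), ("NN", "man")),
     ("VP",
      ("VP", ("Vt", "saw"), ("NP", ("DT", "the"), ("NN", "woman"))),
      ("PP", ("IN", "with"), ("NP", ("DT", "the"), ("NN", "telescope"))))),
]

q_ml = estimate_q(TREEBANK)
print(q_ml[("NN", ("man",))], q_ml[("VP", ("Vi",))])
```

Rules never observed in the treebank receive no entry at all here, which is the gap that smoothing is meant to fill.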

Chomsky Normal Form
§ Chomsky normal form:
  § All rules of the form X → Y Z or X → w
  § In principle, this is no limitation on the space of (P)CFGs
    § N-ary rules introduce new non-terminals
      [Figure: VP → VBD NP PP PP binarized using the intermediate symbols [VP → VBD NP •] and [VP → VBD NP PP •]]
    § Unaries / empties are “promoted”
  § In practice it’s kind of a pain:
    § Reconstructing n-aries is easy
    § Reconstructing unaries is trickier
    § The straightforward transformations don’t preserve tree scores
  § Makes parsing algorithms simpler!
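The n-ary case can be sketched as a left-branching binarization whose intermediate symbols are named after the dotted rules on the slide. This is one possible scheme, assumed here for illustration: the function name `binarize` is hypothetical, and unary and empty rules are returned unchanged, so the output is not yet full Chomsky normal form.

```python
def binarize(lhs, rhs):
    """Split X → Y_1 … Y_n (n > 2) into binary rules.

    Intermediate symbols look like "[X → Y_1 Y_2 •]" and record the children
    consumed so far. Rules with at most two children are returned unchanged.
    """
    rhs = tuple(rhs)
    if len(rhs) <= 2:
        return [(lhs, rhs)]
    rules = []
    # The first intermediate symbol covers the first two children.
    prev = f"[{lhs} → {' '.join(rhs[:2])} •]"
    rules.append((prev, rhs[:2]))
    # Each further intermediate symbol absorbs one more child on the right.
    for i in range(2, len(rhs) - 1):
        new = f"[{lhs} → {' '.join(rhs[:i + 1])} •]"
        rules.append((new, (prev, rhs[i])))
        prev = new
    # The original left-hand side combines the last intermediate and last child.
    rules.append((lhs, (prev, rhs[-1])))
    return rules


for rule_lhs, rule_rhs in binarize("VP", ["VBD", "NP", "PP", "PP"]):
    print(rule_lhs, "→", " ".join(rule_rhs))
```

Because each intermediate name encodes the original rule, the n-ary rule can be rebuilt by collapsing every bracketed node into its parent.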
