CSE 447/547 Natural Language Processing Winter 2018
Yejin Choi - University of Washington
[Slides from Dan Klein, Michael Collins, Luke Zettlemoyer and Ray Mooney]
Parsing (Trees)

Ambiguities
§ I shot [an elephant] [in my pajamas]
Examples from J&M
§ Prepositional phrases: They cooked the beans in the pot on the stove with handles.
§ Particle vs. preposition: The puppy tore up the staircase.
§ Complement structures: The tourists objected to the guide that they couldn’t hear. / She knows you like the back of her hand.
§ Gerund vs. participial adjective: Visiting relatives can be boring. / Changing schedules frequently confused passengers.
§ Modifier scope within NPs: impractical design requirements / plastic cup holder
§ Multiple gap constructions: The chicken is ready to eat. / The contractors are rich enough to sue.
§ Coordination scope: Small rats and mice can squeeze into holes or cracks in the wall.
§ Dark ambiguities: most analyses are shockingly bad (meaning, they don’t have an interpretation you can get your mind around)
  § This analysis corresponds to the correct parse of “This will panic buyers!”
§ Unknown words and new usages
§ Solution: we need mechanisms to focus attention on the best ones; probabilistic techniques do this
§ A context-free grammar is a tuple <N, Σ, S, R>
§ N: the set of non-terminals
  § Phrasal categories: S, NP, VP, ADJP, etc.
  § Parts-of-speech (pre-terminals): NN, JJ, DT, VB, etc.
§ Σ: the set of terminals (the words)
§ S: the start symbol
  § Often written as ROOT or TOP
  § Not usually the sentence non-terminal S
§ R: the set of rules
  § Of the form X → Y1 Y2 … Yn, with X ∈ N, n ≥ 0, Yi ∈ (N ∪ Σ)
  § Examples: S → NP VP, VP → VP CC VP
§ A PCFG adds a distribution q:
§ Probability q(r) for each r ∈ R, such that for all X ∈ N:

    Σ_{α→β ∈ R : α = X} q(α → β) = 1
S → NP VP 1.0
VP → Vi 0.4
VP → Vt NP 0.4
VP → VP PP 0.2
NP → DT NN 0.3
NP → NP PP 0.7
PP → P NP 1.0
Vi → sleeps 1.0
Vt → saw 1.0
NN → man 0.7
NN → woman 0.2
NN → telescope 0.1
DT → the 1.0
IN → with 0.5
IN → in 0.5
The probability of a tree t with rules α1 → β1, α2 → β2, …, αn → βn is

    p(t) = Π_{i=1}^{n} q(αi → βi)

where q(α → β) is the probability for rule α → β.
[Two example trees: t1 is the parse of “The man sleeps” (S → NP VP, NP → DT NN, VP → Vi); t2 is a parse of “The man saw the woman with the telescope” with the PP attached to the VP.]

t1: p(t1) = 1.0 × 0.3 × 1.0 × 0.7 × 0.4 × 1.0

t2: p(t2) = 1.0 × 0.3 × 1.0 × 0.7 × 0.2 × 0.4 × 1.0 × 0.3 × 1.0 × 0.2 × 1.0 × 0.5 × 0.3 × 1.0 × 0.1
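As a sanity check, the rule-product computation can be sketched in Python; the nested-tuple tree encoding and the q dictionary keyed by (lhs, rhs) are illustrative assumptions, not anything from the slides:

```python
# Score a tree under the toy PCFG above by multiplying rule probabilities.
# Trees are (label, child1, child2, ...); leaves are plain strings.
q = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("Vi",)): 0.4,
    ("NP", ("DT", "NN")): 0.3,
    ("DT", ("the",)): 1.0,
    ("NN", ("man",)): 0.7,
    ("Vi", ("sleeps",)): 1.0,
}

def tree_prob(tree):
    """Product of q over every rule used in the tree."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = q[(label, rhs)]
    for c in children:
        if not isinstance(c, str):
            p *= tree_prob(c)
    return p

t1 = ("S", ("NP", ("DT", "the"), ("NN", "man")), ("VP", ("Vi", "sleeps")))
print(tree_prob(t1))  # 1.0 * 0.3 * 1.0 * 0.7 * 0.4 * 1.0 ≈ 0.084
```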
§ Model
  § The probability of a tree t with n rules αi → βi, i = 1..n
§ Learning
  § Read the rules off of labeled sentences, use ML estimates for probabilities
  § and use all of our standard smoothing tricks!
§ Inference
  § For input sentence s, define T(s) to be the set of trees whose yield is s (whose leaves, read left to right, match the words in s)
    p(t) = Π_{i=1}^{n} q(αi → βi)

    q_ML(α → β) = Count(α → β) / Count(α)

    t*(s) = argmax_{t ∈ T(s)} p(t)
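The ML estimate above is just two counts and a division; here is a minimal sketch, assuming trees have already been flattened into (lhs, rhs) rule instances (that flattening step is not shown):

```python
# Maximum-likelihood rule probabilities: q_ML(a -> b) = Count(a -> b) / Count(a)
from collections import Counter

def ml_estimate(treebank_rules):
    """treebank_rules: iterable of (lhs, rhs) pairs read off labeled trees."""
    treebank_rules = list(treebank_rules)
    rule_counts = Counter(treebank_rules)
    lhs_counts = Counter(lhs for lhs, _ in treebank_rules)
    return {(lhs, rhs): c / lhs_counts[lhs]
            for (lhs, rhs), c in rule_counts.items()}

rules = [("VP", ("Vi",)), ("VP", ("Vt", "NP")), ("VP", ("Vi",)),
         ("NP", ("DT", "NN"))]
q = ml_estimate(rules)
print(q[("VP", ("Vi",))])  # 2/3: VP expanded 3 times, twice as Vi
```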
§ We will store: π(i, j, X) = score of the max parse of xi to xj with root non-terminal X
§ So we can compute the most likely parse: max_{t ∈ T_G(s)} p(t) = π(1, n, S)
§ Via the recursion: for all (i, j) with i < j and all X ∈ N,

    π(i, j, X) = max_{X→Y Z ∈ R, s ∈ {i…(j−1)}} ( q(X → Y Z) × π(i, s, Y) × π(s+1, j, Z) )

§ With base case:

    π(i, i, X) = q(X → xi) if X → xi ∈ R, otherwise 0

§ Back pointers:

    bp(i, j, X) = argmax_{X→Y Z ∈ R, s ∈ {i…(j−1)}} ( q(X → Y Z) × π(i, s, Y) × π(s+1, j, Z) )
§ Input: a sentence s = x1 … xn and a PCFG = <N, Σ, S, R, q>
§ Initialization: For i = 1 … n and all X in N:

    π(i, i, X) = q(X → xi) if X → xi ∈ R, otherwise 0

§ For l = 1 … (n−1)  [iterate all phrase lengths]
  § For i = 1 … (n−l) and j = i+l  [iterate all phrases of length l]
    § For all X in N  [iterate all non-terminals]

        π(i, j, X) = max_{X→Y Z ∈ R, s ∈ {i…(j−1)}} ( q(X → Y Z) × π(i, s, Y) × π(s+1, j, Z) )

    § also, store back pointers:

        bp(i, j, X) = argmax_{X→Y Z ∈ R, s ∈ {i…(j−1)}} ( q(X → Y Z) × π(i, s, Y) × π(s+1, j, Z) )
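The pseudocode above can be sketched as a deliberately naive Python implementation; the dict-based rule encoding and 0-based spans are assumptions made for the sketch:

```python
# Minimal CKY for a PCFG in Chomsky normal form, following the pseudocode above.
def cky(words, binary_rules, lexical_rules):
    """binary_rules: {(X, Y, Z): q}; lexical_rules: {(X, w): q}."""
    n = len(words)
    pi = {}   # (i, j, X) -> best score of a parse of words[i..j] rooted in X
    bp = {}   # back pointers: (i, j, X) -> (split, Y, Z)
    for i, w in enumerate(words):            # initialization (base case)
        for (X, word), q in lexical_rules.items():
            if word == w:
                pi[i, i, X] = q
    for l in range(1, n):                    # phrase lengths
        for i in range(n - l):
            j = i + l
            for (X, Y, Z), q in binary_rules.items():
                for s in range(i, j):        # split points
                    score = q * pi.get((i, s, Y), 0) * pi.get((s + 1, j, Z), 0)
                    if score > pi.get((i, j, X), 0):
                        pi[i, j, X] = score
                        bp[i, j, X] = (s, Y, Z)
    return pi, bp

# Tiny grammar; for the sketch, VP is treated as a preterminal over "sleeps".
pi, bp = cky(["the", "man", "sleeps"],
             {("S", "NP", "VP"): 1.0, ("NP", "DT", "NN"): 0.3},
             {("DT", "the"): 1.0, ("NN", "man"): 0.7, ("VP", "sleeps"): 0.4})
print(pi[0, 2, "S"])  # 1.0 * 0.3 * 1.0 * 0.7 * 0.4 ≈ 0.084
```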
Book the flight through Houston
S → NP VP 0.8
S → X1 VP 0.1
X1 → Aux NP 1.0
S → book | include | prefer 0.01 0.004 0.006
S → Verb NP 0.05
S → VP PP 0.03
NP → I | he | she | me 0.1 0.02 0.02 0.06
NP → Houston | NWA 0.16 0.04
NP → Det Nominal 0.6
Det → the | a | an 0.6 0.1 0.05
Nominal → book | flight | meal | money 0.03 0.15 0.06 0.06
Nominal → Nominal Nominal 0.2
Nominal → Nominal PP 0.5
Verb → book | include | prefer 0.5 0.04 0.06
VP → Verb NP 0.5
VP → VP PP 0.3
Prep → through | to | from 0.2 0.3 0.3
PP → Prep NP 1.0
[CKY chart for “Book the flight through Houston”; each cell lists non-terminal: score, and empty cells are None:
  Book: S: .01, Verb: .5, Nominal: .03
  the: Det: .6
  flight: Nominal: .15
  through: Prep: .2
  Houston: NP: .16
  the flight: NP: .6×.6×.15 = .054
  Book the flight: VP: .5×.5×.054 = .0135; S: .05×.5×.054 = .00135
  through Houston: PP: 1.0×.2×.16 = .032
  flight through Houston: Nominal: .5×.15×.032 = .0024
  the flight through Houston: NP: .6×.6×.0024 = .000864
  Book the flight through Houston: S: .03×.0135×.032 = .00001296, S: .05×.5×.000864 = .0000216]
Pick the most probable parse, i.e., take the max to combine probabilities of multiple derivations of each constituent in each cell.
Parse Tree #1 (S: .0000216)

Parse Tree #2 (S: .00001296)
§ How much memory does this require?
  § Have to store the score cache
  § Cache size: |symbols| × n²
§ Pruning: Coarse-to-Fine
  § Use a smaller grammar to rule out most X[i,j]
  § Much more on this later…
§ Pruning: Beam Search
  § score[X][i][j] can get too large (when?)
  § Can keep beams (truncated maps score[i][j]) which only store the best K scores for the span [i,j]
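Truncating each span’s score map to its best K entries can be sketched directly; the cell representation as a dict from symbol to score is an assumption for illustration:

```python
# Per-span beam: keep only the K best (symbol, score) entries for a span [i, j].
import heapq

def prune_span(cell, K):
    """cell: dict {symbol: score}; return the K best entries as a dict."""
    best = heapq.nlargest(K, cell.items(), key=lambda kv: kv[1])
    return dict(best)

cell = {"NP": 0.05, "VP": 0.3, "S": 0.001, "PP": 0.2}
print(sorted(prune_span(cell, 2)))  # ['PP', 'VP']
```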
[Diagram: constituent X over span (i, j), split at k into Y (spanning i…k) and Z (spanning k+1…j)]
§ For each span ⟨i, j⟩ (there are O(n²) of them)
  § For each rule X → Y Z
  § For each split point k: do constant work
§ Total time: |rules| × n³
§ ~20K rules (not an … parser!)
§ Observed exponent: …
§ Longer sentences “unlock” more of the grammar
§ All kinds of systems issues don’t scale
Can also compute other quantities:
§ Best Inside: score of the max parse of wi to wj rooted with non-terminal X
§ Best Outside: score of the max parse of w0 to wn with a gap from wi to wj rooted with non-terminal X
  § see notes for derivation; it is a bit more complicated
§ Sum Inside/Outside: do sums instead of maxes
Inference:
§ Can we keep N-ary (N > 2) rules and still do dynamic programming?
§ Can we keep unary rules and still do dynamic programming?
Learning:
§ Can we reconstruct the original trees?
§ Penn WSJ Treebank = 50,000 sentences with associated trees
§ Usual set-up: 40,000 training sentences, 2,400 test sentences
An example tree:

[Penn Treebank tree for: “Canadian Utilities had 1988 revenue of C$ 1.16 billion, mainly from its natural gas and electric utility businesses in Alberta, where the company serves about 800,000 customers.”]
Table 1.2. The Penn Treebank syntactic tagset
ADJP: Adjective phrase
ADVP: Adverb phrase
NP: Noun phrase
PP: Prepositional phrase
S: Simple declarative clause
SBAR: Subordinate clause
SBARQ: Direct question introduced by wh-element
SINV: Declarative sentence with subject-aux inversion
SQ: Yes/no questions and subconstituent of SBARQ excluding wh-element
VP: Verb phrase
WHADVP: Wh-adverb phrase
WHNP: Wh-noun phrase
WHPP: Wh-prepositional phrase
X: Constituent of unknown or uncertain category
*: “Understood” subject of infinitive or imperative
0: Zero variant of that in subordinate clauses
T: Trace of wh-Constituent
§ Need a PCFG for broad coverage parsing.
§ Can take a grammar right off the trees (doesn’t work well):

    ROOT → S 1
    S → NP VP . 1
    NP → PRP 1
    VP → VBD ADJP 1
    …

§ Better results by enriching the grammar (e.g., lexicalization).
§ Can also get reasonable parsers without lexicalization.
§ As FSAs, the raw grammar has ~10K states, excluding the lexicon
§ Better parsers usually make the grammars larger, not smaller

[Figure: FSAs for a subset of the rules for the category NP, under three grammar encodings (LIST, TRIE, Min FSA). Non-black states are active, non-white states are accepting, and bold transitions are phrasal.]
§ Corpus: Penn Treebank, WSJ
  § Training: sections 02-21; Development: section 22 (here, first 20 files); Test: section 23
§ Accuracy – F1: harmonic mean of per-node labeled precision and recall.
§ Here: also size – number of symbols in grammar.
  § Passive / complete symbols: NP, NP^S
  § Active / incomplete symbols: NP → NP CC •
Correct Tree T:
[S over VP; VP → Verb NP; Verb → book; NP → Det Nominal; Det → the; Nominal → Nominal PP; Nominal → Noun (flight); PP → Prep NP; Prep → through; NP → Houston]

Computed Tree P:
[S → VP PP; VP → Verb NP; Verb → book; NP → Det Nominal; Det → the; Nominal → Noun (flight); PP → Prep NP; Prep → through; NP → Proper-Noun (Houston)]

# Constituents in T: 11    # Constituents in P: 12    # Correct Constituents: 10
Recall = 10/11 = 90.9%    Precision = 10/12 = 83.3%    F1 ≈ 87.0%
§ PARSEVAL metrics measure the fraction of the constituents that match between the computed and human parse trees. If P is the system’s parse tree and T is the human parse tree (the “gold standard”):
§ Recall = (# correct constituents in P) / (# constituents in T) § Precision = (# correct constituents in P) / (# constituents in P)
§ Labeled Precision and labeled recall require getting the non-terminal label on the constituent node correct to count as correct. § F1 is the harmonic mean of precision and recall.
§ F1= (2 * Precision * Recall) / (Precision + Recall)
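These metrics are easy to compute once constituents are extracted; a sketch assuming each constituent is encoded as a (label, start, end) span (the extraction from trees is omitted):

```python
# Labeled PARSEVAL: precision, recall, and F1 over constituent spans.
def parseval(gold, predicted):
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    recall = correct / len(gold)
    precision = correct / len(predicted)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# 10 matching constituents out of 11 gold and 12 predicted, as in the
# "Book the flight through Houston" example above (spans are synthetic):
gold = [("C%d" % k, 0, k) for k in range(10)] + [("S", 0, 99)]
pred = [("C%d" % k, 0, k) for k in range(10)] + [("X", 0, 1), ("Y", 2, 3)]
p, r, f1 = parseval(gold, pred)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.833 0.909 0.87
```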
§ Use PCFGs for broad coverage parsing
§ Take the grammar right off the trees:

    ROOT → S 1
    S → NP VP . 1
    NP → PRP 1
    VP → VBD ADJP 1
    …

Model      F1
Baseline   72.0

[Charniak 96]
§ Not every NP expansion can fill every NP slot
  § A grammar with symbols like “NP” won’t be context-free
  § Statistically, conditional independence is too strong
§ Independence assumptions are often too strong.
  § Example: the expansion of an NP is highly dependent on its parent
  § Also: the subject and object expansions are correlated!

[Chart comparing NP expansion distributions: All NPs vs. NPs under S vs. NPs under VP]
§ Vertical Markov order: rewrites depend on past k ancestor nodes (cf. parent annotation)

[Figures: vertical markovization, Order 1 vs. Order 2; horizontal markovization, Order 1 vs. Order ∞]

Model    F1     Size
v=h=2v   77.8   7.5K
§ These trees differ only in one rule:
§ VP → VP PP § NP → NP PP
§ Lexicalization allows us to be sensitive to specific words
§ Add “headwords” to each phrasal node
§ Headship not in (most) treebanks § Usually use (handwritten) head rules, e.g.:
§ NP:
§ Take leftmost NP § Take rightmost N* § Take rightmost JJ § Take right child
§ VP:
§ Take leftmost VB* § Take leftmost VP § Take left child
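Head rules like these can be sketched as an ordered list of (direction, candidate-labels) passes; the rule format and the specific label lists here are illustrative assumptions, not a real head table such as Collins’:

```python
# Handwritten head rules, mirroring the NP/VP examples above.
HEAD_RULES = {
    "NP": [("left", ["NP"]),                       # take leftmost NP
           ("right", ["NN", "NNS", "NNP"]),        # else rightmost N*
           ("right", ["JJ"]),                      # else rightmost JJ
           ("right", None)],                       # else right child (None = any)
    "VP": [("left", ["VB", "VBD", "VBZ", "VBP"]),  # take leftmost VB*
           ("left", ["VP"]),                       # else leftmost VP
           ("left", None)],                        # else left child
}

def find_head(label, children):
    """children: list of child labels; returns index of the head child."""
    for direction, wanted in HEAD_RULES.get(label, [("left", None)]):
        order = range(len(children)) if direction == "left" \
            else reversed(range(len(children)))
        for i in order:
            if wanted is None or children[i] in wanted:
                return i
    return 0

print(find_head("NP", ["DT", "JJ", "NN"]))   # 2 (rightmost N*)
print(find_head("VP", ["VBD", "NP", "PP"]))  # 0 (leftmost VB*)
```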
§ Problem: we now have to estimate probabilities of whole lexicalized rules; never going to get these atomically off of a treebank
§ Solution: break up the derivation into smaller steps
§ Main idea: define a linguistically-motivated Markov process for generating children given the parent
  § Step 1: Choose a head tag and word
  § Step 2: Choose a complement bag
  § Step 3: Generate children (incl. adjuncts)
  § Step 4: Recursively derive children
[Collins 99]
[Diagram: lexicalized constituent X[h] over span (i, j), split at k into Y[h] (spanning i…k, head at h) and Z[h’] (spanning k+1…j, head at h’); e.g. (VP → VBD •)[saw] combines with NP[her] to give (VP → VBD…NP •)[saw]]

bestScore(i, j, X, h)
  if (j = i+1)
    return tagScore(X, s[i])
  else
    return max of:
      max_{k, h’, X→Y Z} score(X[h] → Y[h] Z[h’]) × bestScore(i, k, Y, h) × bestScore(k+1, j, Z, h’)
      max_{k, h’, X→Y Z} score(X[h] → Y[h’] Z[h]) × bestScore(i, k, Y, h’) × bestScore(k+1, j, Z, h)

Still cubic time?
§ The Collins parser prunes with per-cell beams [Collins 99]
  § Essentially, run the O(n⁵) lexicalized CKY
  § If we keep K hypotheses at each span, then we do at most O(nK²) work per span (why?)
  § Keeps things more or less cubic
§ Also: certain spans are forbidden entirely on the basis of punctuation (for speed)
Model                     F1
Naïve Treebank Grammar    72.6
Klein & Manning ’03       86.3
Collins 99                88.6
§ NP: subject vs object
§ DT: determiners vs demonstratives
§ IN: sentential vs prepositional
§ Fairly compact grammar
§ Linguistic motivations
§ Performance leveled out
§ Manually annotated
§ Brackets are known
§ Base categories are known
§ Hidden variables for subcategories

[Example: latent subcategories for “He was right .”]
Can learn with EM: like Forward-Backward for HMMs.
Parser                    F1 (≤ 40 words)   F1 (all words)
Klein & Manning ’03       86.3              85.7
Matsuzaki et al. ’05      86.7              86.1
Collins ’99               88.6              88.2
Charniak & Johnson ’05    90.1              89.6
Petrov et al. ’06         90.2              89.7
Vinyals et al., 2015
[Example: the tree for “John has a dog” is written out as a bracketed token sequence.]

§ Linearize a tree into a sequence
§ Then the parsing problem becomes similar to machine translation
  § Input: sequence
  § Output: sequence (of a different length)
§ Encoder-decoder LSTMs (Long Short-Term Memory networks)
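The linearization step can be sketched as a recursive traversal; the exact output format here (closing brackets annotated with their label, words dropped) follows the bracketing style of Vinyals et al. (2015), but the nested-tuple tree encoding is an assumption:

```python
# Linearize a parse tree into a token sequence for seq2seq parsing.
def linearize(tree):
    """tree = (label, children...) with string leaves; drops the words."""
    label, *children = tree
    if all(isinstance(c, str) for c in children):
        return [label]                      # preterminal: emit the tag only
    out = ["(" + label]
    for c in children:
        out.extend(linearize(c))
    out.append(")" + label)
    return out

t = ("S", ("NP", ("NNP", "John")),
     ("VP", ("VBZ", "has"), ("NP", ("DT", "a"), ("NN", "dog"))))
print(" ".join(linearize(t)))
# (S (NP NNP )NP (VP VBZ (NP DT NN )NP )VP )S
```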
Vinyals et al., 2015
§ The Penn Treebank (~40K sentences) is too small to train LSTMs
§ Create a larger training set with 11M sentences automatically parsed by two state-of-the-art parsers (keeping only those sentences on which the two parsers agreed)
Vinyals et al., 2015
§ Chomsky normal form:
§ All rules of the form X → Y Z or X → w § In principle, this is no limitation on the space of (P)CFGs
§ N-ary rules introduce new non-terminals § Unaries / empties are “promoted”
§ In practice it’s kind of a pain:
§ Reconstructing n-aries is easy § Reconstructing unaries is trickier § The straightforward transformations don’t preserve tree scores
§ Makes parsing algorithms simpler!
[Binarization example: VP → VBD NP PP PP becomes a chain of binary rules with intermediate dotted symbols [VP → VBD NP •] and [VP → VBD NP PP •]]
Original Grammar:

S → NP VP 0.8
S → Aux NP VP 0.1
S → VP 0.1
NP → Pronoun 0.2
NP → Proper-Noun 0.2
NP → Det Nominal 0.6
Nominal → Noun 0.3
Nominal → Nominal Noun 0.2
Nominal → Nominal PP 0.5
VP → Verb 0.2
VP → Verb NP 0.5
VP → VP PP 0.3
PP → Prep NP 1.0

Lexicon:
Noun → book | flight | meal | money 0.1 0.5 0.2 0.2
Verb → book | include | prefer 0.5 0.2 0.3
Det → the | a | that | this 0.6 0.2 0.1 0.1
Pronoun → I | he | she | me 0.5 0.1 0.1 0.3
Proper-Noun → Houston | NWA 0.8 0.2
Aux → does 1.0
Prep → from | to | on | near | through 0.25 0.25 0.1 0.2 0.2
Chomsky Normal Form:

S → NP VP 0.8
S → X1 VP 0.1
X1 → Aux NP 1.0
S → book | include | prefer 0.01 0.004 0.006
S → Verb NP 0.05
S → VP PP 0.03
NP → I | he | she | me 0.1 0.02 0.02 0.06
NP → Houston | NWA 0.16 0.04
NP → Det Nominal 0.6
Nominal → book | flight | meal | money 0.03 0.15 0.06 0.06
Nominal → Nominal Noun 0.2
Nominal → Nominal PP 0.5
VP → book | include | prefer 0.1 0.04 0.06
VP → Verb NP 0.5
VP → VP PP 0.3
PP → Prep NP 1.0

Lexicon (see previous slide for full list):
Noun → book | flight | meal | money 0.1 0.5 0.2 0.2
Verb → book | include | prefer 0.5 0.2 0.3
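Introducing intermediate symbols such as X1 can be sketched mechanically; the @-style symbol naming and the choice to put the original probability on the top rule (1.0 on the introduced ones) are assumptions for the sketch:

```python
# Binarize an n-ary rule X -> Y1 ... Yn into a chain of binary rules.
def binarize(lhs, rhs, prob):
    """Returns a list of (lhs, rhs, prob) binary rules; the introduced
    intermediate symbols carry probability 1.0, the top rule carries prob."""
    rules = []
    rhs = tuple(rhs)
    while len(rhs) > 2:
        new_sym = "@%s->%s" % (lhs, "_".join(rhs[:2]))
        rules.append((new_sym, rhs[:2], 1.0))   # intermediate symbol
        rhs = (new_sym,) + rhs[2:]
    rules.append((lhs, rhs, prob))              # original probability on top
    return rules

# S -> Aux NP VP (0.1) becomes two binary rules, as with X1 above:
for r in binarize("S", ("Aux", "NP", "VP"), 0.1):
    print(r)
# ('@S->Aux_NP', ('Aux', 'NP'), 1.0)
# ('S', ('@S->Aux_NP', 'VP'), 0.1)
```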
We need unaries to be non-cyclic.
§ Calculate the closure Close(R) for unary rules in R:
  § Add X → Y if there exists a rule chain X → Z1, Z1 → Z2, …, Zk → Y with q(X → Y) = q(X → Z1) × q(Z1 → Z2) × … × q(Zk → Y)
  § If no unary rule exists for X, add X → X with q(X → X) = 1, for all X in N
§ Rather than zero or more unaries, always exactly one
§ Alternate unary and binary layers
§ What about X → Y with different unary paths (and scores)?
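The closure computation can be sketched as a max-product relaxation run to a fixpoint; the dict encoding of unary rules is an assumption, and since probabilities are ≤ 1, longer chains never score higher, so the loop terminates even with cycles:

```python
# Unary closure Close(R): best-scoring unary chain X -> ... -> Y for each pair.
def unary_closure(unary_rules, nonterminals):
    """unary_rules: {(X, Y): q}; returns {(X, Y): best chain score}."""
    close = {(x, x): 1.0 for x in nonterminals}   # X -> X with q = 1
    close.update(unary_rules)
    changed = True
    while changed:                                 # relax until fixpoint
        changed = False
        for (x, z), q1 in list(close.items()):
            for (z2, y), q2 in list(close.items()):
                if z == z2 and close.get((x, y), 0) < q1 * q2:
                    close[x, y] = q1 * q2
                    changed = True
    return close

close = unary_closure({("S", "VP"): 0.1, ("VP", "Verb"): 0.2},
                      {"S", "VP", "Verb"})
print(close[("S", "Verb")])  # chain S -> VP -> Verb: 0.1 * 0.2
```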
[Diagram: trees rewritten to alternate unary and binary layers, with S, SBAR, VP, NP, DT, NN, VBD nodes]
WARNING: Watch out for unary cycles!
    bp(i, j, X) = argmax_{X→Y Z ∈ R, s ∈ {i…(j−1)}} ( q(X → Y Z) × π(i, s, Y) × π(s+1, j, Z) )

§ Input: a sentence s = x1 … xn and a PCFG = <N, Σ, S, R, q>
§ Initialization: For i = 1 … n and all X in N:

    π(i, i, X) = q(X → xi) if X → xi ∈ R, otherwise 0

§ For l = 1 … (n−1)  [iterate all phrase lengths]
  § For i = 1 … (n−l) and j = i+l  [iterate all phrases of length l]
    § For all X in N  [iterate all non-terminals]

        π(i, j, X) = max_{X→Y Z ∈ R, s ∈ {i…(j−1)}} ( q(X → Y Z) × π(i, s, Y) × π(s+1, j, Z) )

    § also, store back pointers
§ Input: a sentence s = x1 … xn and a PCFG = <N, Σ, S, R, q>
§ Initialization: For i = 1 … n:
  § Step 1: for all X in N:

      π(i, i, X) = q(X → xi) if X → xi ∈ R, otherwise 0

  § Step 2: for all X in N:

      πU(i, i, X) = max_{X→Y ∈ Close(R)} ( q(X → Y) × π(i, i, Y) )

§ For l = 1 … (n−1)  [iterate all phrase lengths]
  § For i = 1 … (n−l) and j = i+l  [iterate all phrases of length l]
  § Step 1: (Binary) For all X in N  [iterate all non-terminals]

      πB(i, j, X) = max_{X→Y Z ∈ R, s ∈ {i…(j−1)}} ( q(X → Y Z) × πU(i, s, Y) × πU(s+1, j, Z) )

  § Step 2: (Unary) For all X in N  [iterate all non-terminals]

      πU(i, j, X) = max_{X→Y ∈ Close(R)} ( q(X → Y) × πB(i, j, Y) )
§ Subdivide the IN tag.

Annotation   F1     Size
v=h=2v       78.3   8.0K
SPLIT-IN     80.3   8.1K
§ UNARY-DT: mark demonstratives as DT^U (“the X” vs. “those”)
§ UNARY-RB: mark phrasal adverbs as RB^U (“quickly” vs. “very”)
§ TAG-PA: mark tags with non-canonical parents (“not” is an RB^VP)
§ SPLIT-AUX: mark auxiliary verbs with –AUX [cf. Charniak 97]
§ SPLIT-CC: separate “but” and “&” from other conjunctions
§ SPLIT-%: “%” gets its own tag.

Annotation   F1     Size
UNARY-DT     80.4   8.1K
UNARY-RB     80.5   8.1K
TAG-PA       81.2   8.5K
SPLIT-AUX    81.6   9.0K
SPLIT-CC     81.7   9.1K
SPLIT-%      81.8   9.3K
§ Beats “first generation” lexicalized parsers. § Lots of room to improve – more complex models next.
Parser          LP     LR     F1
Magerman 95     84.9   84.6   84.7
Collins 96      86.3   85.8   86.0
Unlexicalized   86.9   85.7   86.3
Charniak 97     87.4   87.5   87.4
Collins 99      88.7   88.6   88.6