SLIDE 1

Natural Language Processing (CSEP 517): Phrase Structure Syntax and Parsing

Noah Smith

© 2017 University of Washington
nasmith@cs.washington.edu

April 24, 2017

SLIDE 2

To-Do List

◮ Online quiz: due Sunday
◮ Ungraded mid-quarter survey: due Sunday
◮ Read: Jurafsky and Martin (2008, ch. 12–14), Collins (2011)
◮ A3 due May 7 (Sunday)

SLIDE 3

Finite-State Automata

A finite-state automaton (plural "automata") consists of:

◮ A finite set of states S
◮ An initial state s0 ∈ S
◮ A set of final states F ⊆ S
◮ A finite alphabet Σ
◮ Transitions δ : S × Σ → 2^S
  ◮ Special case: a deterministic FSA defines δ : S × Σ → S

A string x ∈ Σ^n is recognized by the FSA iff there is a sequence of states s0, . . . , sn such that sn ∈ F and si ∈ δ(si−1, xi) for every i ∈ {1, . . . , n}. Such a sequence is sometimes called a path.
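The recognition condition above can be simulated directly by tracking the set of reachable states. A minimal sketch (the (ab)* automaton below is an assumed toy example, not from the slides; ε-transitions are not handled):

```python
def recognizes(delta, s0, finals, x):
    """Nondeterministic FSA recognition: track all states reachable
    after each symbol; accept iff some final state survives."""
    states = {s0}
    for symbol in x:
        states = {t for s in states for t in delta.get((s, symbol), set())}
    return bool(states & finals)

# Assumed toy automaton for the regular language (ab)*:
delta = {("q0", "a"): {"q1"}, ("q1", "b"): {"q0"}}

recognizes(delta, "q0", {"q0"}, "abab")  # True
recognizes(delta, "q0", {"q0"}, "aab")   # False
```

For a deterministic FSA the set never holds more than one state, recovering the special case δ : S × Σ → S.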

SLIDE 4

Terminology from Theory of Computation

◮ A regular expression can be:
  ◮ an empty string (usually denoted ε) or a symbol from Σ
  ◮ a concatenation of regular expressions (e.g., abc)
  ◮ an alternation of regular expressions (e.g., ab|cd)
  ◮ a Kleene star of a regular expression (e.g., (abc)*)
◮ A language is a set of strings.
◮ A regular language is a language expressible by a regular expression.
◮ Important theorem: every regular language can be recognized by an FSA, and every FSA's language is regular.

SLIDE 5

Proving a Language Isn't Regular

Pumping lemma (for regular languages): if L is an infinite regular language, then there exist strings x, y, and z, with y ≠ ε, such that xy^n z ∈ L for all n ≥ 0.

[diagram: a path from s0 to a state s reading x, a loop at s reading y, and a path from s to a final state sf reading z]

◮ If L is infinite and no such x, y, z exist, then L is not regular.
◮ If L1 and L2 are regular, then L1 ∩ L2 is regular.
◮ Therefore, if L1 ∩ L2 is not regular and L1 is regular, then L2 is not regular.

SLIDE 8

Claim: English is not regular.

L1 = (the (cat|mouse|dog))* (ate|bit|chased)* likes tuna fish
L2 = English
L1 ∩ L2 = (the (cat|mouse|dog))^n (ate|bit|chased)^(n−1) likes tuna fish

L1 ∩ L2 is not regular, but L1 is ⇒ L2 is not regular.

SLIDE 9

the cat likes tuna fish
the cat the dog chased likes tuna fish
the cat the dog the mouse scared chased likes tuna fish
the cat the dog the mouse the elephant squashed scared chased likes tuna fish
the cat the dog the mouse the elephant the flea bit squashed scared chased likes tuna fish
the cat the dog the mouse the elephant the flea the virus infected bit squashed scared chased likes tuna fish

SLIDE 10

Linguistic Debate

Chomsky put forward an argument like the one we just saw.

(Chomsky gets credit for formalizing a hierarchy of types of languages: regular, context-free, context-sensitive, recursively enumerable. This was an important contribution to CS!)

Some are unconvinced, because after a few center embeddings the examples become unintelligible.

Nonetheless, most agree that natural language syntax isn't well captured by FSAs.

SLIDE 15

Noun Phrases

What, exactly, makes a noun phrase? Examples (Jurafsky and Martin, 2008):

◮ Harry the Horse
◮ the Broadway coppers
◮ they
◮ a high-class spot such as Mindy's
◮ the reason he comes into the Hot Box
◮ three parties from Brooklyn

SLIDE 16

Constituents

More general than noun phrases: constituents are groups of words. Linguists characterize constituents in a number of ways, including:

◮ where they occur (e.g., "NPs can occur before verbs")
◮ where they can move in variations of a sentence
  ◮ On September 17th, I'd like to fly from Atlanta to Denver
  ◮ I'd like to fly on September 17th from Atlanta to Denver
  ◮ I'd like to fly from Atlanta to Denver on September 17th
◮ what parts can move and what parts can't
  ◮ *On September I'd like to fly 17th from Atlanta to Denver
◮ what they can be conjoined with
  ◮ I'd like to fly from Atlanta to Denver on September 17th and in the morning

SLIDE 20

Recursion and Constituents

this is the house
this is the house that Jack built
this is the cat that lives in the house that Jack built
this is the dog that chased the cat that lives in the house that Jack built
this is the flea that bit the dog that chased the cat that lives in the house that Jack built
this is the virus that infected the flea that bit the dog that chased the cat that lives in the house that Jack built

SLIDE 21

Not Constituents

(Pullum, 1991)

◮ If on a Winter's Night a Traveler (by Italo Calvino)
◮ Nuclear and Radiochemistry (by Gerhart Friedlander et al.)
◮ The Fire Next Time (by James Baldwin)
◮ A Tad Overweight, but Violet Eyes to Die For (by G.B. Trudeau)
◮ Sometimes a Great Notion (by Ken Kesey)
◮ [how can we know the] Dancer from the Dance (by Andrew Holleran)

SLIDE 22

Context-Free Grammar

A context-free grammar consists of:

◮ A finite set of nonterminal symbols N
◮ A start symbol S ∈ N
◮ A finite alphabet Σ of "terminal" symbols, distinct from N
◮ A production rule set R, each rule of the form "N → α", where:
  ◮ the lefthand side N is a nonterminal from N
  ◮ the righthand side α is a sequence of zero or more terminals and/or nonterminals: α ∈ (N ∪ Σ)*
  ◮ Special case: Chomsky normal form constrains α to be either a single terminal symbol or two nonterminals

SLIDE 23

An Example CFG for a Tiny Bit of English

From Jurafsky and Martin (2008)

Grammar rules:

S → NP VP
S → Aux NP VP
S → VP
NP → Pronoun
NP → Proper-Noun
NP → Det Nominal
Nominal → Noun
Nominal → Nominal Noun
Nominal → Nominal PP
VP → Verb
VP → Verb NP
VP → Verb NP PP
VP → Verb PP
VP → VP PP
PP → Preposition NP

Lexicon:

Det → that | this | a
Noun → book | flight | meal | money
Verb → book | include | prefer
Pronoun → I | she | me
Proper-Noun → Houston | NWA
Aux → does
Preposition → from | to | on | near | through
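One way to get a feel for a grammar like this is to sample strings from it. A minimal sketch; the recursive Nominal, VP, and PP rules are deliberately trimmed here (an assumption made to guarantee that naive random expansion terminates):

```python
import random

# The toy grammar above, as a mapping from symbol to possible right-hand sides.
# Recursive rules (Nominal -> Nominal ..., VP -> VP PP, etc.) are omitted so
# that random expansion always halts; this is a deliberate simplification.
grammar = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Pronoun"], ["Proper-Noun"], ["Det", "Nominal"]],
    "Nominal": [["Noun"]],
    "VP": [["Verb"], ["Verb", "NP"]],
    "Det": [["that"], ["this"], ["a"]],
    "Noun": [["book"], ["flight"], ["meal"], ["money"]],
    "Verb": [["book"], ["include"], ["prefer"]],
    "Pronoun": [["I"], ["she"], ["me"]],
    "Proper-Noun": [["Houston"], ["NWA"]],
    "Aux": [["does"]],
}

def generate(symbol="S"):
    """Leftmost random derivation: expand each nonterminal with a random rule."""
    if symbol not in grammar:          # a terminal word
        return [symbol]
    rhs = random.choice(grammar[symbol])
    return [w for sym in rhs for w in generate(sym)]

print(" ".join(generate()))  # e.g. "does she prefer a flight"
```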

SLIDE 24

Example Phrase Structure Tree

(S (Aux does)
   (NP (Det this) (Noun flight))
   (VP (Verb include)
       (NP (Det a) (Noun meal))))

The phrase-structure tree represents both the syntactic structure of the sentence and the derivation of the sentence under the grammar. E.g., the VP node with children Verb and NP corresponds to the rule VP → Verb NP.

SLIDE 25

The First Phrase-Structure Tree

(Chomsky, 1956)

(Sentence (NP the man)
          (VP (V took)
              (NP the book)))

SLIDE 26

Where do natural language CFGs come from?

As evidenced by the discussion in Jurafsky and Martin (2008), building a CFG for a natural language by hand is really hard.

◮ Need lots of categories to make sure all and only grammatical sentences are included.
◮ Categories tend to start exploding combinatorially.
◮ Alternative grammar formalisms are typically used for manual grammar construction; these are often based on constraints and a powerful algorithmic tool called unification.

Standard approach today:

1. Build a corpus of annotated sentences, called a treebank. (Memorable example: the Penn Treebank; Marcus et al., 1993.)
2. Extract rules from the treebank.
3. Optionally, use statistical models to generalize the rules.

SLIDE 31

Example from the Penn Treebank

(S (NP-SBJ (NP (NNP Pierre) (NNP Vinken))
           (, ,)
           (ADJP (NP (CD 61) (NNS years)) (JJ old))
           (, ,))
   (VP (MD will)
       (VP (VB join)
           (NP (DT the) (NN board))
           (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director)))
           (NP-TMP (NNP Nov.) (CD 29)))))

SLIDE 32

LISP Encoding in the Penn Treebank

( (S (NP-SBJ-1 (NP (NNP Rudolph) (NNP Agnew))
               (, ,)
               (UCP (ADJP (NP (CD 55) (NNS years))
                          (JJ old))
                    (CC and)
                    (NP (NP (JJ former) (NN chairman))
                        (PP (IN of)
                            (NP (NNP Consolidated) (NNP Gold) (NNP Fields) (NNP PLC)))))
               (, ,))
     (VP (VBD was)
         (VP (VBN named)
             (S (NP-SBJ (-NONE- *-1))
                (NP-PRD (NP (DT a) (JJ nonexecutive) (NN director))
                        (PP (IN of)
                            (NP (DT this) (JJ British) (JJ industrial) (NN conglomerate)))))))
     (. .)) )
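Step 2 of the standard approach (extracting rules from treebank trees) can be sketched over bracketed strings like the one above. A minimal sketch, assuming plain s-expressions without the Treebank's outer empty-label wrapper:

```python
import re
from collections import Counter

def tokenize(s):
    """Split a bracketed tree string into parentheses and symbols."""
    return re.findall(r"\(|\)|[^\s()]+", s)

def parse(tokens, i=0):
    """Parse one (LABEL child ...) expression; returns (tree, next index).
    A tree is (label, [children]); a leaf is a plain word string."""
    assert tokens[i] == "("
    label = tokens[i + 1]
    i += 2
    children = []
    while tokens[i] != ")":
        if tokens[i] == "(":
            child, i = parse(tokens, i)
        else:
            child, i = tokens[i], i + 1
        children.append(child)
    return (label, children), i + 1

def count_rules(tree, counts):
    """Count the rule at this node (lexical rules included), then recurse."""
    label, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    counts[(label, rhs)] += 1
    for c in children:
        if isinstance(c, tuple):
            count_rules(c, counts)

counts = Counter()
tree, _ = parse(tokenize("(S (NP (DT the) (NN cat)) (VP (VBD sat)))"))
count_rules(tree, counts)
# counts now maps (lhs, rhs) pairs like ("S", ("NP", "VP")) to frequencies.
```

Run over a whole treebank, this produces exactly the kind of rule-count table shown on the next slide.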

SLIDE 33

Some Penn Treebank Rules with Counts

40717 PP → IN NP
33803 S → NP-SBJ VP
22513 NP-SBJ → -NONE-
21877 NP → NP PP
20740 NP → DT NN
14153 S → NP-SBJ VP .
12922 VP → TO VP
11881 PP-LOC → IN NP
11467 NP-SBJ → PRP
11378 NP → -NONE-
11291 NP → NN
. . .
989 VP → VBG S
985 NP-SBJ → NN
983 PP-MNR → IN NP
983 NP-SBJ → DT
969 VP → VBN VP
100 VP → VBD PP-PRD
100 PRN → : NP :
100 NP → DT JJS
100 NP-CLR → NN
99 NP-SBJ-1 → DT NNP
98 VP → VBN NP PP-DIR
98 VP → VBD PP-TMP
98 PP-TMP → VBG NP
97 VP → VBD ADVP-TMP VP
. . .
10 WHNP-1 → WRB JJ
10 VP → VP CC VP PP-TMP
10 VP → VP CC VP ADVP-MNR
10 VP → VBZ S , SBAR-ADV
10 VP → VBZ S ADVP-TMP

SLIDE 34

Penn Treebank Rules: Statistics

◮ 32,728 rules in the training section (not including 52,257 lexicon rules)
◮ 4,021 rules in the development section
◮ Overlap: 3,128

SLIDE 35

(Phrase-Structure) Recognition and Parsing

Given a CFG (N, S, Σ, R) and a sentence x, the recognition problem is: is x in the language of the CFG? The proof is a derivation.

Related problem, parsing: show one or more derivations for x, using R.

With reasonable grammars, the number of parses is exponential in |x|.

SLIDE 38

Ambiguity

Two parses of "I shot an elephant in my pajamas":

(S (NP I)
   (VP shot
       (NP an (Nominal (Nominal elephant)
                       (PP in my pajamas)))))

(S (NP I)
   (VP (VP shot
           (NP an (Nominal elephant)))
       (PP in my pajamas)))

SLIDE 39

Parser Evaluation

Represent a parse tree as a collection of tuples ⟨ℓ1, i1, j1⟩, ⟨ℓ2, i2, j2⟩, . . . , ⟨ℓn, in, jn⟩, where:

◮ ℓk is the nonterminal labeling the kth phrase
◮ ik is the index of the first word in the kth phrase
◮ jk is the index of the last word in the kth phrase

Example:

(S (Aux does)
   (NP (Det this) (Noun flight))
   (VP (Verb include)
       (NP (Det a) (Noun meal))))

→ ⟨S, 1, 6⟩, ⟨NP, 2, 3⟩, ⟨VP, 4, 6⟩, ⟨NP, 5, 6⟩

Convert the gold-standard tree and the system's hypothesized tree into this representation, then estimate precision, recall, and F1.

SLIDE 40

Tree Comparison Example

Left tree:
(S (NP I)
   (VP shot
       (NP an (Nominal (Nominal elephant)
                       (PP in (NP my pajamas))))))

Right tree:
(S (NP I)
   (VP (VP shot
           (NP an (Nominal elephant)))
       (PP in (NP my pajamas))))

Only in left tree: ⟨NP, 3, 7⟩, ⟨Nominal, 4, 7⟩
In both trees: ⟨S, 1, 7⟩, ⟨NP, 1, 1⟩, ⟨VP, 2, 7⟩, ⟨PP, 5, 7⟩, ⟨NP, 6, 7⟩, ⟨Nominal, 4, 4⟩
Only in right tree: ⟨VP, 2, 4⟩, ⟨NP, 3, 4⟩
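Counting shared and unshared tuples gives precision, recall, and F1 directly. A minimal sketch that treats each tree as a set of labeled spans (strict PARSEVAL scores multisets, which only matters when an identical label-span pair occurs twice in one tree):

```python
def parseval(gold, hypothesis):
    """Labeled-span precision, recall, and F1. Spans are (label, first, last)."""
    gold, hypothesis = set(gold), set(hypothesis)
    correct = len(gold & hypothesis)
    p = correct / len(hypothesis)
    r = correct / len(gold)
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# The two "elephant in my pajamas" trees from the comparison above:
left  = {("S", 1, 7), ("NP", 1, 1), ("VP", 2, 7), ("NP", 3, 7),
         ("Nominal", 4, 7), ("Nominal", 4, 4), ("PP", 5, 7), ("NP", 6, 7)}
right = {("S", 1, 7), ("NP", 1, 1), ("VP", 2, 7), ("VP", 2, 4),
         ("NP", 3, 4), ("Nominal", 4, 4), ("PP", 5, 7), ("NP", 6, 7)}

parseval(left, right)  # (0.75, 0.75, 0.75): 6 of 8 spans agree in each direction
```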

SLIDE 41

Two Views of Parsing

1. Incremental search: the state of the search is the partial structure built so far; each action incrementally extends the tree.
   ◮ Often greedy, with a statistical classifier deciding what action to take in every state.
2. Discrete optimization: define a scoring function and seek the tree with the highest score.
   ◮ Today: scores are defined using the rules.

predict(x) = argmax_{t ∈ Tx} ∏_{r ∈ R} s(r)^{c_t(r)} = argmax_{t ∈ Tx} ∑_{r ∈ R} c_t(r) · log s(r)

where t ranges over grammatical trees with x as their yield (denote this set Tx), and c_t(r) is the number of times rule r is used in t.

SLIDE 46

Probabilistic Context-Free Grammar

A probabilistic context-free grammar consists of:

◮ A finite set of nonterminal symbols N
◮ A start symbol S ∈ N
◮ A finite alphabet Σ of "terminal" symbols, distinct from N
◮ A production rule set R, each rule of the form "N → α", where:
  ◮ the lefthand side N is a nonterminal from N
  ◮ the righthand side α is a sequence of zero or more terminals and/or nonterminals: α ∈ (N ∪ Σ)*
  ◮ Special case: Chomsky normal form constrains α to be either a single terminal symbol or two nonterminals
◮ For each N ∈ N, a probability distribution p(∗ | N) over the rules with N on the lefthand side

SLIDE 47

PCFG Example

Build the tree top-down, multiplying in one rule probability at each step:

1. Write down the start symbol S. Score so far: 1
2. Choose a rule from the "S" distribution: S → Aux NP VP. Multiply in p(Aux NP VP | S).
3. From the "Aux" distribution: Aux → does. Multiply in p(does | Aux).
4. From the "NP" distribution: NP → Det Noun. Multiply in p(Det Noun | NP).
5. From the "Det" distribution: Det → this. Multiply in p(this | Det).
6. From the "Noun" distribution: Noun → flight. Multiply in p(flight | Noun).
7. From the "VP" distribution: VP → Verb NP. Multiply in p(Verb NP | VP).
8. From the "Verb" distribution: Verb → include. Multiply in p(include | Verb).
9. From the "NP" distribution: NP → Det Noun. Multiply in p(Det Noun | NP).
10. From the "Det" distribution: Det → a. Multiply in p(a | Det).
11. From the "Noun" distribution: Noun → meal. Multiply in p(meal | Noun).

Final tree:

(S (Aux does)
   (NP (Det this) (Noun flight))
   (VP (Verb include)
       (NP (Det a) (Noun meal))))

Final score: p(Aux NP VP | S) · p(does | Aux) · p(Det Noun | NP) · p(this | Det) · p(flight | Noun) · p(Verb NP | VP) · p(include | Verb) · p(Det Noun | NP) · p(a | Det) · p(meal | Noun)
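The running score is just a product of one probability per rule used in the tree. A minimal sketch; the probability values below are made-up assumptions for illustration, not estimated from any corpus:

```python
import math

# Toy PCFG: p(rhs | lhs). The numbers are invented for this example.
pcfg = {
    ("S", ("Aux", "NP", "VP")): 0.2,
    ("Aux", ("does",)): 1.0,
    ("NP", ("Det", "Noun")): 0.5,
    ("Det", ("this",)): 0.3,
    ("Det", ("a",)): 0.4,
    ("Noun", ("flight",)): 0.2,
    ("Noun", ("meal",)): 0.1,
    ("VP", ("Verb", "NP")): 0.4,
    ("Verb", ("include",)): 0.25,
}

def tree_prob(tree):
    """Probability of a derivation: product over the rules used.
    Trees are (label, child, ...) tuples; leaves are plain word strings."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = pcfg[(label, rhs)]
    for c in children:
        if not isinstance(c, str):
            p *= tree_prob(c)
    return p

t = ("S", ("Aux", "does"),
          ("NP", ("Det", "this"), ("Noun", "flight")),
          ("VP", ("Verb", "include"),
                 ("NP", ("Det", "a"), ("Noun", "meal"))))
tree_prob(t)  # 0.2 * 1.0 * 0.5 * 0.3 * 0.2 * 0.4 * 0.25 * 0.5 * 0.4 * 0.1
```

In practice these products underflow quickly, which is why parsers sum log probabilities instead.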

SLIDE 58

PCFG as a Noisy Channel

source → T → channel → X

The PCFG defines the source model. The channel is deterministic: it erases everything except the tree's leaves (the yield).

Decoding:

argmax_t p(t) · [1 if t ∈ Tx, 0 otherwise] = argmax_{t ∈ Tx} p(t)

SLIDE 59

Probabilistic Parsing with CFGs

◮ How to set the probabilities p(righthand side | lefthand side)?
◮ How to decode/parse?

SLIDE 60

Probabilistic CKY

(Cocke and Schwartz, 1970; Kasami, 1965; Younger, 1967)

Input:

◮ a PCFG (N, S, Σ, R, p(∗ | ∗)) in Chomsky normal form
◮ a sentence x (let n be its length)

Output: argmax_{t ∈ Tx} p(t | x) (if x is in the language of the grammar)

SLIDE 61

Probabilistic CKY

Base case: for i ∈ {1, . . . , n} and each N ∈ N:

  s_{i:i}(N) = p(xi | N)

Recursive case: for each i, k such that 1 ≤ i < k ≤ n and each N ∈ N:

  s_{i:k}(N) = max over L, R ∈ N and j ∈ {i, . . . , k − 1} of p(L R | N) · s_{i:j}(L) · s_{(j+1):k}(R)

[diagram: N spanning xi . . . xk splits into L over xi . . . xj and R over x(j+1) . . . xk]

Solution: s_{1:n}(S) = max_{t ∈ Tx} p(t)
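The base and recursive cases translate directly into a bottom-up dynamic program. A minimal sketch in log space with backpointers (0-indexed spans rather than the slides' 1-indexed ones; the toy grammar in the usage example is assumed, not from the slides):

```python
import math
from collections import defaultdict

def cky(words, binary, lexical, start="S"):
    """Probabilistic CKY for a CNF PCFG.
    binary:  {(N, L, R): p(L R | N)}     lexical: {(N, word): p(word | N)}
    Returns (best log-probability, best tree), or None if no parse."""
    n = len(words)
    chart = defaultdict(dict)  # chart[(i, k)][N] = (logprob, backpointer)
    for i, w in enumerate(words):
        for (N, word), p in lexical.items():
            if word == w:
                chart[(i, i)][N] = (math.log(p), w)
    for width in range(1, n):                 # increasing span width
        for i in range(n - width):
            k = i + width
            for j in range(i, k):             # split point
                for (N, L, R), p in binary.items():
                    left = chart[(i, j)].get(L)
                    right = chart[(j + 1, k)].get(R)
                    if left is not None and right is not None:
                        score = math.log(p) + left[0] + right[0]
                        if N not in chart[(i, k)] or score > chart[(i, k)][N][0]:
                            chart[(i, k)][N] = (score, (j, L, R))
    if start not in chart[(0, n - 1)]:
        return None
    def build(i, k, N):                       # follow backpointers
        back = chart[(i, k)][N][1]
        if isinstance(back, str):
            return (N, back)
        j, L, R = back
        return (N, build(i, j, L), build(j + 1, k, R))
    return chart[(0, n - 1)][start][0], build(0, n - 1, start)

# Assumed toy grammar and sentence:
binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}
lexical = {("NP", "she"): 0.5, ("NP", "fish"): 0.5, ("V", "eats"): 1.0}
cky(["she", "eats", "fish"], binary, lexical)
```

The three nested span loops give the O(|R|n³) runtime noted below, and the chart itself is the O(|N|n²) space.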

SLIDE 62

Parse Chart

The chart for a five-word sentence, filled along the diagonals (all width-1 spans, then width-2, and so on):

s1:1(∗)  s1:2(∗)  s1:3(∗)  s1:4(∗)  s1:5(∗)
x1       s2:2(∗)  s2:3(∗)  s2:4(∗)  s2:5(∗)
         x2       s3:3(∗)  s3:4(∗)  s3:5(∗)
                  x3       s4:4(∗)  s4:5(∗)
                           x4       s5:5(∗)
                                    x5

SLIDE 68

Remarks

◮ Space and runtime requirements? O(|N|n^2) space, O(|R|n^3) runtime.
◮ Recovering the best tree? Backpointers.
◮ Probabilistic Earley's algorithm does not require the grammar to be in Chomsky normal form.

SLIDE 73

The Declarative View of CKY

Axiom (one per word and nonterminal): ⟨N, i, i⟩, with score p(xi | N)

Inference rule:

  ⟨L, i, j⟩    ⟨R, j + 1, k⟩
  --------------------------  multiplying in p(L R | N)
          ⟨N, i, k⟩

Goal: ⟨S, 1, n⟩

SLIDE 74

Probabilistic CKY with an Agenda

1. Initialize every item's value in the chart to the "default" (zero).
2. Place all initializing updates onto the agenda.
3. While the agenda is not empty and the goal has not been reached:
   ◮ Pop the highest-priority update from the agenda (item I with value v).
   ◮ If I = goal, then return v.
   ◮ If v > chart(I):
     ◮ chart(I) ← v
     ◮ Find all combinations of I with other items in the chart, generating new possible updates; place these on the agenda.

Any priority function will work! But smart ordering will save time. This idea can also be applied to other algorithms (e.g., Viterbi).

SLIDE 75

Starting Point: Phrase Structure

(S (NP (DT The) (NN luxury) (NN auto) (NN maker))
   (NP (JJ last) (NN year))
   (VP (VBD sold)
       (NP (CD 1,214) (NN cars))
       (PP (IN in) (NP (DT the) (NNP U.S.)))))

SLIDE 76

Parent Annotation

(Johnson, 1998)

(S^ROOT (NP^S (DT^NP The) (NN^NP luxury) (NN^NP auto) (NN^NP maker))
        (NP^S (JJ^NP last) (NN^NP year))
        (VP^S (VBD^VP sold)
              (NP^VP (CD^NP 1,214) (NN^NP cars))
              (PP^VP (IN^PP in) (NP^PP (DT^NP the) (NNP^NP U.S.)))))

Increases the "vertical" Markov order: p(children | parent, grandparent)
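Parent annotation is a single recursive pass over the tree. A minimal sketch on tuple-encoded trees (the small tree in the example is an assumed fragment, not the full slide tree):

```python
def parent_annotate(tree, parent="ROOT"):
    """Append the parent's label to every nonterminal (Johnson, 1998).
    Trees are (label, child, ...) tuples; leaves are plain word strings."""
    label, *children = tree
    new_children = [c if isinstance(c, str) else parent_annotate(c, label)
                    for c in children]
    return (f"{label}^{parent}", *new_children)

t = ("S", ("NP", ("DT", "the"), ("NN", "maker")),
          ("VP", ("VBD", "sold"), ("NP", ("NNS", "cars"))))
parent_annotate(t)
# ('S^ROOT', ('NP^S', ('DT^NP', 'the'), ('NN^NP', 'maker')),
#  ('VP^S', ('VBD^VP', 'sold'), ('NP^VP', ('NNS^NP', 'cars'))))
```

Re-extracting rules from the annotated trees then yields the grandparent-conditioned distributions p(children | parent, grandparent).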

SLIDE 77

Headedness

(S (NP (DT The) (NN luxury) (NN auto) (NN maker))
   (NP (JJ last) (NN year))
   (VP (VBD sold)
       (NP (CD 1,214) (NN cars))
       (PP (IN in) (NP (DT the) (NNP U.S.)))))

Suggests "horizontal" markovization:

p(children | parent) = p(head | parent) · ∏_i p(ith sibling | head, parent)

SLIDE 78

Lexicalization

(S_sold (NP_maker (DT_The The) (NN_luxury luxury) (NN_auto auto) (NN_maker maker))
        (NP_year (JJ_last last) (NN_year year))
        (VP_sold (VBD_sold sold)
                 (NP_cars (CD_1,214 1,214) (NN_cars cars))
                 (PP_in (IN_in in) (NP_U.S. (DT_the the) (NNP_U.S. U.S.)))))

Each node shares a lexical head with its head child.

SLIDE 79

Transformations on Trees

Starting around 1998, many different ideas, both linguistic and statistical, were proposed for transforming treebank trees. All of these make the grammar larger (and therefore all rule frequencies sparser), so a lot of research went into smoothing the rule probabilities. Examples: parent annotation, headedness, markovization, and lexicalization; also category refinement by linguistic rules (Klein and Manning, 2003).

◮ These are reflected in some versions of the popular Stanford and Berkeley parsers.

SLIDE 80

Tree Decorations

(Klein and Manning, 2003)

◮ Mark nodes with only 1 child as UNARY
◮ Mark DTs (determiners) and RBs (adverbs) when they are only children
◮ Annotate POS tags with their parents
◮ Split IN (prepositions; 6 ways), AUX, CC, %
◮ NPs: temporal, possessive, base
◮ VPs annotated with head tag (finite vs. others)
◮ DOMINATES-V
◮ RIGHT-RECURSIVE NP

SLIDE 81

Machine Learning and Parsing

◮ Define arbitrary features on trees, based on linguistic knowledge; to parse, use a PCFG to generate a k-best list of parses, then train a log-linear model to rerank (Charniak and Johnson, 2005).
  ◮ k-best parsing: Huang and Chiang (2005)
◮ Define rule-local features on trees (and any part of the input sentence); minimize hinge or log loss.
  ◮ These exploit dynamic programming algorithms for training (CKY for arbitrary scores, and the sum-product version).
◮ Learn refinements on the constituents, as latent variables (Petrov et al., 2006).
◮ Neural, too:
  ◮ Socher et al. (2013) define compositional vector grammars that associate each phrase with a vector, calculated as a function of its subphrases' vectors. Used essentially to rerank.
  ◮ Dyer et al. (2016): recurrent neural network grammars, generative models like PCFGs that encode arbitrary previous derivation steps in a vector. Parsing requires some tricks.

SLIDE 86

References I

Eugene Charniak and Mark Johnson. Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. In Proc. of ACL, 2005.
Noam Chomsky. Three models for the description of language. IEEE Transactions on Information Theory, 2(3):113–124, 1956.
John Cocke and Jacob T. Schwartz. Programming languages and their compilers: Preliminary notes. Technical report, Courant Institute of Mathematical Sciences, New York University, 1970.
Michael Collins. Probabilistic context-free grammars, 2011. URL http://www.cs.columbia.edu/~mcollins/courses/nlp2011/notes/pcfgs.pdf.
Chris Dyer, Adhiguna Kuncoro, Miguel Ballesteros, and Noah A. Smith. Recurrent neural network grammars, 2016. To appear.
Liang Huang and David Chiang. Better k-best parsing. In Proc. of IWPT, 2005.
Mark Johnson. PCFG models of linguistic tree representations. Computational Linguistics, 24(4):613–632, 1998.
Daniel Jurafsky and James H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice Hall, second edition, 2008.
Tadao Kasami. An efficient recognition and syntax-analysis algorithm for context-free languages. Technical Report AFCRL-65-758, Air Force Cambridge Research Lab, 1965.
Dan Klein and Christopher D. Manning. Accurate unlexicalized parsing. In Proc. of ACL, 2003.

SLIDE 87

References II

Mitchell P. Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, 19(2):313–330, 1993.
Slav Petrov, Leon Barrett, Romain Thibaux, and Dan Klein. Learning accurate, compact, and interpretable tree annotation. In Proc. of COLING-ACL, 2006.
Geoffrey K. Pullum. The Great Eskimo Vocabulary Hoax and Other Irreverent Essays on the Study of Language. University of Chicago Press, 1991.
Richard Socher, John Bauer, Christopher D. Manning, and Andrew Y. Ng. Parsing with compositional vector grammars. In Proc. of ACL, 2013.
Daniel H. Younger. Recognition and parsing of context-free languages in time n^3. Information and Control, 10(2), 1967.