Syntax & Grammars
CMSC 723 / LING 723 / INST 725 MARINE CARPUAT
marine@cs.umd.edu
Today's Agenda
– Words, structure, meaning
– Formal grammars
– Context-free grammar
– Dependency grammars
– Treebanks
Coming next
– P1 recap! + parsing
– Midterm is on Oct
knowledge of a native speaker
– Acquired by around three years old, without explicit instruction
– It’s already inside our heads, we’re just trying to formally capture it
– “Don’t split infinitives”
– “Don’t end sentences with prepositions”
– Grammar checkers
– Conversational agents
– Question answering
– Information extraction
– Machine translation
– …
– Phrase structure organizes words in nested constituents
– Shows which words depend on (modify or are arguments of) which other words
– With respect to their internal structure: e.g., at the core of a noun phrase is a noun
– With respect to other constituents: e.g., noun phrases generally occur before verbs
– They can all precede verbs
– They can all be preposed
– …
– What is the “right” set of constituents?
– What rules govern how they combine?
– That’s why there are so many different theories of grammar and competing analyses of the same data!
– Focus primarily on the “machinery”
– Doesn’t correspond to any modern linguistic theory
– Aka phrase structure grammars
– Aka Backus-Naur form (BNF)
– Rules
– Terminals
– Non-terminals
– We’ll take these to be words (for now)
– The constituents in a language (e.g., noun phrase)
– Consist of a single non-terminal on the left and any number of terminals and non-terminals on the right
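The three ingredients of a context-free grammar (rules, terminals, non-terminals) can be sketched as a small data structure. This is only a minimal sketch: the `Rule` class, the helper functions, and the toy grammar are hypothetical illustrations, not the rules shown on the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    lhs: str    # exactly one non-terminal on the left
    rhs: tuple  # any number of terminals and non-terminals on the right


# Hypothetical toy grammar; symbols and words are illustrative only.
RULES = [
    Rule("NP", ("Det", "Nominal")),
    Rule("NP", ("ProperNoun",)),
    Rule("Nominal", ("Noun",)),
    Rule("Nominal", ("Nominal", "Noun")),
    Rule("Det", ("a",)),
    Rule("Det", ("the",)),
    Rule("Noun", ("flight",)),
    Rule("Noun", ("morning",)),
    Rule("ProperNoun", ("Denver",)),
]


def non_terminals(rules):
    """Non-terminals are the symbols that appear on some left-hand side."""
    return {r.lhs for r in rules}


def terminals(rules):
    """Terminals (here: words) appear only on right-hand sides."""
    nts = non_terminals(rules)
    return {sym for r in rules for sym in r.rhs if sym not in nts}
```

Under these assumptions, `terminals(RULES)` returns the words of the toy lexicon and `non_terminals(RULES)` returns the constituent labels.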
Here are some rules for our noun phrases
– Rules 1 & 2 describe two kinds of NPs:
– Rule 3 illustrates two things:
– Covers all tokens in the input string
– Covers only the tokens in the input string
– Derivation can be represented as a parse tree
– Multiple derivations?
Note: equivalence between parse trees and bracket notation
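The equivalence between parse trees and bracket notation, and the requirement that a parse cover all and only the input tokens, can be illustrated with a short sketch. The nested-tuple encoding, the function names, and the example tree are assumptions made for illustration.

```python
def to_brackets(tree):
    """Render a nested-tuple tree as a bracketed string."""
    if isinstance(tree, str):  # a leaf is a word
        return tree
    label, *children = tree
    return "[" + label + " " + " ".join(to_brackets(c) for c in children) + "]"


def leaves(tree):
    """Return the words at the leaves, left to right."""
    if isinstance(tree, str):
        return [tree]
    _, *children = tree
    return [w for c in children for w in leaves(c)]


def covers(tree, tokens):
    """True if the tree covers all and only the input tokens, in order."""
    return leaves(tree) == list(tokens)


# Hypothetical example tree: (label, child, child, ...), leaves are strings.
TREE = ("S",
        ("NP", ("Pro", "I")),
        ("VP", ("Verb", "prefer"),
               ("NP", ("Det", "a"), ("Nominal", ("Noun", "flight")))))

print(to_brackets(TREE))
# [S [NP [Pro I]] [VP [Verb prefer] [NP [Det a] [Nominal [Noun flight]]]]]
```

Both forms carry the same information: the bracketed string is just a left-to-right traversal of the tree.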
– Issue: agreement
– Issue: subcategorization
S → NP VP
S → VP
S → Aux NP VP
S → WH-NP Aux NP VP
– Consider: “All the morning flights from Denver to Tampa leaving before 10”
“head” = central, most critical part of the NP
“stuff that comes before”
“stuff that comes after”
– Simple lexical items: the, this, a, an, etc. (e.g., “a car”)
– Or simple possessives (e.g., “John’s car”)
– Or complex recursive versions thereof (e.g., John’s sister’s husband’s son’s car)
– Cardinals, ordinals, etc. (e.g., “three cars”)
– Adjectives (e.g., “large car”)
– “three large cars” vs. “?large three cars”
– Prepositional phrases (e.g., “from Seattle”)
– Non-finite clauses (e.g., “arriving before noon”)
– Relative clauses (e.g., “that serve breakfast”)
– Nominal → Nominal PP
– Nominal → Nominal GerundVP
– Nominal → Nominal RelClause
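Because Nominal appears on both sides of these rules, a postmodifier rule can apply any number of times. The sketch below traces such a derivation for PPs only. It assumes an extra rule Nominal → Noun to end the recursion, and the function name is hypothetical.

```python
def expand_nominal(n_pps):
    """Apply Nominal -> Nominal PP n_pps times, then Nominal -> Noun.

    Returns the list of sentential forms, one per derivation step.
    Nominal -> Noun is an assumed base rule that stops the recursion.
    """
    symbols = ["Nominal"]
    steps = [list(symbols)]
    for _ in range(n_pps):
        # Rewrite the leftmost Nominal as Nominal PP.
        symbols = ["Nominal", "PP"] + symbols[1:]
        steps.append(list(symbols))
    # Finish by rewriting the remaining Nominal as a Noun.
    symbols = ["Noun"] + symbols[1:]
    steps.append(list(symbols))
    return steps


for step in expand_nominal(2):
    print(" ".join(step))
# Nominal
# Nominal PP
# Nominal PP PP
# Noun PP PP
```

With two PPs this matches the shape of “flights from Denver to Tampa”: one noun followed by two prepositional phrases.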
This flight, Those flights, One flight, Two flights
*This flights, *Those flight, *One flights, *Two flight
– Accepts grammatical examples (this flight)
– Also accepts ungrammatical examples (*these flight)
– SgS → SgNP SgVP
– PlS → PlNP PlVP
– SgNP → SgDet SgNom
– PlNP → PlDet PlNom
– PlVP → PlV NP
– SgVP → SgV NP
– Lots of issues though...
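The effect of splitting categories by number can be sketched for the noun-phrase rules alone. The lexicon entries and the function name are hypothetical, and treating “one” and “two” as determiners is a simplification made only for this illustration.

```python
# Number-split lexicon (hypothetical entries for illustration).
LEXICON = {
    "this": "SgDet", "those": "PlDet",
    "one": "SgDet", "two": "PlDet",
    "flight": "SgNom", "flights": "PlNom",
}

# Number-split NP rules: SgNP -> SgDet SgNom and PlNP -> PlDet PlNom.
NP_RULES = {
    ("SgDet", "SgNom"): "SgNP",
    ("PlDet", "PlNom"): "PlNP",
}


def np_category(phrase):
    """Return 'SgNP' or 'PlNP' for a two-word phrase, or None if no rule fits."""
    words = phrase.lower().split()
    if len(words) != 2 or any(w not in LEXICON for w in words):
        return None
    tags = tuple(LEXICON[w] for w in words)
    return NP_RULES.get(tags)


print(np_category("This flight"))   # SgNP
print(np_category("This flights"))  # None: no rule combines SgDet with PlNom
```

The split grammar rejects the starred examples, but only by doubling every category and rule that is sensitive to number.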
– But there are many alternatives out there…
– Non-terminals don’t actually appear in the sentence
– So what if you got rid of them?
where:
– Nodes represent words
– Edges represent dependency relations between words (typed or untyped, directed or undirected)
They hid the letter on the shelf
Compare with constituent parse…
What’s the relation?
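A dependency structure for this sentence can be sketched as a list of typed, directed arcs between words. The relation labels, the preposition-as-head convention, and the choice to attach “on” to “hid” are all assumptions made for illustration; attaching “on” to “letter” gives the other reading.

```python
WORDS = ["They", "hid", "the", "letter", "on", "the", "shelf"]

# (dependent, head, relation); indices are 1-based and head 0 marks the root.
# Labels and attachments are illustrative assumptions, not a fixed standard.
ARCS = [
    (1, 2, "subj"),
    (2, 0, "root"),
    (3, 4, "det"),
    (4, 2, "obj"),
    (5, 2, "prep"),  # use head 4 ("letter") for the other reading
    (6, 7, "det"),
    (7, 5, "pobj"),
]


def is_tree(n_words, arcs):
    """Check: one head per word, exactly one root, and no cycles."""
    if len(arcs) != n_words:
        return False
    heads = {dep: head for dep, head, _ in arcs}
    if sorted(heads) != list(range(1, n_words + 1)):
        return False
    if sum(1 for h in heads.values() if h == 0) != 1:
        return False
    for dep in heads:
        seen = set()
        node = dep
        while node != 0:  # walk up towards the root
            if node in seen:
                return False
            seen.add(node)
            node = heads[node]
    return True


def dependents(head, arcs):
    """Words whose head is the given word index."""
    return [WORDS[d - 1] for d, h, _ in arcs if h == head]


print(dependents(2, ARCS))  # ['They', 'letter', 'on']
```

Unlike the constituent parse, no non-terminal nodes appear here: every node is a word of the sentence.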
paired with a parse tree
– Hopefully the right one!
– By first parsing the collection with an automatic parser
– And then having human annotators correct each parse as necessary
– Detailed annotation guidelines are needed
– Explicit instructions for dealing with particular constructions
– 1 million words from the Wall Street Journal
– Recursion avoided to ease annotators’ burden
– VP → VBD PP
– VP → VBD PP PP
– VP → VBD PP PP PP
– VP → VBD PP PP PP PP
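One way to see where such flat rules come from is to read productions directly off treebank-style trees. The sketch below assumes a nested-tuple tree encoding and an invented example tree; it does not read the actual Penn Treebank files.

```python
from collections import Counter


def productions(tree):
    """Yield (lhs, rhs) for every internal node of a nested-tuple tree."""
    if isinstance(tree, str):  # words have no productions of their own
        return
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, rhs)
    for child in children:
        yield from productions(child)


def flat_vp_rules(max_pps):
    """Enumerate the flat rules VP -> VBD PP ... PP for 1..max_pps PPs."""
    return [("VP", ("VBD",) + ("PP",) * n) for n in range(1, max_pps + 1)]


# Hypothetical flat VP with two PP sisters and no recursive VP node.
TREE = ("VP",
        ("VBD", "flew"),
        ("PP", ("IN", "from"), ("NP", ("NNP", "Denver"))),
        ("PP", ("IN", "to"), ("NP", ("NNP", "Tampa"))))

counts = Counter(productions(TREE))
print(counts[("VP", ("VBD", "PP", "PP"))])  # 1
print(counts[("PP", ("IN", "NP"))])         # 2
```

Since each additional PP sister yields a distinct VP rule, a flat annotation style produces many rules where a recursive rule such as VP → VP PP would cover every case.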
– Context-Free Grammars
– Dependency grammars
– Can be used to capture various facts about the structure of language (but not all!)
– P1 recap!
– parsing