Statistical Parsing: Grammars and grammar formalisms
Çağrı Çöltekin
University of Tübingen, Seminar für Sprachwissenschaft
October 27, 2016
Recap Introduction Constituency grammars Dependency grammars Grammar formalisms Finale
This course is about …
[Figure: constituency tree and dependency analysis of the phrase ‘statistical constituency and dependency parsing of natural languages’]
Ç. Çöltekin, SfS / University of Tübingen October 27, 2016 1 / 31
Why do we need syntactic parsing?
- Often, syntactic analysis is an intermediate step that helps the
(semantic) interpretation of sentences; hence it is useful for applications like question answering and information extraction
- (Statistical) parsers are also used as language models for
applications like speech recognition and machine translation
- It can be used for grammar checking, and can be a useful tool
for linguistic research
Ingredients of a parser
- A grammar
- An algorithm for parsing
- A method for ambiguity resolution
Grammars
The term grammar is used for
- a description of the whole system/structure of a language, as in a ‘grammar (book) of English’
- a grammar formalism, often developed as a theory of language, as in HPSG, LFG, CCG
- a formal (finite) specification of a language as a possibly infinite set of strings (not necessarily a natural language)
Plan of the lecture
- Constituency grammars
- Dependency grammars
- Brief notes on some major grammar formalisms
Constituency grammars
- Constituency grammars are probably the most studied grammars, both in linguistics and in computer science
- The main idea is that groups of words form natural units, or ‘constituents’, like noun phrases or verb phrases
- The terms ‘phrase structure grammar’ and ‘context-free grammar’ are often used as synonyms
Example tree: [S [NP John] [VP [V saw] [NP Marry]]]
Note: many grammar formalisms use constituency grammars in some way; we will not focus on a particular grammar formalism here.
What is a constituent?
Linguists offer a number of tests for constituency, such as
- Constituents can answer questions:
  Q: ‘What did John do?’ → A: ‘saw Marry’
  but, presumably, there is no question with the answer ‘John saw’
- Substitution with a pro-form:
  ‘John [read the book] last week.’ → ‘John [did that] last week.’
- Fronting, topicalization:
  ‘John likes [reading books]’ → ‘[Reading books], John likes’
- Coordination:
  John [saw Marry] and [said ‘hi’]
- …
Note, however, that these tests are leaky, e.g., ‘[John saw] and [Peter greeted] Marry’ (see Müller 2016 for more examples).
Formal definition
A phrase structure grammar is a tuple (Σ, N, S, R), where
- Σ is a set of terminal symbols
- N is a set of non-terminal symbols
- S ∈ N is a distinguished start symbol
- R is a set of rules of the form
  αAβ → γ for A ∈ N and α, β, γ ∈ (Σ ∪ N)∗
- The grammar accepts a sentence if it can be derived from S with the rewrite rules R
Example grammar:
S → NP VP
VP → V NP
NP → John | Marry
V → saw
Example tree: [S [NP John] [VP [V saw] [NP Marry]]]
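The tuple definition can be written down directly as data. The following is a minimal sketch, not part of the original slides: the names `SIGMA`, `NONTERMINALS`, `START`, `RULES` and `is_well_formed` are hypothetical, and the check that Σ and N are disjoint is the usual textbook assumption rather than something the slide states.

```python
# Hypothetical encoding of the example grammar as a tuple (Σ, N, S, R).
SIGMA = {"John", "Marry", "saw"}            # terminal symbols
NONTERMINALS = {"S", "NP", "VP", "V"}       # non-terminal symbols
START = "S"                                 # distinguished start symbol
# Each rule is a pair (left-hand side, right-hand side) of symbol tuples.
RULES = [
    (("S",), ("NP", "VP")),
    (("VP",), ("V", "NP")),
    (("NP",), ("John",)),
    (("NP",), ("Marry",)),
    (("V",), ("saw",)),
]


def is_well_formed(sigma, nonterminals, start, rules):
    """Check the conditions of the definition on a candidate grammar."""
    # Assumption: Σ and N are disjoint, and S must be a non-terminal.
    if start not in nonterminals or sigma & nonterminals:
        return False
    symbols = sigma | nonterminals
    for lhs, rhs in rules:
        # The left-hand side αAβ must contain at least one non-terminal A.
        if not any(sym in nonterminals for sym in lhs):
            return False
        # All symbols must come from Σ ∪ N.
        if not all(sym in symbols for sym in lhs + rhs):
            return False
    return True


print(is_well_formed(SIGMA, NONTERMINALS, START, RULES))
```

Storing rules as pairs of tuples keeps the general form αAβ → γ available, even though every rule of the example grammar has a single non-terminal on the left.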
Example derivation
The example grammar:
S → NP VP
VP → V NP
NP → John | Marry
V → saw
- Phrase structure grammars derive a sentence with successive application of rewrite rules:
  S ⇒ NP VP ⇒ John VP ⇒ John V NP ⇒ John saw NP ⇒ John saw Marry
  or, S ⇒∗ John saw Marry
- The intermediate forms that contain non-terminals are
called sentential forms
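The derivation can be reproduced mechanically by always rewriting the leftmost non-terminal. The sketch below is only an illustration: `RULES` and `leftmost_derivation` are hypothetical names, and the length-based pruning assumes a grammar without empty right-hand sides, as is the case for the example grammar.

```python
# Example grammar: each non-terminal maps to its alternative right-hand sides.
RULES = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP"]],
    "NP": [["John"], ["Marry"]],
    "V": [["saw"]],
}


def leftmost_derivation(target, start="S"):
    """Return the list of sentential forms deriving `target`, or None."""

    def expand(form, steps):
        # Find the leftmost non-terminal in the current sentential form.
        pos = next((i for i, sym in enumerate(form) if sym in RULES), None)
        if pos is None:
            # Only terminals left: success if we produced the target.
            return steps if form == target else None
        # Prune: the form may not outgrow the target (no empty rules assumed),
        # and the terminals already produced must match the target prefix.
        if len(form) > len(target) or form[:pos] != target[:pos]:
            return None
        for rhs in RULES[form[pos]]:
            new_form = form[:pos] + rhs + form[pos + 1:]
            result = expand(new_form, steps + [new_form])
            if result is not None:
                return result
        return None

    return expand([start], [[start]])


for form in leftmost_derivation(["John", "saw", "Marry"]):
    print(" ".join(form))
```

For the example sentence this prints the same sequence of sentential forms as the derivation above, one per line.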
Chomsky hierarchy of grammars
type 0    Recursively enumerable, recognized by Turing machines (HPSG, LFG): αAβ → γ
type 1    Context sensitive, recognized by linear-bounded automata: αAβ → αγβ, γ ≠ ϵ
type 2.1  Mildly context sensitive (TAG, CCG)
type 2    Context free, recognized by push-down automata: A → α
type 3    Regular, recognized by finite-state automata: A → aB or A → Ba
In all of the above A and B are non-terminals, a is a terminal symbol, α, β, γ are sequences of terminals and non-terminals, and ϵ is the empty string.
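The rule shapes in the table can be checked mechanically for a single rule. The function below is a rough sketch under stated assumptions: `rule_type` is a hypothetical helper, it looks only at the shape of one rule (a grammar as a whole needs further conditions, e.g. not mixing A → aB and A → Ba), and it skips the mildly context-sensitive class, which is not defined by a rule shape.

```python
def rule_type(lhs, rhs, nonterminals):
    """Return the most restrictive type (3, 2, 1 or 0) whose shape fits the rule.

    `lhs` and `rhs` are tuples of symbols; symbols not in `nonterminals`
    are treated as terminals.
    """
    def is_nt(sym):
        return sym in nonterminals

    if not any(is_nt(sym) for sym in lhs):
        raise ValueError("left-hand side must contain a non-terminal")

    if len(lhs) == 1:
        # Type 3 shapes from the table: A → aB or A → Ba.
        if len(rhs) == 2 and is_nt(rhs[0]) != is_nt(rhs[1]):
            return 3
        # Type 2: a single non-terminal on the left, anything on the right.
        return 2

    # Type 1: αAβ → αγβ with non-empty γ, for some choice of A in the lhs.
    for i, sym in enumerate(lhs):
        if not is_nt(sym):
            continue
        alpha, beta = lhs[:i], lhs[i + 1:]
        gamma_len = len(rhs) - len(alpha) - len(beta)
        if (gamma_len >= 1 and rhs[:i] == alpha
                and rhs[len(rhs) - len(beta):] == beta):
            return 1

    # Type 0: any other rule with a non-terminal on the left.
    return 0


print(rule_type(("S",), ("NP", "VP"), {"S", "NP", "VP"}))
```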
Some examples
- Regular grammars (finite-state automata) do not have any memory: they can represent a∗b∗, but not a^n b^n
- Finite-state automata are used in many tasks in CL, including morphological analysis and partial parsing
- Context-free grammars (push-down automata) use a stack: they can represent a^n b^n and a^n b^m c^m d^n, but not a^n b^m c^n d^m
- Context-free grammars form the basis of most natural language parsers
- Context-sensitive grammars can represent all of the above, but they are too powerful, hence too expensive to parse with
- Some level of context sensitivity seems to be necessary for some syntactic phenomena
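The difference between "no memory" and "a stack" can be made concrete with two small recognizers. This is an illustrative sketch only: the function names are made up, the first uses Python's `re` module as a stand-in for a finite-state automaton, and the second uses an explicit list as the stack of a push-down automaton.

```python
import re


def is_a_star_b_star(s):
    """Recognize a*b*: a regular language, so no memory is needed."""
    return re.fullmatch(r"a*b*", s) is not None


def is_an_bn(s):
    """Recognize a^n b^n (n >= 0) with an explicit stack."""
    stack = []
    i = 0
    # Push one symbol for every leading 'a'.
    while i < len(s) and s[i] == "a":
        stack.append("a")
        i += 1
    # Pop one symbol for every following 'b'.
    while i < len(s) and s[i] == "b" and stack:
        stack.pop()
        i += 1
    # Accept only if the input is consumed and the stack is empty.
    return i == len(s) and not stack


print(is_a_star_b_star("aab"), is_an_bn("aab"))
print(is_a_star_b_star("aabb"), is_an_bn("aabb"))
```

The string `aab` is in a∗b∗ but not in a^n b^n, which is exactly the distinction a device without memory cannot make.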
Chomsky hierarchy: the picture
[Figure: nested regions, from innermost to outermost: Regular, Context Free, Context Sensitive, Recursively Enumerable]
- The language classes of the Chomsky hierarchy are properly nested (with some
care about the empty string)
- It is often claimed that mildly context sensitive grammars
(dashed ellipse) are adequate for representing natural languages
- Note, however, not even every regular language is a potential
natural language (e.g., a∗bbc∗). The possible natural languages probably cross-cut this hierarchy (shaded region)
Expressiveness of grammar classes
- The class of grammars adequate for formally describing
natural languages has been an important question for (computational) linguistics
- For the most part, context-free grammars are enough, but there are some counterexamples, e.g., from Swiss German (Shieber 1985):
  Jan säit das …
  … mer em Hans es huus hälfed aastriiche
  … we Hans (dat) house (acc) helped paint
Note that this resembles a^n b^m c^n d^m.
Constituency grammars and parsing
- Context-free grammars can be parsed in O(n^3) time using dynamic programming algorithms
- Mildly context-sensitive grammars can also be parsed in polynomial time (O(n^6))
- Often greedy search algorithms are used (even for CFGs or equivalent classes of grammars)
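One well-known dynamic programming algorithm with cubic running time is CKY. The sketch below is a bare recognizer for the example grammar from the earlier slides, which happens to be in Chomsky normal form; the table names `BINARY` and `LEXICAL` and the function `cky_recognize` are hypothetical, and a real parser would also store back-pointers and probabilities.

```python
from collections import defaultdict

# Example grammar in Chomsky normal form, indexed by right-hand side.
BINARY = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}}
LEXICAL = {"John": {"NP"}, "Marry": {"NP"}, "saw": {"V"}}


def cky_recognize(words, start="S"):
    """Return True if `words` can be derived from `start`."""
    n = len(words)
    if n == 0:
        return False
    # chart[(i, j)] holds the non-terminals that span words[i:j].
    chart = defaultdict(set)
    for i, word in enumerate(words):
        chart[(i, i + 1)] |= LEXICAL.get(word, set())
    # Three nested loops over span width, start and split point: O(n^3).
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for left in chart[(i, k)]:
                    for right in chart[(k, j)]:
                        chart[(i, j)] |= BINARY.get((left, right), set())
    return start in chart[(0, n)]


print(cky_recognize("John saw Marry".split()))
print(cky_recognize("saw John Marry".split()))
```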
Constituency grammars summary
- Constituency, or phrase structure, grammars build on the
idea that some words form constituents (non-terminals in a formal grammar)
- They are well studied, both in linguistics and computer
science
- Context free grammars are the most common class of
phrase structure grammars used in parsing natural or programming languages (maybe with some extensions)
Dependency grammars
- Dependency grammars gained popularity in linguistics (particularly in computational linguistics) rather recently, but their roots can be traced back a few thousand years (modern dependency grammars are attributed to Tesnière 1959)
- The main idea is capturing the relations between words, rather than (abstract) phrases
[Figure: dependency tree of ‘John saw Marry’ with the arcs root → saw, subject (saw → John), object (saw → Marry)]
Note: as with constituency grammars, we will not focus on a particular dependency formalism, but discuss dependency grammars in general in relation to parsing.
Properties of dependency grammars
[Figure: dependency tree of ‘John saw Marry’ with the arcs root → saw, subject (saw → John), object (saw → Marry)]
- The structure of the sentence is represented by asymmetric binary links between lexical items
- Each relation defines one of the words as the head and the other as the dependent
- The links (relations) have labels (dependency types)
- Most dependency grammars require each word to have only a single head
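These properties map naturally onto a small data structure. The following sketch is an assumption-laden illustration, not a standard API: `Arc`, `ARCS` and `has_single_heads` are hypothetical names, word positions are 1-based, and position 0 stands for an artificial root node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    """An asymmetric, labeled link from a head to a dependent."""
    head: int       # position of the head word (0 = artificial root)
    dependent: int  # position of the dependent word (1-based)
    label: str      # dependency type


WORDS = ["John", "saw", "Marry"]
ARCS = [Arc(0, 2, "root"), Arc(2, 1, "subject"), Arc(2, 3, "object")]


def has_single_heads(n_words, arcs):
    """Check that every word has exactly one head."""
    heads = {}
    for arc in arcs:
        if arc.dependent in heads:
            return False  # a second head for the same word
        heads[arc.dependent] = arc.head
    # Every word position must have received a head.
    return set(heads) == set(range(1, n_words + 1))


print(has_single_heads(len(WORDS), ARCS))
```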
A more realistic example
[Figure: dependency tree of the sentence ‘From the AP comes this story :’ with the labels root, nmod, det, case, nsubj, det, punct]
How to determine heads
1. The head (H) determines the syntactic category of the construction (C) and can often replace C
2. H determines the semantic category of C; the dependent (D) gives semantic specification
3. H is obligatory, D may be optional
4. H selects D and determines whether D is obligatory or optional
5. The form and/or position of the dependent is determined by the head
6. The form of D depends on H
7. The linear position of D is specified with reference to H
(from Kübler, McDonald, and Nivre 2009, p. 3–4)
Issues with head assignment and dependency labels
- As with the tests for constituency, determining heads is not always straightforward
- A construction is called endocentric if the head can replace the whole construction, exocentric otherwise
[Figure: two example arcs: amod in ‘syntactic parsing’ (parsing → syntactic) and obj in ‘saw Marry’ (saw → Marry)]
- It is often unclear whether dependency labels encode syntactic or semantic functions
Some tricky constructions
- Coordination
  [Figure: three alternative analyses of ‘John and Marry work’, with the labels subj, cc, conj; subj, cc, conj; and subj, conj, conj]
- Prepositional phrases
  [Figure: two alternative analyses of ‘…works from home’, with the labels vcompl, pcompl and nmod, case]
- Subordinate clauses
  [Figure: two alternative analyses of ‘think that they can…’, with the labels obj, sbar, subj and obj, mark, subj]
- Auxiliaries vs. main verbs
  [Figure: two alternative analyses of ‘…will work’, both with the labels root, aux]
Projective vs. non-projective dependencies
- If a dependency graph has no crossing edges, it is said to be projective, otherwise non-projective
- Non-projectivity stems from long-distance dependencies and free word order
A non-projective tree example:
A hearing is scheduled on the issue today .
[Figure: dependency tree of the sentence above with the labels ROOT, VC, PUNC, SBJ, NMOD, PP, TMP, NP, NMOD] (tree reproduced from McDonald and Satta 2007)
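The "no crossing edges" condition is easy to test once a tree is given as a list of head positions. The sketch below is an illustration under stated assumptions: `is_projective` is a hypothetical helper, position 0 is an artificial root placed to the left of the sentence, and the head list for the example sentence is one reading of the figure (with ‘on’ attached to ‘hearing’), not data taken verbatim from the slides.

```python
def is_projective(heads):
    """Return True if no two arcs cross.

    heads[i] is the position of the head of word i + 1 (1-based words);
    0 denotes the artificial root.
    """
    # Represent every arc by its (left end, right end) positions.
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for a, b in arcs:
        for c, d in arcs:
            # Two arcs cross if exactly one end of the second arc
            # lies strictly inside the first.
            if a < c < b < d:
                return False
    return True


# 'John saw Marry': John <- saw, saw <- root, Marry <- saw.
print(is_projective([2, 0, 2]))

# 'A hearing is scheduled on the issue today .' (assumed attachment:
# 'on' depends on 'hearing', 'today' on 'scheduled').
print(is_projective([2, 3, 0, 3, 2, 7, 5, 4, 3]))
```

The pairwise check is quadratic in the sentence length, which is fine for illustration; faster checks exist.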
Parsing with dependency grammars
- Projective dependency parsing can be done in polynomial
time
- Non-projective parsing is NP-hard (without restrictions)
- For both, it is a common practice to use greedy (e.g., linear
time) algorithms
Dependency vs. constituency
- Constituency grammars are based on units formed by a
group of lexical items (constituents or phrases)
- Dependency grammars model binary head–dependent
relations between words
- Most of the theory of parsing is developed with
constituency grammars
- Dependency grammars have recently become more popular
in CL
- Note that many formalisms and treebanks follow a hybrid
approach, using ideas from both
Conversion between constituencies and dependencies
- Although non-trivial, conversion between dependency and constituency annotation is possible
- One can take the path between two words as a dependency relation
[Figure: the constituency tree [S [NP John] [VP [V saw] [NP Marry]]] next to the dependency tree of ‘John saw Marry’ with the arcs root → saw, subject (saw → John), object (saw → Marry)]
- The conversion from constituencies to dependencies is a common practice in the field
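A common way to do the constituency-to-dependency direction is to pick a head child for every phrase and attach the heads of the other children to it. The sketch below shows that idea for the example tree only; the head table `HEAD_CHILD` and the function `to_dependencies` are hypothetical, the arcs are unlabeled, and real converters use much richer head rules.

```python
# Hypothetical head rules: which child category heads each phrase.
HEAD_CHILD = {"S": "VP", "VP": "V"}

# Trees are nested tuples: (label, child, ...) or (label, word) at the leaves.
TREE = ("S", ("NP", "John"), ("VP", ("V", "saw"), ("NP", "Marry")))


def to_dependencies(tree):
    """Return (words, arcs), where arcs are (head, dependent) positions.

    Positions are 1-based; head 0 marks the root of the sentence.
    """
    words, arcs = [], []

    def visit(node):
        label, *children = node
        if len(children) == 1 and isinstance(children[0], str):
            words.append(children[0])
            return len(words)  # position of this word
        # Lexical head position of every child, in order.
        child_heads = [(child[0], visit(child)) for child in children]
        # Fall back to the first child if no head rule applies.
        head_label = HEAD_CHILD.get(label, child_heads[0][0])
        head = next((pos for lab, pos in child_heads if lab == head_label),
                    child_heads[0][1])
        # All other children depend on the head child's lexical head.
        for _, pos in child_heads:
            if pos != head:
                arcs.append((head, pos))
        return head

    arcs.append((0, visit(tree)))
    return words, sorted(arcs, key=lambda arc: arc[1])


print(to_dependencies(TREE))
```

For the example tree this yields ‘saw’ as the root with ‘John’ and ‘Marry’ as its dependents, matching the dependency tree in the figure; the labels would have to come from the path between the words or from additional rules.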
Chomskian tradition
1950’s  Phrase structure/context sensitive grammars: more emphasis on precise/computational descriptions
1960’s  Transformational grammars: not as precisely defined, difficult for computational approaches
1980’s  Government and binding theory (GB)
1990’s  Minimalist program
Some computationally oriented grammar formalisms
Some of the grammars that we will encounter in the articles we will read are:
- Generalized phrase structure grammar (GPSG)
- Head-driven phrase structure grammar (HPSG)
- Lexical functional grammar (LFG)
- Tree adjoining grammars (TAG)
- Combinatory categorial grammar (CCG)
Common themes:
- they are lexicalized (we’ll see later what that means)
- most combine features of both constituency and
dependency grammars
- most of them are more expressive/complex than CFGs
Summary
- A grammar is a formal device for specifying a language
- Grammars are one of the important components of a parser; they can be hand-crafted or extracted from a treebank
- Most of the parsing theory and practice is based on constituency grammars, particularly context-free grammars
- Dependency grammars have become more popular recently, and are often easier to use in NLP applications
Where to go from here?
- Müller (2016) is a new open-source textbook on grammar formalisms.
- Aho and Ullman (1972) is the classical reference (available online) for parsing (programming languages) and also includes discussion of grammar classes in the Chomsky hierarchy. A more up-to-date alternative is Aho, Lam, et al. (2007).
- There is a brief introductory section on dependency grammars in Kübler, McDonald, and Nivre (2009). For a classical reference, see Tesnière (2015), the English translation of the original version (Tesnière 1959).
Pointers to some treebanks
Treebanks are the main resource for statistical parsing. A few treebank-related resources to have a look at until next time:
- Tübingen treebanks:
  TüBa-D/Z  written German
  TüBa-D/S  spoken German
  TüBa-E/S  spoken English
  TüBa-J/S  spoken Japanese
  available from http://www.sfs.uni-tuebingen.de/en/ascl/resources/corpora.html
- Universal dependencies project, documentation, treebanks: http://universaldependencies.org/
- TüNDRA: a treebank search and visualization application with the above treebanks and a few more
  – Main version: https://weblicht.sfs.uni-tuebingen.de/Tundra/
  – New version (beta): https://weblicht.sfs.uni-tuebingen.de/tundra-beta/
Next week: your second assignment
- We will go through your example sentences and try to analyze them with constituency and dependency annotations
- Before next Thursday:
  – Annotate the sentences using the UD annotation scheme
  – Annotate the sentences using constituency annotations (you can freely choose the annotation scheme; if you need inspiration, check out the Penn Treebank annotation guidelines)
Bibliography
Aho, Alfred V., Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman (2007). Compilers: Principles, Techniques, and Tools. 2nd ed. Pearson Education. isbn: 0-321-48681-1.
Aho, Alfred V. and Jeffrey D. Ullman (1972). The Theory of Parsing, Translation, and Compiling. Volume I: Parsing. Upper Saddle River, NJ, USA: Prentice-Hall. isbn: 0-13-914556-7. url: http://dl.acm.org/citation.cfm?id=578789.
Kübler, Sandra, Ryan McDonald, and Joakim Nivre (2009). Dependency Parsing. Synthesis Lectures on Human Language Technologies. Morgan & Claypool. isbn: 9781598295962.
McDonald, Ryan and Giorgio Satta (2007). “On the complexity of non-projective data-driven dependency parsing”. In: Proceedings of the 10th International Conference on Parsing Technologies. Association for Computational Linguistics, pp. 121–132.
Müller, Stefan (2016). Grammatical theory: From transformational grammar to constraint-based approaches. Vol. 1. Textbooks in Language Sciences. Language Science Press. isbn: 9783944675213. doi: 10.17169/langsci.b25.168. url: http://langsci-press.org//catalog/book/25.
Shieber, Stuart M. (1985). “Evidence against the context-freeness of natural language”. In: Linguistics and Philosophy 8.3, pp. 333–343. doi: 10.1007/BF00630917.
Tesnière, Lucien (1959). Éléments de syntaxe structurale. Paris: Éditions Klincksieck.
Tesnière, Lucien (2015). Elements of Structural Syntax. Trans. by Timothy John Osborne and Sylvain Kahane. Amsterdam: John Benjamins Publishing Company. isbn: 9789027212122.