Complexity and Character of Human Languages


Complexity and Character of Human Languages

Informatics 2A: Lecture 28 Bonnie Webber

School of Informatics University of Edinburgh bonnie@inf.ed.ac.uk

27 November 2009

1 Human Language Complexity
    Chomsky Hierarchy
    The Faculty of Language
    Strong and Weak Adequacy
    Excluding Complexity Classes

2 Linear Indexed Grammars

3 Complexity vs. Difficulty

Reading: J&M. Chapter 16.3–16.4; Hauser, Chomsky, and Fitch (2002)


Review

At the top of the Chomsky Hierarchy are:

  • recursive languages, where a Turing machine (TM) can always halt with a decision on whether a string is or isn’t in the language;
  • recursively enumerable languages, where a TM is only guaranteed to halt when the string is in the language, and may run forever otherwise;
  • non-recursively enumerable languages, which exceed the power of even a TM to decide membership of an arbitrary string.

The hierarchy, from the outermost to the innermost class:

  • non-recursively enumerable languages
  • recursively enumerable languages
  • recursive languages
  • context-sensitive languages
  • context-free languages
  • regular languages


Obvious Questions

  • Where do human languages fit within this complexity hierarchy?
  • Do all human languages fit in the same place? Or are some languages more complex than others?
  • Does the sense that one language is harder than another correspond to a difference in their complexity?
  • What features of human languages seem to make them harder or easier?


The Faculty of Language

According to The Faculty of Language (Hauser, Chomsky, and Fitch 2002), “language faculty” has a broad sense and a narrow sense. Broadly, it includes:

  • a sensory-motor system for producing and perceiving linguistic communication:
      spoken language: vocal tract, auditory system
      sign language: gestural system, visual system
      written language: writing system, visual or tactile system
  • a conceptual-intentional system, which establishes who to communicate with and what to communicate about, and involves:
      generating mental states and attributing them to others;
      acquiring conceptual representations that are non-linguistic;
      referring to entities and events.


The Faculty of Language

Narrowly, the language faculty is an abstract computational system, one part of which (narrow syntax) generates representations internal to the mind/brain and maps them to:

  • the sensory-motor interface, through the phonological or gestural system;
  • the conceptual-intentional system, through the semantic (and pragmatic) systems.

Questions about the formal complexity of human language in general, or of a particular human language, are about the computational power of syntax, as represented by a grammar that’s adequate for it.


Strong and Weak Adequacy

When we ask whether a grammar of a particular complexity is adequate for a NL, we distinguish two senses of adequacy:

  • A grammar is strongly adequate for a language if it:
      generates all and only the strings of the language;
      assigns them the “right” structures, i.e. ones that support a correct representation of meaning.
  • A grammar is weakly adequate if it generates all and only the strings of a language but assigns them “wrong” structures.

Adequacy (strong/weak) relates a language and a grammar. Equivalence (strong/weak) relates two grammars.
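The difference between generating the same strings and assigning the same structures can be illustrated with a small Python sketch. The two toy grammars below are not from the lecture: S → w S | w (right-branching) and S → S w | w (left-branching) are assumed purely for illustration, and the function names are hypothetical. Both cover exactly the same word sequences, so they are weakly equivalent, but the trees they build differ.

```python
def right_branching(words):
    """Tree for the toy grammar S -> w S | w (assumed for illustration)."""
    if len(words) == 1:
        return words[0]
    return (words[0], right_branching(words[1:]))


def left_branching(words):
    """Tree for the toy grammar S -> S w | w (assumed for illustration)."""
    if len(words) == 1:
        return words[0]
    return (left_branching(words[:-1]), words[-1])


def tree_yield(tree):
    """Read the words back off a tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return tree_yield(tree[0]) + tree_yield(tree[1])


words = ["x", "y", "z"]
right_tree = right_branching(words)   # ('x', ('y', 'z'))
left_tree = left_branching(words)     # (('x', 'y'), 'z')
```

Both trees have the same yield, so a check on strings alone cannot tell the grammars apart. Only a comparison of the trees shows that at most one of them could be strongly adequate for a language whose meanings depend on the bracketing.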


Excluding Complexity Classes

It seems unlikely that natural language belongs to one of the complexity classes at the top of the Chomsky hierarchy. Why?

If one believes that a child acquires a grammar when s/he learns a language (or several, if s/he is brought up multilingual), then the more complex the language class, the more difficult it would be to acquire the grammar.

We are looking for an upper bound on the formal complexity of NL. But we also want to account for the fact that people seem to process language in near-linear time, whereas parsing even CFGs is of order n^3 for sentences of length n.
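The cubic cost of context-free parsing can be seen in a chart recognizer of the CKY kind. This is a minimal sketch, assuming a grammar in Chomsky normal form. The toy grammar, lexicon and the name `cky_recognize` are illustrative assumptions, not part of the lecture. The three nested loops over span length, start position and split point are where the n^3 comes from.

```python
def cky_recognize(words, lexical, binary, start="S"):
    """Return True if `words` can be derived from `start`.

    lexical: dict mapping a word to the set of non-terminals that rewrite to it
    binary:  dict mapping a pair (B, C) to the set of A with a rule A -> B C
    """
    n = len(words)
    if n == 0:
        return False
    # chart[i][j] holds the non-terminals that span words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        chart[i][i + 1] = set(lexical.get(word, ()))
    for span in range(2, n + 1):              # loop 1: span length
        for i in range(n - span + 1):         # loop 2: start position
            j = i + span
            for k in range(i + 1, j):         # loop 3: split point
                for b in chart[i][k]:
                    for c in chart[k][j]:
                        chart[i][j] |= binary.get((b, c), set())
    return start in chart[0][n]


# Toy grammar (assumed): S -> NP VP, VP -> V NP, NP -> Det N
BINARY = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}, ("Det", "N"): {"NP"}}
LEXICAL = {"the": {"Det"}, "cat": {"N"}, "dog": {"N"}, "chased": {"V"}}
```

With this toy grammar, `cky_recognize("the cat chased the dog".split(), LEXICAL, BINARY)` accepts the sentence, while a fragment such as `"chased the cat"` is rejected because it only forms a VP.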


Context-Free Languages

We saw in Lecture 25 that a sub-language of Swiss German and a sub-language of Dutch are both isomorphic to w a^n b^m x c^n d^m y, which is not context-free (Shieber, 1985).

  omdat Wim1 Jan2 Henk3 de kinderen4 zag1 helpen2 leren3 zwemmen4
  because Wim1 saw1 Jan2 help2 Henk3 teach3 the children4 to swim4

The thing to notice is that these strings involve crossing dependencies: zag depends on Wim, helpen depends on Jan, and so on. So we need a new class of grammar to describe these constructions in Swiss German, Dutch, and every other language in which they occur.
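The crossing pattern can be made concrete with a small membership check. This is only a sketch of the core pattern a^n b^m c^n d^m, with the surrounding w, x and y left out for brevity. The function name `is_cross_serial` is hypothetical. The point is that two counts have to be matched across interleaved blocks, which is what a single pushdown stack cannot do.

```python
import re


def is_cross_serial(s):
    """Check membership in a^n b^m c^n d^m (n, m >= 0)."""
    match = re.fullmatch(r"(a*)(b*)(c*)(d*)", s)
    if match is None:
        return False            # symbols out of order, or foreign symbols
    a_block, b_block, c_block, d_block = match.groups()
    # The a's pair with the c's and the b's pair with the d's,
    # so the two dependencies cross rather than nest.
    return len(a_block) == len(c_block) and len(b_block) == len(d_block)
```

For instance, `is_cross_serial("aabccd")` holds (n = 2, m = 1), while `is_cross_serial("aabbcd")` does not, because neither count matches. A general-purpose program can of course count freely; the check is only meant to spell out what the language requires.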

Linear Indexed Grammars

A linear indexed grammar (LIG) is more powerful than a CFG, but much less powerful than an arbitrary CSG. The linear-indexed languages are one of the “mildly CS” classes of languages.

The hierarchy, from the outermost to the innermost class:

  • . . .
  • context-sensitive languages
  • indexed languages
  • linear-indexed languages (“mildly context-sensitive”)
  • context-free languages
  • regular languages


Indexed Grammars

An indexed grammar has three disjoint sets of symbols: terminals, non-terminals and indices. An index is a stack of symbols that can be passed from the LHS of a rule to its RHS, allowing counting and recording what rules were applied in what order. Writing the index in square brackets after its non-terminal:

  S → D[f]                            pushes an f onto the index on D
  D → D[g]                            pushes a g onto the index on D
  D → A B C                           passes the index on D to A, B and C
  g = A → A a | B → B b | C → C c     pops g from an index
  f = A → a | B → b | C → c           pops f from an index
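The way this grammar uses its index can be traced with a short Python sketch. It hard-codes the five rules above rather than implementing indexed grammars in general, and the names `derive` and `expand` are hypothetical. The index is modelled as a tuple whose first element is the top of the stack.

```python
def expand(nonterminal, index):
    """Rewrite A, B or C with the given index until only terminals remain."""
    terminal = nonterminal.lower()
    emitted = ""
    for symbol in index:                 # walk from the top of the stack down
        if symbol == "g":
            # g = A -> A a: pop g, emit one terminal to the right of A
            emitted = terminal + emitted
        elif symbol == "f":
            # f = A -> a: pop f, replace the non-terminal by a terminal
            return terminal + emitted
    raise ValueError("index must end in f")


def derive(num_g):
    """String derived when D -> D[g] is applied num_g times."""
    index = ("f",)                       # S -> D[f]
    for _ in range(num_g):
        index = ("g",) + index           # D -> D[g]
    # D -> A B C: the whole index is copied to A, B and C
    return "".join(expand(nt, index) for nt in "ABC")
```

Because the same index is copied to all three non-terminals, each of them emits the same number of terminals: `derive(1)` gives `aabbcc`, and in general the result is a^n b^n c^n with n = num_g + 1. Copying the index to several daughters is exactly what linear indexed grammars disallow.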


Derivation in an Indexed Grammar

With indices written in square brackets, the rules are:

  S → D[f]
  D → D[g]
  D → A B C
  g = A → A a | B → B b | C → C c
  f = A → a | B → b | C → c

Derivation of a a b b c c, one step at a time:

  S
  ⇒ D[f]
  ⇒ D[gf]
  ⇒ A[gf] B[gf] C[gf]
  ⇒ A[f] a B[gf] C[gf]
  ⇒ a a B[gf] C[gf]
  ⇒ a a B[f] b C[gf]
  ⇒ a a b b C[gf]
  ⇒ a a b b C[f] c
  ⇒ a a b b c c


Linear Indexed Grammars

Linear Indexed Grammars (LIGs) allow an index to pass to only one non-terminal on the RHS (not three, as in the previous example). Here we’ll push numbers onto an index. An LIG for crossing dependencies in np^k v^k:

  S[...]    → np_i S[i,...]    emit NP, push a number
  S[...]    → S′[...]          switch to verb sequence rule
  S′[i,...] → S′[...] v_i      pop a number, emit a verb
  S′[]      → ε                stop if stack is empty


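Example: LIG derivation for np^3 v^3

  S[]
  ⇒ np_1 S[1]
  ⇒ np_1 np_2 S[2,1]
  ⇒ np_1 np_2 np_3 S[3,2,1]
  ⇒ np_1 np_2 np_3 S′[3,2,1]
  ⇒ np_1 np_2 np_3 S′[2,1] v_3
  ⇒ np_1 np_2 np_3 S′[1] v_2 v_3
  ⇒ np_1 np_2 np_3 S′[] v_1 v_2 v_3
  ⇒ np_1 np_2 np_3 v_1 v_2 v_3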

While this grammar produces the kind of strings we want for crossing dependencies, the structures it generates are only weakly adequate, as they don’t associate NPs and Vs directly.
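The four LIG rules for np^k v^k can be stepped through mechanically. The following Python sketch hard-codes those rules and records each sentential form. The names `lig_derive` and `render` are hypothetical, and the plain-text spellings `np1`, `v1` and `S'` stand in for the subscripted and primed symbols.

```python
def render(nps, nonterminal, stack, verbs):
    """Format one sentential form, e.g. "np1 np2 S[2,1]"."""
    index = ",".join(str(i) for i in stack)
    return " ".join(nps + [f"{nonterminal}[{index}]"] + verbs)


def lig_derive(k):
    """Return every sentential form in the derivation of np^k v^k."""
    nps, verbs, stack = [], [], []
    steps = [render(nps, "S", stack, verbs)]
    for i in range(1, k + 1):
        # S[...] -> np_i S[i,...]: emit an NP, push its number
        nps.append(f"np{i}")
        stack.insert(0, i)
        steps.append(render(nps, "S", stack, verbs))
    # S[...] -> S'[...]: switch to the verb sequence rule
    steps.append(render(nps, "S'", stack, verbs))
    while stack:
        # S'[i,...] -> S'[...] v_i: pop a number, emit the verb to the right of S'
        i = stack.pop(0)
        verbs.insert(0, f"v{i}")
        steps.append(render(nps, "S'", stack, verbs))
    # S'[] -> epsilon: stop once the stack is empty
    steps.append(" ".join(nps + verbs))
    return steps
```

Calling `lig_derive(3)` ends in `np1 np2 np3 v1 v2 v3`. Each verb is emitted to the right of S′ but to the left of the verbs already produced, so the last number pushed is the first popped and yet the verbs come out in the same order as the NPs. The sketch also makes the weak-adequacy point visible: the NPs and verbs are linked only through numbers on the stack, never as sisters in the structure.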


Linear Indexed Grammars

As a consequence of the weak adequacy of LIGs, other “mildly CS” grammar formalisms have been developed that are strongly adequate for NL:

  • Tree Adjoining Grammar (TAG): a system of tree re-writing rules (i.e., not string re-writing rules) in which elementary trees are combined by substitution and adjunction;
  • Combinatory Categorial Grammar (CCG): a system that links words to complex categories that specify how adjacent words fit together, in terms of combinators such as applying a functor to an argument, composing two functors, etc.


Complexity vs. Difficulty

The formal complexity of NL is not the same as difficulty in processing particular sentences or constructions.

Recall garden path sentences, which are difficult to process because their most common interpretation favors a syntactic analysis that is actually impossible:

  The horse raced past the barn fell.

When the most common interpretation is compatible with the same kind of syntactic analysis (here, a reduced relative clause), no problem arises in processing such a sentence:

  The flowers delivered to the patient bloomed.


Complexity vs. Difficulty

Others seem difficult to process for other reasons:

  The cat likes tuna fish.
  The cat the dog chased likes tuna fish.
  The cat the dog the rat bit chased likes tuna fish.
  The cat the dog the rat the goat kicked bit chased likes tuna fish.

This is called center embedding.

  [The cat1 likes tuna fish1].
  [The cat1 [the dog2 chased2] likes tuna fish1].
  [The cat1 [the dog2 [the rat3 bit3] chased2] likes tuna fish1].

In Lectures 29 and 30, we’ll consider some of these processing difficulties when we consider models of human parsing.
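The nesting in these examples is regular enough to generate mechanically. The sketch below builds center-embedded sentences of any depth. The function name `center_embed` and its parameters are hypothetical. The subjects are listed from the outside in, and the verbs then close the clauses from the inside out, which is the same last-in, first-out order as a stack.

```python
def center_embed(nouns, verbs, predicate="likes tuna fish"):
    """Build a center-embedded sentence.

    nouns[0] is the subject of `predicate`; for i >= 1, nouns[i] is the
    subject of verbs[i - 1], whose object is nouns[i - 1].
    """
    if len(verbs) != len(nouns) - 1:
        raise ValueError("need exactly one verb per embedded clause")
    subjects = [f"the {noun}" for noun in nouns]
    # The innermost clause closes first, so the verbs appear in reverse order.
    closing_verbs = list(reversed(verbs))
    sentence = " ".join(subjects + closing_verbs + [predicate])
    return sentence[0].upper() + sentence[1:] + "."
```

For example, `center_embed(["cat", "dog", "rat"], ["chased", "bit"])` reproduces the third sentence above. The generator is just as simple at depth three or four, which underlines the point of the slide: the construction is formally unremarkable (nested dependencies are context-free), yet it quickly becomes hard for people to process.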


Summary

  • The faculty of language contains a computational system that generates syntactic representations that can be mapped onto meanings.
  • This raises the question of the complexity of this system (its position in the Chomsky hierarchy).
  • A weakly adequate grammar generates the correct strings, while a strongly adequate one also generates the correct structures.
  • There are structures in NLs that can be mapped onto formal languages which are not context-free.
  • NL probably belongs to the class of mildly context-sensitive languages, whose least powerful member (LIGs) is weakly adequate for NL.
