Grammar in Performance and Acquisition: acquisition (E Stabler, UCLA)



SLIDE 1

Grammar in Performance and Acquisition: acquisition

E Stabler, UCLA ENS Paris • 2008 • day 4

E Stabler, UCLA, Grammar in Performance and Acquisition: acquisition

SLIDE 2

goals

Q1 How are utterances interpreted ‘incrementally’? Q2 How is that ability acquired, from available evidence? Q3 Why are some constituent orders unattested across languages? Q4 What kind of grammar makes copying a natural option?

  • we don’t need to start from zero (start from grammar)
  • frame explanations supported by convergent evidence

SLIDE 3

setup The problem Parameter setting Learnability theory Positive results The problem, factored

[Figure: cumulative percentage of word types, bigrams, and trigrams, plotted against frequency]

tb2: ≈40% of word types occur only once; so do ≈75% of bigrams, ≈90% of trigrams, and ≈99.7% of sentences

⇒ most sentences heard only once
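The counts behind this point are easy to reproduce. A minimal sketch (toy corpus, not the tb2 data behind the figure): compute the fraction of distinct n-grams that occur exactly once; the fraction grows quickly with n.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def singleton_fraction(corpus, n):
    """Fraction of distinct n-grams occurring exactly once in the corpus."""
    counts = Counter()
    for sentence in corpus:
        counts.update(ngrams(sentence.split(), n))
    return sum(1 for c in counts.values() if c == 1) / len(counts)

corpus = [
    "the dog barks",
    "the cat sleeps",
    "the dog sleeps",
    "a cat barks loudly",
]
for n in (1, 2, 3):
    print(n, singleton_fraction(corpus, n))
```

Even on this tiny sample the effect shows: about 29% of word types, 88% of bigrams, and 100% of trigrams occur only once.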

SLIDE 4


Parameter setting: methodology How are fundamental properties of language learned? Important to distinguish 2 ideas:

Uncontroversially, we usually aim to understand how the basic parameters of language variation are set, abstracting away from other properties. A controversial suggestion is that there may be a principled distinction between “core” and “peripheral” parameters of variation, such that universal grammar “will make available only a finite class of possible core grammars, in principle” (Chomsky, 1981).

The first idea is assumed here and in virtually all work on learning, in all domains; the second conjecture might or might not be true, and nothing mentioned here will depend on it.

SLIDE 5


Parameter setting: methodology. How are fundamental properties of language learned? Gibson & Wexler ’94: set n binary parameters on the basis of input constituent orders: vs, vos, vo1o2s, . . . → (spec-final, comp-final, not V2)

. . . in the case of Universal Grammar . . . we want the primitives to be concepts that can plausibly be assumed to provide a preliminary, prelinguistic analysis of a reasonable selection of presented data. It would be unreasonable to incorporate such notions as subject of a sentence or other grammatical notions, since it is unreasonable to suppose that these notions can be directly applied to linguistically unanalyzed data. (Chomsky, 1981)

Suppose parameters are associated with (functional) heads, in the lexicon. (Presumably tightly constrained; more on this later.) The learner needs to identify them . . .

SLIDE 8

(Gold, 1967; Angluin, 1980) A collection of languages is perfectly identifiable from positive text iff every language L in the collection has a finite “telltale” subset DL ⊆ L such that no language L′ in the collection satisfies DL ⊆ L′ ⊊ L (no such intermediate language L′).

⇒ no proper superset of the class of all finite languages is learnable in this sense

(Pitt, 1989) If a collection is identifiable with probability p > 1/2, then it is learnable in Gold’s sense

(cf. good review in Niyogi’06)
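Identification in the limit can be illustrated with a toy class that is not on the slides: the class of languages Ln = {a, aa, . . . , aⁿ}. Each Ln has {aⁿ} as a telltale, and a learner that always conjectures the smallest consistent language in the class converges as soon as the telltale appears in the text. A sketch (the class and the learner are illustrative assumptions):

```python
def learner(text):
    """Gold-style learner for the toy class L_n = {'a'*i : 1 <= i <= n}.
    After each datum, conjecture the smallest L_n consistent with all
    data seen so far, i.e. n = length of the longest string observed."""
    n = 0
    guesses = []
    for s in text:
        n = max(n, len(s))
        guesses.append(n)
    return guesses

# A positive text for L_3 = {a, aa, aaa}:
text = ["a", "aa", "a", "aaa", "aa", "aaa", "a"]
print(learner(text))  # converges to 3 once the telltale 'aaa' appears
```

After the first occurrence of "aaa" the conjecture never changes again, which is exactly the sense in which the class is identified in the limit.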

SLIDE 10

[Figure: nested language classes Fin ⊂ Reg ⊂ CF ⊂ MG ⊂ MGC ⊂ CS ⊂ Rec ⊂ RE, with Aspects, HPSG, and LFG in the unrestricted (RE) region]

CF ⊂ TAG ≡ CCG ⊂ MCFG ≡ MG ⊂ MGC ⊆ PMCFG ⊂ CS

SLIDE 11

A regular language L is 0-reversible iff xz, yz ∈ L implies that for all w, xw ∈ L iff yw ∈ L

[Figure: the 0-reversible languages within Fin ⊂ Reg ⊂ CF]

(Angluin’82): 0-reversible languages are learnable from positive text
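The 0-reversibility condition can be checked directly when the language is given as a finite set of strings (for infinite regular languages one would work on the DFA instead). A sketch under that assumption: two prefixes that share one right context must share all of them.

```python
from itertools import combinations

def tails(L, x):
    """Right contexts of prefix x in finite language L: {z : x+z in L}."""
    return frozenset(s[len(x):] for s in L if s.startswith(x))

def prefixes(L):
    return {s[:i] for s in L for i in range(len(s) + 1)}

def zero_reversible(L):
    """A finite language is 0-reversible iff any two prefixes sharing
    one right context share all of their right contexts."""
    for x, y in combinations(sorted(prefixes(L)), 2):
        tx, ty = tails(L, x), tails(L, y)
        if tx & ty and tx != ty:
            return False
    return True

print(zero_reversible({"ab", "cb"}))  # True: 'a' and 'c' share exactly {b}
print(zero_reversible({"ab", "b"}))   # False: '' and 'a' share 'b' but not 'ab'
```

In the second case, taking x = "a", y = "", z = "b" shows the violation: "ab" ∈ L but "aab" ∉ L.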

SLIDE 12

A CFG is very simple iff every rule has the form A → aα for a pronounced (terminal) symbol a and a sequence of categories α, where no two rules have the same pronounced element a.

Example: S → & S S,  S → ¬ S,  S → p,  S → q

[Figure: the very simple languages within Fin ⊂ Reg ⊂ CF]

(Yokomori’03): very simple languages (VSLs) are learnable from positive text
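Because each terminal occurs in exactly one rule, a very simple grammar can be recognized deterministically with a single stack: the next input symbol fully determines which rule to apply. A sketch for the example grammar above:

```python
# Deterministic recognizer for a very simple grammar: each terminal
# occurs in exactly one rule A -> a alpha, so a stack machine suffices.
RULES = {          # terminal -> (lhs category, body categories)
    "&": ("S", ["S", "S"]),
    "¬": ("S", ["S"]),
    "p": ("S", []),
    "q": ("S", []),
}

def recognize(tokens, start="S"):
    stack = [start]                    # categories still to be expanded
    for t in tokens:
        if not stack or t not in RULES:
            return False
        lhs, body = RULES[t]
        if stack.pop() != lhs:         # t must realize the expected category
            return False
        stack.extend(reversed(body))   # leftmost remaining category on top
    return not stack                   # accept iff nothing is still expected

print(recognize("& ¬ p q".split()))    # & (¬ p) q : True
print(recognize("& p".split()))        # missing second conjunct: False
```

This determinism is what makes the class so well behaved for learning: every string has at most one leftmost derivation.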

SLIDE 13

A CG is k-valued iff no pronounced (terminal) symbol has more than k categories.

Example: &::(S\S)/S  ¬::S/S  p::S  q::S

Example: and::(S\S)/S  saw::(D\S)/D  saw::N  student::N  vegetarian::N  some::D/N  every::D/N

[Figure: the k-valued categorial languages within Fin ⊂ Reg ⊂ CF]

(Kanazawa’94): k-valued categorial languages are learnable from function-argument trees (and learnable in principle from strings)
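For orientation, here is a small CKY-style recognizer for the 2-valued example lexicon above, assuming only the classical AB application rules (this is just a recognizer for the example, not Kanazawa's learner):

```python
# Categories: a plain string like "N", or ("/", A, B) = A/B (seeks B on
# the right), or ("\\", B, A) = B\A (seeks B on the left).
S, D, N = "S", "D", "N"
fw = lambda a, b: ("/", a, b)
bw = lambda b, a: ("\\", b, a)

LEX = {
    "some": {fw(D, N)}, "every": {fw(D, N)},
    "student": {N}, "vegetarian": {N},
    "saw": {fw(bw(D, S), D), N},        # 2-valued: transitive verb and noun
    "and": {fw(bw(S, S), S)},
}

def combine(x, y):
    """Results of combining adjacent categories x, y by application."""
    out = set()
    if isinstance(x, tuple) and x[0] == "/" and x[2] == y:
        out.add(x[1])                   # forward application: A/B . B -> A
    if isinstance(y, tuple) and y[0] == "\\" and y[1] == x:
        out.add(y[2])                   # backward application: B . B\A -> A
    return out

def recognize(words, goal=S):
    n = len(words)
    chart = {(i, i + 1): set(LEX[w]) for i, w in enumerate(words)}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            cell = set()
            for k in range(i + 1, i + span):
                for x in chart[(i, k)]:
                    for y in chart[(k, i + span)]:
                        cell |= combine(x, y)
            chart[(i, i + span)] = cell
    return goal in chart[(0, n)]

print(recognize("some student saw every vegetarian".split()))  # True
```

The derivation it finds is the expected one: "some student" and "every vegetarian" become D, "saw" applies forward to D giving D\S, and backward application yields S.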

SLIDE 14


input: 12340, 15340642310,. . . Problem: What is the language? Does the language have structures you have not seen?

SLIDE 15

input: 12340, 15340642310, . . . with dependencies (r, b, g)

[Figure: colored dependency arcs drawn over the input strings]

Problem: What is the language? Does the language have structures you have not seen?

SLIDE 16

input: 12340, 15340642310, . . . with dependencies (r, b, g), MG, lexicon unambiguous

[Figure: colored dependency arcs drawn over the input strings]

Problem: What is the language? Does the language have structures you have not seen?

SLIDE 17

grammars Lex,Mrg from structures Grammar the learner example more complex examples

criticize::=D V -v
praise::=D V -v
-s::=v +v +case T
ǫ::=V +case =D v
Beatrice::D -case
Benedick::D -case
and::=T =T T

[Figure: derivation tree and derived X-bar tree (TP over vP and VP, with movement traces) for “Beatrice criticize -s Benedick”]

SLIDE 18

The same derivation in tuple form, fully explicit:

Beatrice criticize -s Benedick : T
criticize -s Benedick : +case T, Beatrice : -case
-s Benedick : +i +case T, criticize : -i, Beatrice : -case
-s :: =v +i +case T
Benedick : v, criticize : -i, Beatrice : -case
Benedick : =D v, criticize : -i
ǫ : +case =D v, criticize : -i, Benedick : -case
ǫ :: =V +case =D v
criticize : V -i, Benedick : -case
criticize :: =D V -i
Benedick :: D -case
Beatrice :: D -case

SLIDE 19

The same derivation as a matching graph:

[Figure: matching graph linking the features of the lexical items]

-s::=v +i +case T
ǫ::=V +case =D v
criticize::=D V -i
Benedick::D -case
Beatrice::D -case

(This graph completely determines the derivation.) Suppose the learner can identify these dependencies using semantic reasoning, but not the syntactic features . . . what do we have when the features are removed?

SLIDE 20

Let’s call these MG dependency structures:

[Figure: dependency structure over Beatrice, praise, -s, Benedick, ǫ]

From these, the learner can identify the language.

SLIDE 22


the learner: given a sequence of dependency structures. . .

  • 1. label the root category
  • 2. identify first arcs of non-root nodes, add new category labels
  • 3. add new licensee features for each other incoming arc
  • 4. add pre-category feature to match each outgoing arc
  • 5. collect the lexicon
  • 6. assuming no lexical ambiguity, unify features
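Steps 1 through 5 can be sketched in miniature. In this toy rendition (the encoding of structures, the node name "eps" for the empty head, and the fresh variable names are assumptions for illustration, not the formulation on the slides), a dependency structure lists each node's outgoing arcs in (r, b, g) order, marking whether an arc is the target's least incoming arc:

```python
from itertools import count

def gf(root, out):
    """Steps 1-5 on one dependency structure.
    out: {node: [(target, is_least_incoming_arc_of_target), ...]}."""
    new = ("X%d" % i for i in count()).__next__   # fresh feature names
    cat = {n: new() for n in out}                 # step 2: category labels
    cat[root] = "T"                               # step 1: label the root
    lic = {(n, t): new()                          # step 3: a licensee per
           for n, arcs in out.items()             #   later incoming arc
           for t, least in arcs if not least}
    lex = {}
    for n, arcs in out.items():
        feats = [("=" + cat[t]) if least else ("+" + lic[(n, t)])
                 for t, least in arcs]            # step 4: match each arc
        feats.append(cat[n])
        feats += ["-" + f for (_, t), f in lic.items() if t == n]
        lex[n] = feats                            # step 5: collect
    return lex

# d1 for "Beatrice criticize -s Benedick" (root -s), arcs as on the slides:
d1 = {
    "-s":        [("eps", True), ("criticize", False), ("Beatrice", False)],
    "eps":       [("criticize", True), ("Benedick", False), ("Beatrice", True)],
    "criticize": [("Benedick", True)],
    "Beatrice":  [],
    "Benedick":  [],
}
for w, fs in gf("-s", d1).items():
    print(w, "::", " ".join(fs))
```

With fresh variables in place of E, F, G, H, I, J, the output has exactly the shape of GF(d1) shown later: -s gets selector, two licensors, and T; ǫ gets selector, licensor, selector, and its category; criticize gets a selector, its category, and a licensee; the nouns get a category and a licensee each.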

SLIDE 23

Input: d1

[Figure: dependency structure d1 over Beatrice, criticize, -s, Benedick, ǫ]

SLIDE 24

Step 1: Label the root category

[Figure: d1 with its arcs numbered; the root is labeled -s::T]

SLIDE 25

Step 2: Identify the least incoming arcs of non-root nodes; add new category labels:

[Figure: d1 with least incoming arcs marked]

Beatrice::D   criticize::E   Benedick::F   -s::T   ǫ::G

SLIDE 26

Step 3: Add new licensee features for each later incoming arc:

[Figure: d1 with later incoming arcs marked]

Beatrice::D -H   criticize::E -J   Benedick::F -I   -s::T   ǫ::G

SLIDE 27

Step 4: Add precategory features to match the other end of each outgoing arc, in order (r, b, g):

[Figure: d1 fully labeled]

Beatrice::D -H   criticize::=F E -J   Benedick::F -I   -s::=G +J +H T   ǫ::=E +I =D G

SLIDE 28

Step 5: Collect the lexicon. GF(d1) is then:

criticize::=F E -J
-s::=G +J +H T
ǫ::=E +I =D G
Beatrice::D -H
Benedick::F -I

The result of this step is always a grammar that defines exactly the dependency trees given in the input, nothing more. The grammar generates exactly the input string(s).

SLIDE 29

Step 6: Unify to make rigid. GF(d1) is already rigid, so GF(d1) = RG(d1):

criticize::=F E -J
-s::=G +J +H T
ǫ::=E +I =D G
Beatrice::D -H
Benedick::F -I

Compare the target lexicon:

criticize::=D V -v
praise::=D V -v
-s::=v +v +case T
ǫ::=V +case =D v
Beatrice::D -case
Benedick::D -case
and::=T =T T

SLIDE 30

Input: d1, d2

[Figure: dependency structure d2 over Beatrice criticize -s Benedick ǫ and Benedick praise -s Beatrice ǫ]

SLIDE 31

Step 5: GF(d1, d2) is then:

Beatrice::P -U
Benedick::O -V
and::=L =K T
Beatrice::S -Y
Benedick::C -X
-s::=M +W +U K
ǫ::=N +V =P M
criticize::=O N -W
praise::=S R -Z
-s::=Q +Z +X L
ǫ::=R +Y =C Q
criticize::=F E -J
-s::=G +J +H T
ǫ::=E +I =D G
Beatrice::D -H
Benedick::F -I

NB: Again, GF(d1, d2) does not generalize at all.

SLIDE 32

Step 6: Unify to make rigid. RG(d1, d2) =

criticize::=D E -J
praise::=D E -J
-s::=G +J +H T
ǫ::=E +H =D G
Beatrice::D -H
Benedick::D -H
and::=T =T T

Compare the target lexicon:

criticize::=D V -v
praise::=D V -v
-s::=v +v +case T
ǫ::=V +case =D v
Beatrice::D -case
Benedick::D -case
and::=T =T T

This strategy always works
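The unification in step 6 can be sketched as union-find over the category variables, position by position within each word's entries. (The tuple encoding, and treating only T and D as constants, are simplifying assumptions for illustration.)

```python
# Features are (polarity, name) pairs; names are variables except the
# known constants, and a constant becomes its class representative.
CONSTS = {"T", "D"}
parent = {}

def find(x):
    parent.setdefault(x, x)
    if parent[x] != x:
        parent[x] = find(parent[x])   # path compression
    return parent[x]

def union(a, b):
    ra, rb = find(a), find(b)
    if ra == rb:
        return
    if ra in CONSTS and rb in CONSTS:
        raise ValueError("cannot unify %s with %s" % (ra, rb))
    if ra in CONSTS:
        parent[rb] = ra
    else:
        parent[ra] = rb

def unify_entries(entries):
    """Unify one word's entries position by position; for the rigid
    hypothesis to succeed they must match in length and polarity."""
    first = entries[0]
    for other in entries[1:]:
        assert len(other) == len(first)
        for (p1, n1), (p2, n2) in zip(first, other):
            assert p1 == p2
            union(n1, n2)

# The GF(d1, d2) entries for '-s' and for 'Beatrice' from the slides:
unify_entries([
    [("=", "M"), ("+", "W"), ("+", "U"), ("cat", "K")],
    [("=", "Q"), ("+", "Z"), ("+", "X"), ("cat", "L")],
    [("=", "G"), ("+", "J"), ("+", "H"), ("cat", "T")],
])
unify_entries([
    [("cat", "P"), ("-", "U")],
    [("cat", "S"), ("-", "Y")],
    [("cat", "D"), ("-", "H")],
])
print(find("K"), find("L"))   # both resolve to the constant T
print(find("P"), find("S"))   # both resolve to the constant D
```

Chaining through the variables like this is how, in RG(d1, d2), every -s entry ends in T and every proper name ends up with category D.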

SLIDE 34

input: 12340, 15340642310

Beatrice praise -s Benedick ǫ, Beatrice criticize -s Benedick ǫ and Benedick praise -s Beatrice ǫ.

dependencies (r, b, g), MG, lexicon unambiguous

[Figure: colored dependency arcs drawn over the input strings]

Problem: What is the language?

SLIDE 35

[Figure: cross-serial dependency arcs over strings of digits with a and b]

a::C -r -l
b::=C +r +l T
ǫ::T
2::=C +r A -r
3::=C +r B -r
0::=A +l C -l
1::=B +l C -l

cross-serial dependencies by ‘rolling up’ (non-CF)

SLIDE 36

An MG is rigid iff each pronounced (terminal) symbol has at most one set of syntactic features.

[Figure: the rigid MG languages within Fin ⊂ Reg ⊂ CF ⊂ MG ⊂ CS]

Thm Given any rigid MG G, and any text of dependency structures t defined by G, this learning method will exactly identify the language after finitely many examples

SLIDE 37

extensions structures from strings grammars from ambiguous structures

structures from strings

[Figure: dependency structure over Beatrice, praise, -s, Benedick, ǫ]

Selection: inferred from cognitively salient events

  • conditioned variation → lexical categories
  • tight constraints on functional categories ⋆

Movement: non-adjacency with related elements

SLIDE 38

ambiguity

-s::=v +v +case T
-s::=N Num (and more)
ǫ::=V +case =D v
ǫ::=T C (and more)
read::=D V   read::V   read::N   reed::N (and more)
bill::=D V   bill::V   bill::N (and more)

much ambiguity is systematic; semantic features reduce syntactic ambiguity; topic, semantic features from distributions?

SLIDE 42

intermission 4 References Summary

Summary: simple formalisms can model many linguistic proposals

Q1 What performance models allow incremental interpretation (and remnant movement, doubling constructions)?
  • a straightforward semantics can value every MGC constituent
  • CKY and Earley parsers handle every MGC efficiently
  • fit the performance data with a parser that works!

Q2 How is this ability acquired, from available evidence?
  • rigid MGs can be learned from structures
  • restricted possible structures aid the step from strings to structures
  • many open questions!

Q3 Why are some constituent orders unattested? (perhaps DTC?)

Q4 What kind of grammar makes copying a natural option? (MGC?) many open questions

SLIDE 43


Recap: the learner, given a sequence of dependency structures . . .

  • 1. label the root category
  • 2. identify first arcs of non-root nodes, add new category labels
  • 3. add new licensee features for each other incoming arc
  • 4. add pre-category feature to match each outgoing arc
  • 5. collect the lexicon
  • 6. assuming no lexical ambiguity, unify features

Thm Given any rigid MG G, and any text of dependency structures t defined by G, this learning method will exactly identify the language after finitely many examples

SLIDE 44

References

Angluin, Dana. 1980. Inductive inference of formal languages from positive data. Information and Control, 45:117–135.

Angluin, Dana. 1982. Inference of reversible languages. Journal of the Association for Computing Machinery, 29:741–765.

Buszkowski, Wojciech and Gerald Penn. 1990. Categorial grammars determined from linguistic data by unification. Studia Logica, 49:431–454.

Chomsky, Noam. 1981. Lectures on Government and Binding. Foris, Dordrecht.

Gold, E. Mark. 1967. Language identification in the limit. Information and Control, 10:447–474.

Jain, Sanjay, Daniel Osherson, James S. Royer, and Arun Sharma. 1999. Systems that Learn: An Introduction to Learning Theory (second edition). MIT Press, Cambridge, Massachusetts.

Kanazawa, Makoto. 1998. Learnable Classes of Categorial Grammars. CSLI Publications, Stanford, California.

Kearns, Michael J. and Umesh V. Vazirani. 1994. An Introduction to Computational Learning Theory. MIT Press, Cambridge, Massachusetts.

Niyogi, Partha. 2006. The Computational Nature of Language Learning and Evolution. MIT Press, Cambridge, Massachusetts.

Pitt, Leonard. 1989. Probabilistic inductive inference. Ph.D. thesis, University of Illinois.

Retoré, Christian and Roberto Bonato. 2001. Learning rigid Lambek grammars and minimalist grammars from structured sentences. In L. Popelínský and M. Nepil, editors, Proceedings of the Third Learning Language in Logic Workshop, LLL3, pages 23–34, Brno, Czech Republic. Faculty of Informatics, Masaryk University. Technical report FIMU-RS-2001-08.

Rizzi, Luigi. 1994. Early null subjects and root null subjects. In Teun Hoekstra and Bonnie D. Schwartz, editors, Language Acquisition Studies in Generative Grammar. John Benjamins, Amsterdam, pages 151–176.

Shawe-Taylor, John and Nello Cristianini. 2004. Kernel Methods for Pattern Analysis. Cambridge University Press, NY.

Stabler, Edward P. 2002. Structures for learning. In CoLogNet Lecture, ESSLLI’02, Trento.

Stabler, Edward P., Travis C. Collier, Gregory M. Kobele, Yoosook Lee, Ying Lin, Jason Riggle, Yuan Yao, and Charles E. Taylor. 2003. The learning and emergence of mildly context sensitive languages. In W. Banzhaf, T. Christaller, P. Dittrich, J.T. Kim, and J. Ziegler, editors, Advances in Artificial Life. Springer, NY.

Yokomori, Takashi. 2003. Polynomial-time identification of very simple grammars from positive data. Theoretical Computer Science, 298:179–206.