Unifying Models of Cognition
Rens Bod
VICI-Project “Integrating Cognition”
Institute for Logic, Language and Computation University of Amsterdam
Goals of this Lecture
I'll give you a very brief introduction to the work in my VICI group
Part of the LaCo (Language and Computation) group in the ILLC (27 people)
Affiliated with both the U. of St Andrews and the U. of Amsterdam
Computational linguist and cognitive scientist
At the same time, I am writing an overview History of the Humanities (to be published in spring 2010)
E.g.: Language: "List the sales of products in 2003"
Music: [example melody]
Image: [example image]
Inherent to all forms of perception: a structuring process into groups, subgroups, sub-subgroups, etc.
E.g., in music, grouping structure is represented as: [figure: bracketed grouping of a melody]
which is equivalent (isomorphic) to: [figure: the same grouping as a tree]
Grouping structure represents how parts combine compositionally and recursively into a whole
Groups in language form a tree structure (Wundt 1880):
[Figure: grouping tree over "List the sales of products in 2003"]
Grouping structure in different representations (Chomsky 1956):
[Figure: labeled parse tree of "List the sales of products in 2003" with categories S, NP, PP, V, DT, N, P]
According to Wertheimer (1923) the visual input is assigned the following structure:
[Figure: the grouping structure assigned to the visual input]
Relatively uncontroversial: there exists one representation for structural perception for all modalities
Very controversial: there exists one model that predicts the perceived structure in language, music, vision and other modalities… (cf. Newell 1999)
[Figure: an alternative parse tree of the same sentence, with a different PP attachment]
The same input can be assigned several structures: ambiguity
Average sentence from the Wall Street Journal: more than one million different possible tree structures (Charniak 1999) (see the sketch below)
Adding semantics makes the problem even worse!
"Any given sequence of notes is infinitely ambiguous, but this ambiguity is seldom apparent to the listener" (Longuet-Higgins 1987)
Humans perceive mostly just one grouping structure: > 96% agreement among subjects (language users)
Language: Penn Treebank; Music: Essen Folksong Collection; Vision: Nijmegen Visual Database
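To see why the parse space explodes, note that even the number of unlabeled binary bracketings of an n-word sentence grows as the Catalan numbers. A minimal Python sketch (illustrative; not from the original slides):

from math import comb

def catalan(k):
    # C(k) = (2k choose k) // (k + 1): the number of distinct binary
    # bracketings of a sequence of k + 1 words
    return comb(2 * k, k) // (k + 1)

for n in (5, 10, 15, 25):
    print(n, "words:", catalan(n - 1), "binary trees")
# 15 words already admit 2,674,440 binary trees; 25 words over 10**12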
Two classical principles:
Simplicity: preference for the simplest structure
Likelihood: preference for the most likely structure
Can these principles still inspire us?
These principles each play a different role in perception:
Simplicity: a general preference for "economy", "least effort", "shortest derivation"
Likelihood: a memory-based bias due to previous experiences
Hypothesis: the perceptual system strives for the simplest structure, but in doing so it is influenced by the likelihood of previous structures
Simplicity: the number of "steps" needed to generate a tree structure
Likelihood: the joint probability of the steps that generate a tree structure
We can compute both if we have a large, representative collection of previously analyzed structures
Data-Oriented Parsing (DOP): new input is analyzed and interpreted by combining parts of previously perceived input (Scha 1990; Bod 1992, 1998; Sima’an 1995; Kaplan 1996; Goodman 1996; Way 1999; Rajman 1999; Hearne 2003; Post 2009, etc.)
Let's start with an extremely simple corpus:
[Figure: two corpus parse trees, for "she wanted the dress on the rack" and "she saw the dog with the telescope", with labels S, NP, VP, V, P, PP]
A new sentence such as "She saw the dress with the telescope" is analyzed by combining subtrees from the corpus
[Figure: a corpus subtree for "she saw the dress" with an open PP node, composed with the subtree (PP (P with) (NP the telescope)), yields a full parse of the new sentence]
where "∘" is leftmost node substitution
But there is also a "competing" analysis:
[Figure: the full corpus tree for "she saw the dog with the telescope", with its object NP replaced by the corpus subtree (NP the dress)]
This analysis consists of two steps and is therefore preferred according to the simplicity principle: maximal similarity with the corpus. But it is not preferred according to the likelihood principle.
[Figure: Corpus / Decomposition / Recomposition: the corpus trees are decomposed into subtrees, which are recomposed with ∘ into analyses of "she saw the dress with the telescope", etc.]
Probability of a subtree t: P(t) = |t| / Σ_{t′: root(t′) = root(t)} |t′|, where |t| is the number of occurrences of t in the corpus
Most probable derivation of a sentence: P(t1 ∘ … ∘ tk) = Πi P(ti)
Shortest derivation of a sentence: Tsd = argminT L(dT), where L(dT) is the number of subtrees in the derivation dT of T
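A small sketch of how these two scores could be computed (illustrative; the corpus-count data structures are assumptions):

from math import prod

def subtree_prob(t, counts, root_of):
    # P(t) = |t| / sum of |t'| over all subtrees t' with the same root
    total = sum(c for s, c in counts.items() if root_of[s] == root_of[t])
    return counts[t] / total

def derivation_prob(derivation, counts, root_of):
    # Likelihood of t1 ∘ … ∘ tk: the product of the subtree probabilities
    return prod(subtree_prob(t, counts, root_of) for t in derivation)

def shortest_derivation(derivations):
    # Simplicity: the derivation that uses the fewest subtrees
    return min(derivations, key=len)

# Hypothetical corpus frequencies of three subtrees
counts = {"t1": 2, "t2": 1, "t3": 3}
root_of = {"t1": "S", "t2": "S", "t3": "NP"}
print(derivation_prob(["t2", "t3"], counts, root_of))   # (1/3) * (3/3)
print(shortest_derivation([["t1"], ["t2", "t3"]]))      # ['t1']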
DOP is a tree grammar in which the tree-units can be of arbitrary size: it allows for the possibility that units of any size play a role. By putting constraints on the tree-units, DOP subsumes other well-known grammar formalisms (e.g., restricting subtrees to depth 1 yields a PCFG, as sketched below).
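For instance, a small sketch (illustrative, not from the original slides) of the depth-1 restriction: reading off only depth-1 subtrees from a parse tree recovers ordinary CFG rewrite rules.

def cfg_rules(tree):
    # Depth-1 subtrees of a parse tree are exactly CFG rewrite rules:
    # a node label rewriting to the labels of its immediate children.
    label, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    rules = [(label, rhs)]
    for c in children:
        if isinstance(c, tuple):
            rules.extend(cfg_rules(c))
    return rules

tree = ("S", [("NP", ["she"]),
              ("VP", [("V", ["saw"]), ("NP", [("DT", ["the"]), ("N", ["dog"])])])])
print(cfg_rules(tree))
# [('S', ('NP', 'VP')), ('NP', ('she',)), ('VP', ('V', 'NP')), ...]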
Penn Treebank Wall Street Journal (WSJ) corpus: 50,000 manually analyzed sentences
Essen Folksong Collection (EFC): 20,150 melodically analyzed western folksongs:
#4551 "Schneckhaus": Schneckhaus, stecke deine Hörner aus (German children's song: "Snail shell, stick out your horns")
5_3_5_3_1234553_1234553_12345_3_12345_3_553_553_553_65432_1_
Grouping structure according to Essen Folksong collection:
((5_3_5_3_) (((1234553_) (1234553_)) ((12345_3_) (12345_3_))) ((553_553_) (553_65432_1_)))
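A small sketch (illustrative, not from the original slides) that parses this nested-parenthesis notation into a tree of groups, so the melody's grouping structure can be manipulated like a parse tree:

import re

def parse_groups(s):
    # Tokenize into parentheses and segment strings, then build the
    # nested group structure recursively.
    tokens = re.findall(r"[()]|[^()\s]+", s)
    def parse(i):                      # parse one "(...)" starting at i
        node, i = [], i + 1
        while tokens[i] != ")":
            if tokens[i] == "(":
                child, i = parse(i)
            else:
                child, i = tokens[i], i + 1
            node.append(child)
        return node, i + 1
    return parse(0)[0]

melody = "((5_3_5_3_) (((1234553_) (1234553_)) ((12345_3_) (12345_3_))) ((553_553_) (553_65432_1_)))"
print(parse_groups(melody))
# [['5_3_5_3_'], [[['1234553_'], ['1234553_']], ...], ...]  (nested groups)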
[Figure: the folksong's grouping structure drawn as a tree, with notes (N) at the leaves]
We can use the same DOP model for both linguistic and musical analysis: Analysis = decomposition & recomposition
Corpora are randomly divided into 10 training/test set splits
Test 1: Simplicity-Likelihood-DOP (SL-DOP) selects the simplest structure from among the n likeliest structures (see the sketch below)
Test 2: Likelihood-Simplicity-DOP (LS-DOP) selects the likeliest structure from among the n simplest structures
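A sketch of the two selection regimes (illustrative; the candidate format, a (tree, probability, steps) triple, is an assumption):

def sl_dop(candidates, n):
    # Simplest structure among the n likeliest ones
    likeliest = sorted(candidates, key=lambda a: a[1], reverse=True)[:n]
    return min(likeliest, key=lambda a: a[2])

def ls_dop(candidates, n):
    # Likeliest structure among the n simplest ones
    simplest = sorted(candidates, key=lambda a: a[2])[:n]
    return max(simplest, key=lambda a: a[1])

# Hypothetical candidates: (tree, derivation probability, derivation steps)
cands = [("treeA", 0.40, 5), ("treeB", 0.35, 2), ("treeC", 0.10, 1)]
print(sl_dop(cands, 2))   # treeB: simplest of the two likeliest
print(ls_dop(cands, 2))   # treeB: likeliest of the two simplest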
n      SL-DOP (simplest among n likeliest)    LS-DOP (likeliest among n simplest)
       Language   Music                       Language   Music
1      87.9%      86.0%                       85.6%      84.3%
5      89.3%      86.8%                       86.1%      85.5%
10     90.2%      87.2%                       87.0%      85.7%
11     90.2%      87.3%                       87.0%      85.7%
12     90.2%      87.3%                       87.0%      85.7%
13     90.2%      87.3%                       87.0%      85.7%
14     90.2%      87.2%                       87.0%      85.7%
15     90.2%      87.2%                       87.0%      85.7%
20     90.0%      86.9%                       87.1%      85.7%
50     88.7%      85.6%                       87.4%      86.0%
100    86.8%      84.3%                       87.9%      86.0%
1,000  85.6%      84.3%                       87.9%      86.0%
What about unsupervised learning and problem solving?
Main goal: the development of a general, unsupervised learning model for different modalities (language, music, problem solving, …)
Key idea: if we do not know which trees should be assigned to the initial input, assign them all and train them on new data
Assign all possible binary trees to the input (labels unspecified)
E.g., "Investors suffered heavy losses":
[Figure: all possible binary trees over the sentence, with internal nodes labeled X and root S]
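A minimal sketch (illustrative, not from the original slides) of this enumeration step: every binary bracketing of the words, with internal labels left unspecified (X):

def all_binary_trees(words):
    # Every binary bracketing of `words`, internal nodes labeled X
    if len(words) == 1:
        return [words[0]]
    trees = []
    for i in range(1, len(words)):           # split point
        for left in all_binary_trees(words[:i]):
            for right in all_binary_trees(words[i:]):
                trees.append(("X", left, right))
    return trees

for t in all_binary_trees("Investors suffered heavy losses".split()):
    print(t)
# 5 trees for 4 words (the Catalan number C(3))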
[Figure: the subtrees extracted from all these trees, which U-DOP uses as its units]
etc…
Language learning: Borensztajn, Zuidema and Bod (2008) (Best Paper Award), Bod (2009), Frank (2009), Ferdinand & Zuidema (2009), Cochran (2009), …
Musical parsing, linguistic parsing, scientific problem solving: Bod (2007), Sangati & Zuidema (2009), Honingh & Bod (2006), Honingh (2009), Bod et al. (2008), …
Cf. case-based reasoning and explanation-based learning, but without a probabilistic component
Derivation of a planet's mass using Newton's laws
F = ma
a = v²/r  ⇒  F = mv²/r
v = 2πr/P  ⇒  F = 4π²mr/P²
F = GMm/r²
4π²mr/P² = GMm/r²  ⇒  M = 4π²r³/(GP²)
A tree describes the steps from higher-level laws to the solution (formula)
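As a worked check of this derivation (illustrative, using sympy; not part of the original slides):

import sympy as sp

G, M, m, r, v, P = sp.symbols("G M m r v P", positive=True)

F_gravity  = G * M * m / r**2        # F = GMm/r²
F_circular = m * v**2 / r            # F = ma with a = v²/r
orbital_v  = 2 * sp.pi * r / P       # v = 2πr/P

# 4π²mr/P² = GMm/r²  →  solve for the planet's mass M
eq = sp.Eq(F_circular.subs(v, orbital_v), F_gravity)
print(sp.solve(eq, M)[0])            # 4*pi**2*r**3/(G*P**2)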
[Figure: two derivation trees sharing the subderivation below]
F = ma
a = v²/r  ⇒  F = mv²/r
F = GMm/r²
mv²/r = GMm/r²  ⇒  v = √(GM/r)
ΣE = constant (Bernoulli)
ρgz₁ + ρv₁²/2 + p₁ = ρgz₂ + ρv₂²/2 + p₂
p₁ = p₂, v₁ = 0, z₁ − z₂ = h  ⇒  v = √(2gh)
Q(theoretical) = ∫ v dA;  dA = B dh  ⇒  Q(theoretical) = ∫ v B dh = B√(2g) ∫ √h dh = (2/3) B√(2g) h^(3/2)
Q(actual) = Cd Q(theoretical) = (2/3) Cd B√(2g) h^(3/2)
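A worked check of the integration step (illustrative, using sympy; not part of the original slides):

import sympy as sp

g, h, hp, B = sp.symbols("g h hp B", positive=True)   # hp stands for h'

v = sp.sqrt(2 * g * hp)               # v = √(2 g h') from Bernoulli
Q = sp.integrate(v * B, (hp, 0, h))   # Q(theoretical) = ∫ v B dh' over 0..h
print(sp.simplify(Q))                 # 2*sqrt(2)*B*sqrt(g)*h**(3/2)/3,
                                      # i.e. (2/3) B √(2g) h^(3/2)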
(While humans all learn language from scratch, they do not learn physics from scratch)
“Cognitive Models of Language and Beyond” (for linguists, logicians, cognitive scientists)
“Unsupervised Language Learning” (for computational linguists, computer scientists, AI researchers)