Simulating Language Behavior: An Introduction (Çağrı Çöltekin)

SLIDE 1

Simulating Language Behavior An Introduction

Çağrı Çöltekin

c.coltekin@rug.nl

Information science/Informatiekunde

2012-02-15

SLIDE 2

Tentative Plan

Week  Subject
1     Introduction & Organization
2     Computational simulation of language acquisition (mostly segmentation)
3     Simulation of language change/diffusion; Simulation of learning pronoun reference
4     Simulation of segmentation
5     Simulation of segmentation
6     Simulation of language comprehension; Simulation of acquisition of words, morphology or syntax
7     Project presentations

Ç. Çöltekin, Informatiekunde

Simulating Language Behavior 1/34

SLIDE 3

Outline

◮ Language Behavior
◮ Modeling and Simulation
◮ Language Acquisition
◮ An example simulation: segmentation
◮ Summary & Discussion

SLIDE 5

Language Behavior

What is language behavior?

We will be dealing with questions like:

◮ How does a particular aspect of language comprehension work?

◮ Why are some sentences harder to comprehend than others?

◮ How do we acquire language(s)?

◮ Is there a difference between learning regular and irregular aspects of language?

◮ How do languages change over time?

◮ What are the causes of language change, and in which ways do we expect changes to occur?

Note that these questions are not only about directly observable behavior; they also concern how the human cognitive system works.

SLIDE 6

Modeling and Simulation

What is a model?

Examples of some models in science:

◮ Galilean model of the solar system.
◮ Bohr model of the atom.
◮ Atmospheric models used in meteorology.
◮ Scale models of cars, bridges, buildings, etc. used in engineering.
◮ Animal models used in medicine.

SLIDE 11

Modeling and Simulation

Models: why and how

◮ Why do we model things at all?

◮ If the model matches reality well, we can make predictions.
◮ We understand the phenomenon better while (formally) specifying the model.
◮ Sometimes we cannot study the object of interest directly, because doing so is too expensive, unethical, or impractical.

◮ Once we have the model, how do we get knowledge out of it?

◮ Study the model analytically.
◮ Run simulations.

All models are wrong, some are useful. — Box and Draper (1986, p. 424)

SLIDE 12

Language Acquisition

SLIDE 13

Language Acquisition

The problem of language acquisition

◮ Human languages are complex (recursion, ambiguity).
◮ Children do not receive explicit instruction during language acquisition.
◮ Language acquisition by children is (arguably) fast and robust.
◮ The input to children is not enough for learning (Poverty of Stimulus Argument):
  ◮ Children do not receive input critical for learning certain phenomena.
  ◮ Human languages are not learnable from positive input alone (Gold, 1967), and negative input is not available to children.

SLIDE 14

Language Acquisition

The debate

Nativism: Our knowledge of language is largely determined at birth (by our genes). The contribution of environmental factors is only of secondary importance.

[...] in certain fundamental respects we do not really learn language; rather, grammar grows in the mind. (Chomsky, 1980, p. 134)

Plato, Descartes, Chomsky, . . .

Empiricism: Our knowledge is primarily due to our interactions with the environment.

Aristotle, Locke, . . .

Taking one of these sides is common in linguistics.

SLIDE 15

Language Acquisition

Debate resolved: we are all nativists

To say that “language is not innate” is to say that there is no difference between my granddaughter, a rock, and a rabbit. In other words, if you take a rock, a rabbit, and my granddaughter and put them in a community where people are talking English, they’ll all learn English. If people believe that, then they’ll believe language is not innate. If they believe that there is a difference between my granddaughter, a rabbit, and a rock, then they believe that language is innate. (Chomsky, 2000, p. 50, ‘The Architecture of Language’; emphasis mine)

SLIDE 16

Language Acquisition

Debate resolved: we are all empiricists

The obvious conclusion is that the real answer to the question, Where the knowledge come from, is that it comes from the interaction between nature and nurture, or what has been called “epigenesis.” Genetic constraints interact with internal and external environmental influences, and they jointly give rise to the phenotype. (Elman et al., 1996, pp. i–ii, ‘Rethinking Innateness’)

SLIDE 18

Language Acquisition

Debate in linguistics

We all agree that:

◮ Part of our linguistic abilities comes from our experience: the languages people learn vary far more than the physical organs they grow.

◮ Part of our linguistic abilities is innate: rocks and rabbits aside, even the species closest to us cannot match our linguistic abilities.

The disagreement seems to be about whether the innate component is language-specific knowledge or domain-general learning abilities.

SLIDE 22

Language Acquisition

Now that we know what the debate is about, is it resolved?

Short answer: No.

◮ It is difficult to determine the quantity/type of innate knowledge needed to settle the debate, and the target seems to be moving: from P&P (Chomsky, 1981) to recursion (Hauser, Chomsky & Fitch, 2002) or merge (Berwick et al., 2011).

◮ Empirical evidence is scarce and is interpreted in different ways.
◮ ‘Logical arguments’ are either clearly false or misunderstood in the community at large.

SLIDE 24

Language Acquisition

... but didn’t Gold (1967) prove it already?

◮ Since Gold (1967), there have been many other results in the field, which are typically ignored.

◮ Modeling is useful, but when interpreting the results of a model we need to consider how well the model matches the real world. In the case of language learning:

  ◮ Is the formal grammar a good candidate for the natural grammar?
  ◮ Is the learning method a plausible one?
  ◮ Does the characterization of the input match the real-world setting?

Computational models are useful for investigating some arguments in the debate, but the results are unlikely to be conclusive.

SLIDE 32

Language Acquisition

An informal game to understand Gold’s results

Try to guess which sequence the given numbers come from:

◮ 7, 11, 13, 17
◮ 5, 7, 11, 13
◮ 13, 17, 19, 23

Possible answers:

◮ ordered sequence of prime numbers
◮ prime numbers
◮ odd prime numbers
◮ just the list of randomly chosen numbers given so far
◮ . . .

We cannot know for certain, since there are infinitely many possible input sequences (input sentences) and infinitely many ways to characterize them (grammars). Natural languages are not provably unlearnable. And even if they were shown to be learnable, we still could not arrive at an empiricist conclusion: the initial knowledge built into these learning models is rather complex.
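The core of the game can be sketched by writing three of the answers above as predicates and checking which of them accept every positive example. This is only an illustration of the intuition, not Gold's formal framework; the names `is_prime`, `hypotheses` and `consistent` are hypothetical.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for small numbers."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


# The number sequences from the game, pooled as positive examples.
observed = [7, 11, 13, 17, 5, 7, 11, 13, 13, 17, 19, 23]
seen = set(observed)

# Candidate "grammars": each predicate decides which numbers belong to the language.
hypotheses = {
    "prime numbers": is_prime,
    "odd prime numbers": lambda n: is_prime(n) and n % 2 == 1,
    "exactly the numbers seen so far": lambda n: n in seen,
}


def consistent(hypotheses: dict, data: list[int]) -> list[str]:
    """Names of the hypotheses that accept every positive example in data."""
    return [name for name, accepts in hypotheses.items()
            if all(accepts(x) for x in data)]


print(consistent(hypotheses, observed))        # all three survive
print(consistent(hypotheses, observed + [2]))  # ['prime numbers']
```

A new positive example can rule out narrower answers (adding 2 eliminates the last two). But if the target were one of the narrower answers, no positive example could ever eliminate the broader "prime numbers"; only negative evidence could.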

SLIDE 33

Language Acquisition

... if we knew the language is

◮ not innate, it clearly would not solve ‘Plato’s problem’: we may still have substantial innate knowledge in another domain of cognition.

◮ innate, it would not necessarily settle the bigger debate either. We already know of other, more clearly genetically determined factors affecting our cognition, yet these do not seem to have declared victory for nativism.

SLIDE 38

Language Acquisition

To sum up...

◮ The nature–nurture debate is intriguing, yet unresolved (it should eventually be resolved by research in neuroscience and genetics).

◮ It has a central role in linguistics. I believe this role is not well motivated:

  ◮ Linguistics is just one of many domains that may contribute to the debate.
  ◮ The contribution of the debate to the study of language is uncertain in many cases.
  ◮ More importantly, taking sides a priori in this unresolved debate can be unfruitful and even misleading.

SLIDE 39

An example: segmentation

SLIDE 40

An example simulation: segmentation

Difficulties of learning segmentation

◮ No clear acoustic markers in fluent speech.
◮ Large speaker variation in the acoustic input.
◮ Noise in the environment.
◮ Children have to start with no knowledge of words.
◮ Even with comprehensive knowledge of words, segmentation is still difficult because of multiple plausible segmentations. For example:
  ◮ /6go/: /6go/ ‘ago’ or /6 go/ ‘a go’?
  ◮ /Itsnoz/: /Its noz/ ‘its nose’ or /It snoz/ ‘it snows’?
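The ambiguity in these examples can be made explicit by enumerating every way a phoneme string splits into entries of a lexicon. A minimal recursive sketch, assuming a hypothetical toy lexicon that holds only the words from the two examples, in the same phonemic notation:

```python
from functools import lru_cache


def segmentations(s: str, lexicon: frozenset) -> list[list[str]]:
    """Return every way to split s into a sequence of lexicon entries."""
    @lru_cache(maxsize=None)
    def split_from(i: int) -> tuple:
        if i == len(s):
            return ((),)  # exactly one way to segment the empty suffix
        results = []
        for j in range(i + 1, len(s) + 1):
            word = s[i:j]
            if word in lexicon:
                # Extend every segmentation of the remaining suffix.
                for rest in split_from(j):
                    results.append((word,) + rest)
        return tuple(results)

    return [list(seg) for seg in split_from(0)]


# Hypothetical lexicon: 'ago', 'a', 'go', 'its', 'it', 'nose', 'snows'.
lexicon = frozenset({"6go", "6", "go", "Its", "It", "noz", "snoz"})

print(segmentations("6go", lexicon))     # [['6', 'go'], ['6go']]
print(segmentations("Itsnoz", lexicon))  # [['It', 'snoz'], ['Its', 'noz']]
```

Even a learner with a perfect lexicon gets two candidates for each string, so some other source of information is needed to choose between them.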

SLIDE 41

An example simulation: segmentation

Recognize speech, or, wreck a nice beach

An automatic speech recognizer’s attempt to recognize the phrase ‘recognize speech’ (phones: r e k @ n ai s b ii ch).

[Figure: word lattice of candidate words, including: recognize, reckon, wreck, a, an, on, in, and, her, I, eye, s, ice, nice, not, aren’t, be, bee, beach, speech]

Example reproduced from Shillcock (1995).

SLIDE 43

An example simulation: segmentation

The puzzle to solve

ljuuzuibutsjhiuljuuz ljuuztbzjubhbjompwfljuuz xibutuibu ljuuz epzpvxbounpsfnjmlipofz ljuuzljuuzephhjf opnjxibuepftbljuuztbz xibuepftbljuuztbz ephhjfeph ephhjf opnjxibuepftuifephhjftbz xibuepftuifephhjftbz mjuumfcbczcjsejf cbczcjsejf zpvepoumjlfuibupof plbznpnnzublfuijtpvu dpx uifdpxtbztnppnpp xibuepftuifdpxtbzopnj

◮ No clear boundary markers
◮ No lexical knowledge

SLIDE 48

An example simulation: segmentation

How do children segment?

Children very early in life (8 months old) seem to be sensitive to statistical regularities between syllables (Saffran, Aslin & Newport, 1996).

Training: bidakupadotigolabubidakugolabupadoti...

TP(bi, da) = 1    TP(bu, pa) = 1/3

Test G1 (words): padotibidakugolabupadoti...
Test G2 (non-words): pagolabidotikugobdalaubu...

Children discriminated the ‘words’ used in the training phase from the non-words.
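A minimal sketch of how transitional probabilities can be estimated from a syllable stream and used to place word boundaries. Assumptions: the hypothetical helper `syllabify` works only because every syllable in this artificial language is one consonant letter plus one vowel letter, the training stream is the excerpt above repeated three times so that each word pair recurs, and the threshold of 0.75 is an arbitrary choice. With only three words, boundary TPs come out around 1/2 rather than the 1/3 on the slide, but the contrast with the word-internal TP of 1 is the same.

```python
from collections import Counter


def syllabify(stream: str) -> list[str]:
    """Split into CV syllables (valid only for this artificial language)."""
    return [stream[i:i + 2] for i in range(0, len(stream), 2)]


def transitional_probabilities(sylls: list[str]) -> dict[tuple[str, str], float]:
    """TP(x, y) = count(x y) / count(x followed by anything)."""
    pair_counts = Counter(zip(sylls, sylls[1:]))
    first_counts = Counter(sylls[:-1])  # the last syllable has no successor
    return {(x, y): c / first_counts[x] for (x, y), c in pair_counts.items()}


def segment(sylls: list[str], tp: dict, threshold: float) -> list[str]:
    """Insert a boundary wherever the TP between adjacent syllables is below threshold."""
    words, current = [], sylls[0]
    for prev, nxt in zip(sylls, sylls[1:]):
        if tp[(prev, nxt)] < threshold:
            words.append(current)
            current = nxt
        else:
            current += nxt
    words.append(current)
    return words


# Hypothetical training stream: bidaku padoti golabu bidaku golabu padoti, x3.
stream = "bidakupadotigolabubidakugolabupadoti" * 3
sylls = syllabify(stream)
tp = transitional_probabilities(sylls)

print(tp[("bi", "da")], tp[("bu", "pa")])  # 1.0 0.5
print(segment(sylls, tp, threshold=0.75)[:6])
```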

SLIDE 51

An example simulation: segmentation

Predictability

Predictability within units is high, predictability between units is low.

Given a sequence lr, where l and r are sequences of phonemes:

◮ If l helps us predict r, lr is likely to be part of a word.
◮ If observing r after l is surprising, there is likely a boundary between l and r.

The strategy dates back to Harris (1955), who used a measure called successor variety (SV): morpheme boundaries lie at the locations where a high variety of phonemes can follow the initial segment.
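Harris's idea can be sketched by counting, for each prefix of a word, how many distinct symbols follow that prefix in a reference vocabulary. The vocabulary below is a hypothetical toy example in ordinary spelling (Harris worked with phonemic transcriptions), and treating a word that ends at the prefix as the extra successor `#` is an assumption of this sketch.

```python
def successor_variety(prefix: str, vocabulary: list[str]) -> int:
    """Number of distinct symbols that follow `prefix` in the vocabulary.
    A word ending right after the prefix counts as the end marker '#'."""
    successors = set()
    for word in vocabulary:
        if word.startswith(prefix):
            successors.add(word[len(prefix)] if len(word) > len(prefix) else "#")
    return len(successors)


# Hypothetical vocabulary.
vocabulary = ["read", "reads", "reader", "readable", "reading", "red", "rest"]
word = "reading"
profile = [(word[:i], successor_variety(word[:i], vocabulary))
           for i in range(1, len(word))]
print(profile)
# [('r', 1), ('re', 3), ('rea', 1), ('read', 5), ('readi', 1), ('readin', 1)]
```

The peak at "read" (SV = 5) points to the boundary read|ing, while the smaller peak at "re" shows that a raw SV profile can also suggest spurious boundaries, so a decision rule is still needed.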

SLIDE 58

An example simulation: segmentation

Back to the puzzle: some cues

ljuuzuibutsjhiuljuuz ljuuztbzjubhbjompwfljuuz xibutuibu ljuuz epzpvxbounpsfnjmlipofz ljuuzljuuzephhjf opnjxibuepftbljuuztbz xibuepftbljuuztbz ephhjfeph ephhjf opnjxibuepftuifephhjftbz xibuepftuifephhjftbz mjuumfcbczcjsejf cbczcjsejf zpvepoumjlfuibupof plbznpnnzublfuijtpvu dpx uifdpxtbztnppnpp xibuepftuifdpxtbzopnj

Cues for the solution:

◮ Acoustic cues, such as pauses, stress, coarticulation, allophonic alternations, vowel harmony
◮ lexical knowledge
◮ phonotactics
◮ utterance boundaries
◮ distributional regularities
◮ predictability

TP(j, u) = 11/27 ≈ 0.41    TP(z, u) = 2/23 ≈ 0.09
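The two puzzle values can be reproduced by counting letter bigrams. This sketch assumes the utterances are concatenated into one unbroken stream in the order shown; under that assumption the counts give exactly 11/27 and 2/23.

```python
from collections import Counter
from fractions import Fraction

# The puzzle utterances, in the order they appear on the slide.
utterances = [
    "ljuuzuibutsjhiuljuuz", "ljuuztbzjubhbjompwfljuuz", "xibutuibu", "ljuuz",
    "epzpvxbounpsfnjmlipofz", "ljuuzljuuzephhjf", "opnjxibuepftbljuuztbz",
    "xibuepftbljuuztbz", "ephhjfeph", "ephhjf", "opnjxibuepftuifephhjftbz",
    "xibuepftuifephhjftbz", "mjuumfcbczcjsejf", "cbczcjsejf",
    "zpvepoumjlfuibupof", "plbznpnnzublfuijtpvu", "dpx", "uifdpxtbztnppnpp",
    "xibuepftuifdpxtbzopnj",
]

stream = "".join(utterances)           # treat the input as one unbroken stream
pairs = Counter(zip(stream, stream[1:]))
firsts = Counter(stream[:-1])          # every letter that has a successor


def tp(x: str, y: str) -> Fraction:
    """TP(x, y) = count(xy) / count(x followed by anything)."""
    return Fraction(pairs[(x, y)], firsts[x])


print(tp("j", "u"), round(float(tp("j", "u")), 2))  # 11/27 0.41
print(tp("z", "u"), round(float(tp("z", "u")), 2))  # 2/23 0.09
```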

SLIDE 59

An example simulation: segmentation

Measures of (un)predictability

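◮ Transitional probability

$\mathrm{TP}(l, r) = \frac{P(lr)}{P(l)}$

◮ Pointwise mutual information

$\mathrm{MI}(l, r) = \log_2 \frac{P(lr)}{P(l)\,P(r)}$

◮ Successor variety

$\mathrm{SV}(l) = \sum_{r \in A} c(l, r)$, where $A$ is the phoneme inventory and $c(l, r) = 1$ if $lr$ occurs in the input, $0$ otherwise

◮ Boundary entropy

$H(l) = -\sum_{r \in A} P(r \mid l) \log_2 P(r \mid l)$

The asymmetric measures have their ‘reverse’ counterparts. The length of the sequences l and r matters.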
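A sketch of the four measures on top of substring counts. Assumptions: probabilities are plain relative frequencies (no smoothing) over a hypothetical toy corpus of unsegmented utterances, left contexts are at most two phonemes long, and SV counts the distinct phonemes observed after l. The class `Predictability` and its method names are illustrative only.

```python
import math
from collections import Counter, defaultdict


class Predictability:
    """Predictability measures estimated by relative frequency from
    unsegmented utterances; l is a left context, r the next phoneme."""

    def __init__(self, utterances: list[str], max_context: int = 2):
        self.ngrams = Counter()           # substring -> count
        self.by_length = Counter()        # length -> number of substrings
        self.succ = defaultdict(Counter)  # context l -> counts of next phoneme
        for u in utterances:
            for n in range(1, max_context + 2):
                for i in range(len(u) - n + 1):
                    self.ngrams[u[i:i + n]] += 1
                    self.by_length[n] += 1
            for n in range(1, max_context + 1):
                for i in range(len(u) - n):
                    self.succ[u[i:i + n]][u[i + n]] += 1

    def _p(self, s: str) -> float:
        """P(s): relative frequency among all substrings of the same length."""
        return self.ngrams[s] / self.by_length[len(s)]

    def tp(self, l: str, r: str) -> float:
        """TP(l, r) = P(r | l); assumes l was observed with a successor."""
        counts = self.succ[l]
        return counts[r] / sum(counts.values())

    def mi(self, l: str, r: str) -> float:
        """MI(l, r) = log2 P(lr) / (P(l) P(r)); assumes lr was observed."""
        return math.log2(self._p(l + r) / (self._p(l) * self._p(r)))

    def sv(self, l: str) -> int:
        """SV(l): number of distinct phonemes observed right after l."""
        return len(self.succ[l])

    def entropy(self, l: str) -> float:
        """H(l) = -sum over r of P(r | l) log2 P(r | l)."""
        counts = self.succ[l]
        total = sum(counts.values())
        return -sum(c / total * math.log2(c / total) for c in counts.values())


# Hypothetical toy corpus in the slides' transcription style.
stats = Predictability(["IzD&t6kIti", "D&t6dOgi", "lUk6kIti"])
print(stats.tp("t", "6"), stats.sv("t"), stats.entropy("t"))  # 0.5 2 1.0
print(stats.mi("D&", "t"))
```

Running the same class on reversed utterances (and reversed arguments) would give the ‘reverse’ counterparts of the asymmetric measures.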

SLIDE 69

An example simulation: segmentation

How to Calculate the Measures

# I z D & t 6 k I t i #   (‘is that a kitty’)

l     r    TP(l, r) = P(r | l)
#I    z    0.40
Iz    D    0.22
zD    &    0.46
D&    t    0.99
&t    6    0.03
t6    k    0.04
6k    I    0.30
kI    t    0.48
It    i    0.10

Calculations are done on a corpus of child-directed English.
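The walk-through slides a window along the utterance, taking the two preceding symbols (with # marking the utterance start) as l and the next phoneme as r. A sketch of that loop, assuming a hypothetical four-utterance toy corpus instead of the child-directed corpus used here, so its numbers differ from the ones above; the helper names are illustrative.

```python
from collections import Counter, defaultdict


def successor_counts(corpus: list[str], k: int = 2) -> dict:
    """Map every length-k left context to counts of the phoneme after it.
    Utterances are padded with k '#' symbols at the start and one at the end."""
    succ = defaultdict(Counter)
    for u in corpus:
        padded = "#" * k + u + "#"
        for i in range(len(padded) - k):
            succ[padded[i:i + k]][padded[i + k]] += 1
    return succ


def tp_profile(utterance: str, succ: dict, k: int = 2) -> list[tuple[str, str, float]]:
    """TP(l, r) = P(r | l) at each position inside the utterance, where r is the
    next phoneme and l the k symbols before it. Raises ZeroDivisionError if a
    context never occurred in the corpus."""
    padded = "#" * k + utterance
    profile = []
    for pos in range(1, len(utterance)):
        l, r = padded[pos:pos + k], utterance[pos]
        counts = succ[l]
        profile.append((l, r, counts[r] / sum(counts.values())))
    return profile


# Hypothetical toy corpus.
corpus = ["IzD&t6kIti", "D&ts6dOgi", "lUk6kIti", "Iz6dOgi"]
succ = successor_counts(corpus)
for l, r, p in tp_profile("IzD&t6kIti", succ):
    print(f"TP({l}, {r}) = {p:.2f}")
```

On this toy corpus the dips (0.5) fall at Iz|D and D&t|6; the boundary 6|kIti is missed because the context t6 occurs only once.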

SLIDE 70

An example simulation: segmentation

Transitional Probability

[Figure: transitional probability values in two panels, boundary vs. word internal]

SLIDE 71

An example simulation: segmentation

Pointwise Mutual Information

[Figure: pointwise mutual information values in two panels, boundary vs. word internal]

SLIDE 72

An example simulation: segmentation

Successor Variety

[Figure: successor variety values in two panels, boundary vs. word internal]

SLIDE 73

An example simulation: segmentation

Entropy

[Figure: boundary entropy values in two panels, boundary vs. word internal]

SLIDE 75

An example simulation: segmentation

An unsupervised method

◮ An obvious way to segment the sequence is to use a threshold value. However, choosing the threshold is difficult in an unsupervised system.
◮ A simple unsupervised method: segment at peaks/valleys.

[Figure: MI and H plotted over the utterance I z D & t 6 k I t i]
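Below is a minimal sketch of peak/valley segmentation. It assumes that a gap is a boundary when its score is a strict local maximum or minimum. Maxima apply to measures where high values signal a boundary, such as H and successor variety; minima apply to TP and MI. It also assumes that the first and last gap of an utterance are never chosen. The input is the TP values from the worked example above, and the function names are hypothetical.

```python
def local_extrema(scores, high_means_boundary=True):
    """Gaps whose score is a strict local peak (or valley, if high_means_boundary is False).

    scores[j] is the measure for the gap after phoneme j; only interior
    gaps (with a neighbor on each side) are considered.
    """
    s = scores if high_means_boundary else [-x for x in scores]
    return [j for j in range(1, len(s) - 1) if s[j] > s[j - 1] and s[j] > s[j + 1]]


def insert_boundaries(phonemes, boundaries):
    """Insert a space after every phoneme index listed in boundaries."""
    return "".join(ph + (" " if j in boundaries else "") for j, ph in enumerate(phonemes))


# TP values (context size two) from the worked example "# I z D & t 6 k I t i #".
tps = [0.40, 0.22, 0.46, 0.99, 0.03, 0.04, 0.30, 0.48, 0.10]
cuts = local_extrema(tps, high_means_boundary=False)  # for TP, low values suggest a boundary
print(cuts)                                           # [1, 4]
print(insert_boundaries("IzD&t6kIti", set(cuts)))     # Iz D&t 6kIti
```

On these TP values alone, the valleys miss the boundary between 6 and k, because 0.04 is not lower than the neighboring 0.03. A single measure is error-prone, which is one reason to combine several measures.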

SLIDE 78

An example simulation: segmentation

Combining multiple measures: a simple method

Majority voting (algorithm) works if

  1. Votes are cast (relatively) independently.
  2. Decisions of the voters are better than random.
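These two conditions are the assumptions of Condorcet's jury theorem. The small Python illustration below assumes fully independent voters that are each right with the same probability p. Under that assumption, the majority improves on a single voter when p > 0.5 and gets worse when p < 0.5. For example, three voters with p = 0.7 give a correct majority with probability 3 · 0.7² · 0.3 + 0.7³ = 0.784.

```python
from math import comb


def majority_accuracy(k, p):
    """Probability that a majority of k (odd) independent voters is right,
    when each voter is right with probability p."""
    return sum(comb(k, j) * p**j * (1 - p)**(k - j) for j in range(k // 2 + 1, k + 1))


for p in (0.4, 0.6, 0.7):
    print(p, [round(majority_accuracy(k, p), 3) for k in (1, 3, 9)])
```

Correlated voters, such as the same measure at neighboring context sizes, would likely gain less than this idealized calculation suggests.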

[Figure: MI and H plotted over the utterance I z D & t 6 k I t i]

Votes: 2 1 1 2
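Below is a minimal sketch of combining boundary decisions by majority vote. It assumes that each voter (for example, one measure at one context size) proposes a set of gap indices, and that a gap becomes a boundary only with a strict majority. The votes in the example are made up for illustration.

```python
def majority_vote(voter_boundaries, n_gaps):
    """Keep a gap as a boundary if more than half of the voters chose it.

    voter_boundaries: one set of gap indices per voter.
    """
    n_voters = len(voter_boundaries)
    chosen = set()
    for gap in range(n_gaps):
        votes = sum(gap in b for b in voter_boundaries)
        if 2 * votes > n_voters:  # strict majority; a tie does not insert a boundary
            chosen.add(gap)
    return chosen


# Hypothetical votes from three voters over the 9 gaps of "IzD&t6kIti".
print(sorted(majority_vote([{1, 4}, {1, 5}, {4, 5, 8}], 9)))  # [1, 4, 5]
```

With these hypothetical votes, the chosen gaps 1, 4 and 5 correspond to the segmentation Iz D&t 6 kIti.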

SLIDE 79

An example simulation: segmentation

Putting it all together: a simple algorithm

Algorithm 1
foreach utterance do
    foreach phoneme position in the utterance do
        get the majority vote of all measures calculated using context sizes one to four;
        if the majority vote is positive then
            insert a boundary;
    output the segmented utterance;
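A compact Python sketch of Algorithm 1 follows. It is hypothetical and not the author's implementation, and it rests on several assumptions:

* The four measures are TP, PMI, successor variety (SV) and entropy (H). Each is computed with context sizes one to four.
* Each measure is turned into votes by picking peaks (SV, H) or valleys (TP, PMI).
* Statistics are updated with each utterance before it is segmented, which keeps the procedure incremental.
* Near the start of an utterance, a shorter context is used when fewer than n phonemes are available.

```python
import math
from collections import Counter, defaultdict

# (measure, True if a HIGH value suggests a boundary)
MEASURES = [("tp", False), ("pmi", False), ("sv", True), ("h", True)]


def local_extrema(scores, high_means_boundary):
    """Interior gaps whose score is a strict local peak (or valley)."""
    s = scores if high_means_boundary else [-x for x in scores]
    return {j for j in range(1, len(s) - 1) if s[j] > s[j - 1] and s[j] > s[j + 1]}


class Segmenter:
    """Hypothetical sketch of Algorithm 1, not the author's implementation."""

    def __init__(self, max_n=4):
        self.max_n = max_n
        self.unigrams = Counter()
        self.followers = defaultdict(Counter)  # context tuple -> counts of the next symbol

    def update(self, utt):
        """Add one utterance to the statistics, for context sizes 1..max_n."""
        seq = ["#"] + list(utt) + ["#"]
        self.unigrams.update(seq)
        for n in range(1, self.max_n + 1):
            for i in range(len(seq) - n):
                self.followers[tuple(seq[i:i + n])][seq[i + n]] += 1

    def score(self, ctx, nxt, measure):
        counts = self.followers[ctx]
        total = sum(counts.values())
        if measure == "sv":  # successor variety
            return len(counts)
        if measure == "h":   # entropy of the next symbol
            return sum(c / total * math.log2(total / c) for c in counts.values())
        p_next_given_ctx = counts[nxt] / total  # TP
        if measure == "tp":
            return p_next_given_ctx
        p_next = self.unigrams[nxt] / sum(self.unigrams.values())
        return math.log2(p_next_given_ctx / p_next)  # PMI written as log2 P(next|ctx) / P(next)

    def segment(self, utt):
        self.update(utt)  # assumption: learn from the utterance before segmenting it
        seq = ["#"] + list(utt)
        voters = []
        for n in range(1, self.max_n + 1):
            # gap j lies between utt[j] and utt[j + 1]; the context ends at utt[j]
            gaps = [(tuple(seq[max(0, j + 2 - n):j + 2]), utt[j + 1])
                    for j in range(len(utt) - 1)]
            for measure, high in MEASURES:
                scores = [self.score(ctx, nxt, measure) for ctx, nxt in gaps]
                voters.append(local_extrema(scores, high))
        # strict majority over all (measure, context size) voters
        cuts = {j for j in range(len(utt) - 1)
                if 2 * sum(j in v for v in voters) > len(voters)}
        return "".join(ph + (" " if j in cuts else "") for j, ph in enumerate(utt))


seg = Segmenter()
for utt in ["IzD&t6kIti", "D&ts6dOgi", "kIti", "D&t6kIti"]:
    print(seg.segment(utt))
```

On a toy corpus like this one the output is not meaningful. A real run would need a corpus of the size used for the results that follow.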

SLIDE 80

An example simulation: segmentation

Evaluation: boundary, word and lexicon scores

We use the standard evaluation scores precision, recall and F-score. For segmentation, these can be calculated over

◮ boundaries (BP, BR, BF),
◮ word tokens (WP, WR, WF),
◮ word types, i.e., the lexicon (LP, LR, LF).
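Below is a minimal sketch of the three kinds of scores. It assumes that gold standard and model output are given as space-separated words over the same unsegmented string. It also assumes that utterance-initial and utterance-final boundaries are not counted, which is a common convention but an assumption here. The function names are hypothetical. The usage line scores one utterance from the segmentation puzzle in this section.

```python
def spans(words):
    """(start, end) offsets of each word within the unsegmented utterance."""
    out, pos = [], 0
    for w in words:
        out.append((pos, pos + len(w)))
        pos += len(w)
    return out


def prf(correct, n_predicted, n_gold):
    """Precision, recall and F-score (harmonic mean of the two)."""
    p = correct / n_predicted if n_predicted else 0.0
    r = correct / n_gold if n_gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def evaluate(gold_utts, pred_utts):
    """Boundary, word-token and lexicon scores for space-segmented utterances."""
    b = [0, 0, 0]  # correct, predicted, gold
    w = [0, 0, 0]
    gold_lex, pred_lex = set(), set()
    for gold, pred in zip(gold_utts, pred_utts):
        gw, pw = gold.split(), pred.split()
        gs, ps = spans(gw), spans(pw)
        # utterance-internal boundaries only: every word end except the last
        gb, pb = {e for _, e in gs[:-1]}, {e for _, e in ps[:-1]}
        b = [b[0] + len(gb & pb), b[1] + len(pb), b[2] + len(gb)]
        # a word token counts as correct only if both of its edges are correct
        w = [w[0] + len(set(gs) & set(ps)), w[1] + len(ps), w[2] + len(gs)]
        gold_lex.update(gw)
        pred_lex.update(pw)
    return {
        "boundary": prf(*b),
        "word": prf(*w),
        "lexicon": prf(len(gold_lex & pred_lex), len(pred_lex), len(gold_lex)),
    }


scores = evaluate(["what does a kitty say"], ["what do esa kitty say"])
print(scores["boundary"])  # (0.75, 0.75, 0.75)
```

Here 3 of the 4 boundaries are correct. Word-token and lexicon precision and recall are both 3/5, since only what, kitty and say are found.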

SLIDE 81

An example simulation: segmentation

The results

Method            BP    BR    BF    WP    WR    WF    LP    LR    LF
Random          27.4  27.4  27.4  12.7  12.7  12.7   6.4  46.4  11.3
Predictability  92.7  76.0  83.5  77.2  67.4  72.0  28.4  65.1  39.5
Baseline        84.2  82.7  83.5  72.1  71.2  71.6  50.7  61.1  55.4

Scores are in %. Results are obtained using Algorithm 1 on phonemic transcriptions of child-directed speech from the Bernstein-Ratner corpus.
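The F-scores in the table are the harmonic mean of precision and recall, F = 2PR / (P + R). The quick Python check below uses the numbers copied from the table. The small differences it shows, up to about 0.06, presumably come from P and R having been rounded before printing.

```python
# (precision, recall, reported F) for boundaries, word tokens and lexicon, copied from the table.
results = {
    "Random":         [(27.4, 27.4, 27.4), (12.7, 12.7, 12.7), (6.4, 46.4, 11.3)],
    "Predictability": [(92.7, 76.0, 83.5), (77.2, 67.4, 72.0), (28.4, 65.1, 39.5)],
    "Baseline":       [(84.2, 82.7, 83.5), (72.1, 71.2, 71.6), (50.7, 61.1, 55.4)],
}


def f_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


for method, rows in results.items():
    for level, (p, r, reported) in zip(("boundary", "word", "lexicon"), rows):
        print(f"{method:15} {level:9} computed F = {f_score(p, r):5.2f}, reported F = {reported}")
```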

SLIDE 83

An example simulation: segmentation

Segmentation puzzle: a solution

Correct segmentation:

ljuuz uibut sjhiu ljuuz ljuuz tbz ju bhbjo mpwf ljuuz xibut uibu ljuuz ep zpv xbou npsf njml ipofz ljuuz ljuuz ephhjf opnj xibu epft b ljuuz tbz xibu epft b ljuuz tbz ephhjf eph ephhjf opnj xibu epft uif ephhjf tbz xibu epft uif ephhjf tbz mjuumf cbcz cjsejf cbcz cjsejf zpv epou mjlf uibu pof plbz npnnz ublf uijt pvu dpx uif dpx tbzt npp npp xibu epft uif dpx tbz opnj

Model output:

ljuuz uibu tsjhiuljuuz ljuuz tbz jubhbjompwfljuuz xibu tuibu ljuuz ep zpvxbounpsfnjmli pof z ljuuz ljuuz ephhjf opnj xibu ep ftb ljuuz tbz xibu ep ftb ljuuz tbz ephhjf eph ephhjf opnj xibu epft uif ephhjf tbz xibu ep ft uif ephhjf tbz mjuumfcbczcjsejf cbczcjsejf zpv epoumj lf uibu pof plbznpnnzublfui jtpvu dpx uif dpx tbz tnppnpp xibu epft uif dpx tbz opnj

SLIDE 84

An example simulation: segmentation

Segmentation puzzle: a solution

Correct segmentation:

kitty thats right kitty kitty say it again love kitty whats that kitty do you want more milk honey kitty kitty doggie nomi what does a kitty say what does a kitty say doggie dog doggie nomi what does the doggie say what does the doggie say little baby birdie baby birdie you dont like that one okay mommy take this out cow the cow says moo moo what does the cow say nomi

Model output:

kitty that srightkitty kitty say itagainlovekitty what sthat kitty do youwantmoremilkh one y kitty kitty doggie nomi what do esa kitty say what do esa kitty say doggie dog doggie nomi what does the doggie say what do es the doggie say littlebabybirdie babybirdie you dontli ke that one okaymommytaketh isout cow the cow say smoomoo what does the cow say nomi
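Comparing the encoded and decoded versions of the puzzle, the encoded text appears to be the English text with every letter replaced by the next letter of the alphabet (k → l, i → j, t → u, y → z). Presumably this is so that readers cannot rely on their knowledge of English words. The minimal decoding sketch below assumes a shift of one. Wrapping around the alphabet is an assumption, since no encoded "a" occurs in the puzzle.

```python
def shift_letters(text, k=-1):
    """Shift each letter a-z by k places (wrapping around); other characters are unchanged."""
    return "".join(
        chr((ord(c) - ord("a") + k) % 26 + ord("a")) if "a" <= c <= "z" else c
        for c in text
    )


print(shift_letters("ljuuz uibut sjhiu"))      # kitty thats right
print(shift_letters("xibu ep ftb ljuuz tbz"))  # what do esa kitty say
```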

SLIDE 85

An example simulation: segmentation

Summary

The segmentation procedure we have just reviewed

◮ is in line with psycholinguistic research,
◮ is completely unsupervised,
◮ is incremental,
◮ performs competitively with an alternative state-of-the-art segmentation method.

This is only part of the solution; we can also

◮ use information from utterance boundaries,
◮ keep an explicit lexicon and use it for further segmentation,
◮ make use of acoustic cues,
◮ use a better algorithm for boundary guessing.

SLIDE 86

Wrapping up

SLIDE 93

Summary & Discussion

Simulation, human behavior and nature

◮ Does the simulation study help us understand and predict (interesting) human behavior?

◮ Would it contribute to the nature–nurture debate?

◮ If we knew which position in the debate is correct, would it help us create a better model?

◮ Clearly we assume some initial knowledge, e.g., phonemes. But these could have been learned at an earlier stage.

◮ If I knew for certain that phonemes are innate, it could have helped.

◮ If I knew for certain that phonemes weren’t innate, it might motivate me to study how they are learned.

◮ But does it matter whether this knowledge is language-specific or not?
