A log-linear model of language acquisition with multiple cues
Gabriel Doyle & Roger Levy
UC San Diego Linguistics
LSA 2011
The Multiple-Cue Problem
mommyisntherenoweatyourapple
- Cues to word boundaries: transition probabilities, stress patterns, phonotactics, allophonic variation, coarticulation
- No single sufficient cue
- Parallel problem in vowel categorization (Vallabha et al 2007, PNAS)
Learning from Multiple Cues
- Linguistic problems can have multiple partially informative cues
- Need for models that learn to use cues jointly
The log-linear multi-cue model
- General computational model for learning structures from multiple cues
- Specific implementation for word segmentation, using transition probabilities and stress patterns
Outline
- The Multiple-Cue Problem
- Case study: Word Segmentation
- Log-linear multiple-cue model
- Experimental testing
Case Study: Word Segmentation
- Transition probabilities
– p(B|A): probability that, having seen A, you’ll see B next
Example: “Point to the monkey with the hat”
p(key|mon) = 1   p(hat|the) = 1/2
– Lower TP suggests separate words
– 8-month-old infants use TPs to segment artificial languages (Saffran et al 1996, a.o.)
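As illustration (not from the original slides), a minimal sketch of estimating syllable transition probabilities from bigram counts; the pre-syllabified toy corpus is hypothetical:

```python
from collections import Counter

def transition_probs(syllable_seqs):
    """Estimate p(B|A): of the times syllable A occurs, how often B follows."""
    bigrams, contexts = Counter(), Counter()
    for seq in syllable_seqs:
        for a, b in zip(seq, seq[1:]):
            bigrams[(a, b)] += 1
            contexts[a] += 1
    return {(a, b): n / contexts[a] for (a, b), n in bigrams.items()}

# "Point to the monkey with the hat", pre-syllabified by hand
corpus = [["point", "to", "the", "mon", "key", "with", "the", "hat"]]
tp = transition_probs(corpus)
print(tp[("mon", "key")])  # 1.0 -> high TP: likely word-internal
print(tp[("the", "hat")])  # 0.5 -> lower TP: suggests a word boundary
```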
Case Study: Word Segmentation
- Stress patterns
– English has a trochaic (Strong-Weak) bias:
“Double, double, toil and trouble; Fire burn and cauldron bubble”
– 90% of content words start strong (Cutler & Carter 1987)
– 7.5-month-old English learners segment trochaic but not iambic words (Jusczyk et al 1999)
Existing segmentation models
- Single cue-type (phonemes)
– Bayesian MDL models (Goldwater et al 2009)
– PUDDLE (Monaghan & Christiansen 2010)
- Multi cue-type (phonemes & stress)
– Connectionist (Christiansen et al 1998)
– Algorithmic (Gambell & Yang 2006)
Why a log-linear model?
- Ideal learner model; other multi-cue models aren’t
- Effective in other linguistic tasks (Hayes & Wilson 2008, Poon et al 2009)
- More flexible than other models
– new cues become new features
– overlapping cues are easy to incorporate
Log-linear modelling
- Feature functions fj map (W,S) pairs to real numbers
- “Learning” means finding good real-number weights λj for the features
- Model learns a probability distribution as a weighted sum of feature values:
p(W,S) ∝ exp( Σj λj fj(W,S) )
Feature functions
- Transition probabilities
– Bigram counts within words
- Stress templates
– Stress “word” counts
- Lexical
– Word counts
- MDL Prior
– Lexicon length
- Example: “mommy ate it” → mmy|mo: 1; SW: 1, S: 2; mommy: 1, ate: 1, it: 1; length: 10
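A hedged sketch of these four feature families applied to the “mommy ate it” example; the encoding of (W,S) as syllabified words plus stress templates is my assumption, not the authors’ implementation:

```python
from collections import Counter

def features(words, stresses):
    """Map a segmentation hypothesis (W, S) to feature counts.

    words:    syllabified words, e.g. [["mo", "mmy"], ["ate"], ["it"]]
    stresses: parallel stress templates, e.g. ["SW", "S", "S"]
    """
    f = Counter()
    for word, stress in zip(words, stresses):
        for a, b in zip(word, word[1:]):      # TP features: bigram counts within words
            f[f"bigram:{b}|{a}"] += 1
        f[f"stress:{stress}"] += 1            # stress-template ("stress word") counts
        f[f"word:{''.join(word)}"] += 1       # lexical features: word counts
    f["lexicon_length"] = sum(len(w) for w in {"".join(w) for w in words})  # MDL prior
    return f

print(features([["mo", "mmy"], ["ate"], ["it"]], ["SW", "S", "S"]))
# bigram mmy|mo: 1; stress SW: 1, S: 2; mommy/ate/it: 1 each; lexicon_length: 10
```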
“Normalizing” the probability
- Probabilities need to be normalized:
p(W,S) = exp( Σj λj fj(W,S) ) / Z
- Usually divide by a sum over all possible corpora:
Z = Σ(W′,S′) exp( Σj λj fj(W′,S′) )
- But this sum is intractable
- Solution: replace the set of all possible corpora with a contrast set built around the observed corpus
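A sketch of how normalization looks once the intractable sum is replaced by an explicit candidate set; it reuses the hypothetical features() helper above, and the max-subtraction for numerical stability is my addition:

```python
import math

def log_score(lambdas, f):
    """Unnormalized log score: sum_j lambda_j * f_j(W, S)."""
    return sum(lambdas.get(name, 0.0) * value for name, value in f.items())

def normalized_probs(lambdas, candidates):
    """p(W,S) = exp(score) / Z, where Z sums over the candidate set only."""
    scores = [log_score(lambdas, features(w, s)) for w, s in candidates]
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]
```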
Contrastive estimation (Smith & Eisner 2005)
- Contrast set as focused negatives
– Want to put probability mass on grammatical outcomes
– AND remove mass from ungrammatical ones
- Good contrast sets can cause quicker convergence
Our contrast set
- Set of all corpora from transposing two syllables in the observed corpus
Observed corpus: mommy ate it
Ungrammatical contrasts: mmymo ate it / moate mmy it
“Grammatical” contrast: mommy it ate
Note: not the only possible contrast set
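A minimal sketch of building that contrast set; following the example above, each contrast swaps two adjacent syllables while word-boundary positions stay fixed (the adjacency restriction is my reading of the example, cf. Smith & Eisner’s neighborhoods):

```python
def segment(syllables, boundaries):
    """Rejoin a syllable sequence into words at the given boundary indices."""
    words, start = [], 0
    for end in boundaries + [len(syllables)]:
        words.append("".join(syllables[start:end]))
        start = end
    return " ".join(words)

def transposition_contrasts(syllables, boundaries):
    """Corpora formed by swapping adjacent syllables, boundaries held fixed."""
    contrasts = []
    for i in range(len(syllables) - 1):
        swapped = list(syllables)
        swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
        contrasts.append(segment(swapped, boundaries))
    return contrasts

print(transposition_contrasts(["mo", "mmy", "ate", "it"], [2, 3]))
# ['mmymo ate it', 'moate mmy it', 'mommy it ate']
```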
Learning the weights λ
- Weights estimated using gradient ascent:
∂L/∂λj = E[fj; observed corpus] − E[fj; contrast set] − (λj − µj)/σ²
(expected feature value in the observed corpus, minus expected feature value in the contrast set, minus a prior term)
- Weight increases when a feature appears in the observed corpus, decreases when it appears in the contrast set
- Prior pulls weight toward initial bias µj
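A sketch of one ascent step under this gradient, assuming the Gaussian-prior reading above; the learning rate and variance are illustrative, and e_obs/e_con stand in for the two expected-feature-value terms:

```python
def gradient_step(lambdas, e_obs, e_con, mu, sigma2=1.0, lr=0.1):
    """lambda_j += lr * (E[f_j|observed] - E[f_j|contrast] - (lambda_j - mu_j)/sigma^2)."""
    updated = {}
    for j, lam in lambdas.items():
        grad = e_obs.get(j, 0.0) - e_con.get(j, 0.0) - (lam - mu.get(j, 0.0)) / sigma2
        updated[j] = lam + lr * grad
    return updated
```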
Experimental Questions
- Verification: Does it learn the stress biases that children exhibit?
- Application: Can these biases explain age effects in word segmentation?
Training on child-directed English → testing on an artificial language
Thiessen & Saffran 2003
- Synthesized bisyllabic language, either all SW or all WS
- 7- and 9-month-olds, learning English
- Preferential looking after exposure
- Words & part-words in opposition
Thiessen & Saffran 2003
– SW language (DApuDObiBUgoDApuBUgo…)
7 mos: dobi > bibu; 9 mos: dobi > bibu
→ both ages segment by TPs & stress bias
– WS language (daPUdoBIbuGOdaPUbuGO…)
7 mos: dobi > bibu; 9 mos: dobi < bibu
→ 7 mos segment by TPs; 9 mos segment against TPs & with stress bias
Experimental Design
- Train on English child-directed speech
– 1638 words of the Pearl-Brent database
– 266 SW, 35 WS; 80% monosyllabic
– Stress determined by the CMU Pronouncing Dictionary
– Utterance & syllable boundaries included; non-utterance word boundaries not given
– No prior knowledge given
[Figure: weights learned from child-directed English; y-axis: learned weight; bars: λSW, λWS]
- Mean λSW − λWS = .262 ± .119 [p < .001]
- Trochaic bias learned: SW > WS
Age effects
- Idea: older infants have stronger confidence in language parameters
- Strength of learned priors increases to simulate increased linguistic experience
– Each prior has a value (the bias µj) and a strength (how hard it pulls, 1/σ²)
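Under that reading, “aging” the model changes only the prior’s strength, not the learning rule; the numbers below are illustrative, with µ set to weights already learned from English:

```python
# Illustrative settings: prior value mu = weights learned from child-directed
# English (e.g., the trochaic bias); prior strength 1/sigma^2 grows with age.
learned_mu = {"stress:SW": 0.3, "stress:WS": 0.04}  # hypothetical learned values

young_model = {"mu": learned_mu, "sigma2": 10.0}  # weak prior: new data dominates
old_model   = {"mu": learned_mu, "sigma2": 0.1}   # strong prior: learned bias dominates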
Age effects
[Figures: word vs. part-word scores from the “young” and “old” models for the SW and WS languages, alongside infant looking times at 7 and 9 months]
Conclusions
- Model learns the stress bias from unsegmented data
- Model shows a behavioral change similar to that of infants learning a language
- The behavioral change can result purely from exposure, not from a change in the segmentation method
Future Extensions
- Expand set of cues (e.g., phonotactics)
- Additional experimental applications
- Move into other linguistic problems