Base-driven alternation in Tgdaya Seediq, AFLA 27, Jennifer Kuo

SLIDE 1

Base-driven alternation in Tgdaya Seediq

AFLA 27

Jennifer Kuo

University of California, Los Angeles

SLIDE 2

Background Quantitative Patterns Modeling Discussion

Overview

Tgdaya Seediq, where verb paradigms show extensive alternations, is a good test case for comparing theories of morphophonology. Evidence from Tgdaya Seediq supports an approach where URs are based off a single surface allomorph.

AFLA 27 1/39

SLIDE 4

Two approaches to morphophonological analysis

‘Cobbled’ URs (Kenstowicz and Kisseberth, 1977)

  • URs preserve as many contrastive properties as possible.
  • When all forms of a paradigm are affected by neutralizing processes, the resulting UR must ‘cobble’ information from multiple forms of the paradigm.

Example: Tonkawa (Kenstowicz and Kisseberth, 1977, p.16)

A (/C-stem-V/)   B (/V-stem-C/)   gloss    UR
notx             ntoxo            ‘hoe’    /notoxo/
netl             ntale            ‘lick’   /netale/

SLIDE 5

Two approaches to morphophonological analysis

‘Cobbled’ URs (Kenstowicz and Kisseberth, 1977)

Result: surface forms are predictable, and derivable from exceptionless rules/constraints.

Example: Tonkawa (Kenstowicz and Kisseberth, 1977, p.16)

(/C-stem-V/)   (/V-stem-C/)   gloss    UR
netl           ntale          ‘lick’   /netale/

Rule        Form
UR          /C-netale-V/
Delete V2   C-netle-V
Delete V3   C-netl-V
SR          [C-netl-V]
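The Tonkawa derivation on this slide can be traced step by step. The following is a minimal sketch, assuming that "Delete V2" and "Delete V3" simply target the second and third stem vowels by position; the slide does not state the conditioning environments, and the helper names `delete_nth_vowel` and `derive_slot_a` are hypothetical.

```python
VOWELS = set("aeiou")


def delete_nth_vowel(form: str, n: int) -> str:
    """Return form with its n-th vowel (1-indexed) removed, or unchanged if absent."""
    seen = 0
    for i, ch in enumerate(form):
        if ch in VOWELS:
            seen += 1
            if seen == n:
                return form[:i] + form[i + 1:]
    return form


def derive_slot_a(ur: str) -> list[str]:
    """Trace UR -> Delete V2 -> Delete V3 for the /C-stem-V/ slot (assumed rule order)."""
    after_v2 = delete_nth_vowel(ur, 2)
    # Once V2 is gone, the original V3 is the second remaining vowel.
    after_v3 = delete_nth_vowel(after_v2, 2)
    return [ur, after_v2, after_v3]


print(derive_slot_a("netale"))  # ['netale', 'netle', 'netl']
print(derive_slot_a("notoxo"))  # ['notoxo', 'notxo', 'notx']
```

Both cobbled URs yield the attested slot A forms (netl, notx) without exceptions, which is the point of the "Result" line above.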

SLIDE 7

Two approaches to morphophonological analysis

Single surface base hypothesis (Albright, 2002b, et seq.)

  • Learners designate one slot (surface allomorph) in the paradigm to be a ‘privileged base’.
  • The base serves as the input for morphophonology.

Example: Slot B is chosen as base

A      B
notx   ntoxo   ‘hoe’
netl   ntale   ‘lick’

Deriving slot A of paradigm:

Base    [ntoxo]
Rule    ∅ → [o] / __Co    notoxo
        [notoxo]

SLIDE 8

Two approaches to morphophonological analysis

Single surface base hypothesis (Albright, 2002b, et seq.)

  • If all forms of a paradigm have undergone some neutralization, no base will work perfectly.
  • As a result, rules/constraints will have exceptions.
  • However, there is a growing body of evidence from:
      • Historical change; e.g. Yiddish, Lakhota (Albright, 2010, 2002a)
      • Child learning errors; e.g. Korean (Kang, 2006)

SLIDE 12

Overview: Tgdaya Seediq

  • Seediq is an Atayalic language, spoken in Taiwan.
  • Tgdaya Seediq (德固達雅):
      • spoken primarily in Nantou
      • population ∼2500 (Tsukida, 2005)
      • The number of fluent speakers is thought to be much lower.

All forms of a verb paradigm suffer from loss of contrasts ⇒ good test case for comparing the two theories of morphophonology.

SLIDE 13

Overview: Seediq morphophonology (Yang, 1976)

Verb inflection (Holmer, 1996).

Focus types: actor foc, loc. foc, pat. foc, instr. foc

pres   <m>/mu-, -an, -un, s-
pret   <mun>, <n>-an, <un>
fut    mu(pu)-, RED-an, RED-un
imp    -i

Significant alternations between suffixed and non-suffixed forms of verb paradigms.
Examples will compare bare stem vs. /-an/-suffixed forms.

SLIDE 15

Sources of morphophonological alternation

  • Pretonic vowel reduction (VR)
  • Post-tonic VR
  • Consonant neutralization
  • Word-final monophthongization

SLIDE 21

Pretonic vowel reduction (VR)

Five vowels /a i e o u/. Stress is penultimate; suffixation shifts stress rightwards. Pretonically, all vowel contrasts are neutralized...

  1. Onsetless vowels delete (36/36)
     a. ˈawak ∼ ˈwak-an ‘lead (by a leash)’
     b. ˈeyah ∼ ˈyah-an ‘come’
  2. Assimilation to stressed vowel if separated by [ʔ] or [h] (35/35)
     a. ˈleʔiŋ ∼ liˈʔiŋ-an ‘hide (an object)’
     b. ˈsaʔis ∼ siˈʔis-an ‘sew’
  3. Vowel reduction to [u] (201/201)
     a. ˈkesa ∼ kuˈsa-an ‘tell someone’
     b. ˈbarah ∼ buˈrah-an ‘rare’
     c. ˈbit͡siq ∼ buˈt͡siq-an ‘decrease’

⇒ Loss of contrasts in suffixed forms.
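The three pretonic patterns on this slide can be sketched as one function. This is an illustrative sketch only: it assumes a one-syllable suffix (so stress lands on the stem's last vowel), works on plain character strings, and ignores the other alternations (post-tonic reduction, final consonant neutralization). The function name `suffix_with_pretonic_vr` is hypothetical.

```python
VOWELS = set("aeiou")
LARYNGEALS = set("ʔh")


def suffix_with_pretonic_vr(stem: str, suffix: str = "an") -> str:
    """Attach a one-syllable suffix and apply pretonic vowel reduction to the stem."""
    vowel_positions = [i for i, ch in enumerate(stem) if ch in VOWELS]
    if not vowel_positions:
        return f"{stem}-{suffix}"
    # With penultimate stress and a one-syllable suffix, the stem's last vowel is stressed.
    stressed = vowel_positions[-1]
    pretonic = set(vowel_positions[:-1])
    out = []
    for i, ch in enumerate(stem):
        if i not in pretonic:
            out.append(ch)
        elif i == 0:
            continue  # 1. onsetless vowels delete
        elif i + 2 == stressed and stem[i + 1] in LARYNGEALS:
            out.append(stem[stressed])  # 2. assimilate across [ʔ] or [h]
        else:
            out.append("u")  # 3. reduce to [u]
    return "".join(out) + "-" + suffix


for stem in ["awak", "eyah", "leʔiŋ", "saʔis", "kesa", "barah"]:
    print(stem, "~", suffix_with_pretonic_vr(stem))
```

Because all three branches discard the identity of the original vowel, the mapping cannot be inverted from the suffixed form alone, which is the loss of contrast noted on the slide.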

SLIDE 23

Stress-driven vowel alternations

  • Post-tonically...
  1. /e,o/ reduce to [u] in closed syllables
     a. ˈremux ∼ ruˈmuxan ‘enter’ (u∼u, n=60)
     b. ˈpemux ∼ puˈmexan ‘hold’ (u∼e, n=36)
     c. ˈdoʔus ∼ doˈʔos-an ‘refine (metal)’ (u∼o, n=3)

⇒ loss of contrasts in (non-suffixed) stem forms

SLIDE 24

Final consonant neutralization

  • Various processes of final consonant neutralization, a subset of which are shown here:

  1. /p, b, k/ → [k]

     alternation          stem          suffixed
     (a) [k∼k] (n=19)     ˈtatak        tuˈtak-an       ‘chop’
     (b) [k∼p] (n=6)      ˈpatak        puˈtap-an       ‘cut’
     (c) [k∼b] (n=1)      ˈeluk         ˈleb-an         ‘close’

  2. /t͡s, t, d/ → [t͡s]

     alternation          stem          suffixed
     (a) [t͡s∼t͡s] (n=1)    buˈt͡sebat͡s    bucuˈbat͡s-an    ‘slice’
     (b) [t͡s∼t] (n=16)    ˈdamat͡s       duˈmat-an       ‘for eating’
     (c) [t͡s∼d] (n=4)     ˈharat͡s       huˈrad-an       ‘build (a wall)’

SLIDE 26

Final cons. neutralization, continued

  3. /ŋ, m/ → [ŋ]

     alternation         stem      suffixed
     (a) [ŋ∼ŋ] (n=32)    ˈgilaŋ    guˈlaŋ-an    ‘mill (rice)’
     (b) [ŋ∼m] (n=3)     ˈtalaŋ    tuˈlam-an    ‘run’

  4. /n, l/ → [n]

     alternation         stem      suffixed
     (a) [n∼n] (n=3)     ˈdurun    duˈrun-an    ‘entrust’
     (b) [n∼l] (n=19)    ˈdudun    duˈdul-an    ‘lead’

(Alternations involving stem-final [g] are more complicated, and not discussed here.)

⇒ loss of contrasts in stem forms.
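The four final-consonant mergers listed on the two slides above amount to a many-to-one mapping. A minimal sketch follows, assuming the input is a stem shape that still carries the contrast (the stem vowels of the bare form plus the final consonant seen before the suffix); the names `FINAL_NEUTRALIZATION` and `neutralize_final` are hypothetical.

```python
AFFRICATE = "t\u0361s"  # the affricate, written as t + combining tie + s

# Many-to-one mapping taken from the four patterns listed on the slides.
FINAL_NEUTRALIZATION = {
    "p": "k", "b": "k", "k": "k",
    AFFRICATE: AFFRICATE, "t": AFFRICATE, "d": AFFRICATE,
    "ŋ": "ŋ", "m": "ŋ",
    "n": "n", "l": "n",
}


def neutralize_final(form: str) -> str:
    """Apply word-final consonant neutralization to a form ending in a contrastive consonant."""
    # Try longer segments first so that the affricate is matched as a unit.
    for seg in sorted(FINAL_NEUTRALIZATION, key=len, reverse=True):
        if form.endswith(seg):
            return form[: len(form) - len(seg)] + FINAL_NEUTRALIZATION[seg]
    return form  # vowels and other consonants are left unchanged


for form in ["patap", "talam", "dudul", "harad", "tatak"]:
    print(form, "->", neutralize_final(form))
```

Stem-final [g] is left out on purpose, since the slide sets those alternations aside.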

SLIDE 30

Morphophonological learning in Seediq

All forms of a Seediq verbal paradigm suffer from some form of neutralization; some verbs undergo extensive alternations, e.g.

ˈgeruŋ ∼ guˈreman ‘to break’
ˈeluk ∼ ˈleban ‘to close’

Cobbled UR approach:
  UR /gerem/ → SR [ˈgeruŋ], [guˈreman]

Single surface base approach:
  Base [ˈgeruŋ] → SR [guˈreman]
  or
  Base [guˈreman] → SR [ˈgeruŋ]
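The cobbled-UR analysis of these two verbs can be checked mechanically. The sketch below is a simplification under stated assumptions: it covers only the processes named on the earlier slides, handles disyllabic stems with a one-syllable suffix, and takes /eleb/ as the UR for ‘to close’, which is inferred here from the two surface forms rather than given in the source. The function names are hypothetical.

```python
VOWELS = set("aeiou")
AFFRICATE = "t\u0361s"
FINAL_C = {"p": "k", "b": "k", "m": "ŋ", "l": "n", "t": AFFRICATE, "d": AFFRICATE}


def bare_stem(ur: str) -> str:
    """Surface form of the unsuffixed stem: post-tonic reduction plus final neutralization."""
    segs = list(ur)
    vowel_idx = [i for i, ch in enumerate(segs) if ch in VOWELS]
    last = vowel_idx[-1]
    in_closed_syllable = last < len(segs) - 1
    # Stress is penultimate, so the last vowel of a disyllabic stem is post-tonic.
    if len(vowel_idx) > 1 and in_closed_syllable and segs[last] in "eo":
        segs[last] = "u"
    segs[-1] = FINAL_C.get(segs[-1], segs[-1])
    return "".join(segs)


def suffixed(ur: str, suffix: str = "an") -> str:
    """Surface form of the suffixed stem: pretonic vowel reduction only."""
    vowel_idx = [i for i, ch in enumerate(ur) if ch in VOWELS]
    stressed = vowel_idx[-1]
    pretonic = set(vowel_idx[:-1])
    out = []
    for i, ch in enumerate(ur):
        if i not in pretonic:
            out.append(ch)
        elif i == 0:
            continue  # onsetless pretonic vowel deletes
        elif i + 2 == stressed and ur[i + 1] in "ʔh":
            out.append(ur[stressed])  # assimilation across a laryngeal
        else:
            out.append("u")
    return "".join(out) + suffix


for ur in ["gerem", "eleb"]:
    print(f"/{ur}/ ->", bare_stem(ur), "~", suffixed(ur))
```

Neither surface form alone contains every segment of the UR, which is why this approach has to combine information from both slots.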

SLIDE 33

Comparing the two approaches

When the learner has incomplete data, what kind of reanalysis/errors will take place?

  • Cobbled UR: the UR will be determined by whatever surface forms happen to be available. ⇒ Reanalyses in both directions are plausible.
  • Surface base: reanalyses will always be projected from the designated base (i.e. same slot in paradigm). ⇒ The resulting Seediq lexicon will have asymmetries in paradigm structure.

SLIDE 34

Quantitative Patterns

  • Suffixed forms are highly predictable from stems, but not vice versa (i.e. stem forms are more informative).
  • Suggests that Seediq speakers have identified the isolation stem as the base, per Albright’s surface-base hypothesis.

SLIDE 35

Data collection

Results are based on a corpus of 340 verbal paradigms:

  • Taiwan Aboriginal e-Dictionary (n=184) (Mei-jin et al., 2014)
  • Fieldwork with three Seediq speakers (n=156); 2F, 1M; ages 69-78

SLIDE 36

Predictability from stems

Sources of contrast neutralization in stems/non-suffixed forms:

  • Post-tonic vowel reduction
  • Final consonant neutralization
  • Final monophthongization

Can these neutralizations be ‘undone’ in a principled way, based on statistical patterns of predictability?

SLIDE 40

Predictability from stems: post-tonic vowel alternations

  • Recall that due to post-tonic vowel reduction...

    stem      suffixed
    CVCuC  ∼  {CuCeCan, CuCoCan, CuCuCan}

  • But, identity of vowel is predictable via vowel matching

    if stem is      then suffixed form is
    potus           putosan
    petus           putesan
    p{u,a,i}tus     putusan

⇒ A speaker can predict, with relatively high accuracy, what a post-tonic [u] will surface as in suffixed forms.
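The vowel-matching generalization on this slide can be written as a small predictor. This is a sketch of the tendency only, not a categorical rule of the language: it assumes a CVCuC stem, a one-syllable suffix, and a final consonant that does not alternate. The name `predict_suffixed` is hypothetical.

```python
VOWELS = set("aeiou")


def predict_suffixed(stem: str, suffix: str = "an") -> str:
    """Predict the suffixed form of a CVCuC stem by matching the first stem vowel."""
    idx = [i for i, ch in enumerate(stem) if ch in VOWELS]
    if len(idx) != 2 or stem[idx[1]] != "u":
        raise ValueError("expected a CVCuC-shaped stem")
    first_vowel = stem[idx[0]]
    # Vowel matching: [e] and [o] are copied; after [u, a, i] the vowel stays [u].
    restored = first_vowel if first_vowel in "eo" else "u"
    segs = list(stem)
    segs[idx[0]] = "u"  # the first vowel is now pretonic and reduces to [u]
    segs[idx[1]] = restored
    return "".join(segs) + suffix


for stem in ["potus", "petus", "putus", "patus", "pitus"]:
    print(stem, "->", predict_suffixed(stem))
```

The predictor returns the most likely outcome for each stem shape; individual verbs may still deviate from it.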

SLIDE 41

Predictability from stems: post-tonic vowel alternations

Figure 1: How reduced [u] of non-suffixed CVCuC is realised when stressed under suffixation

For example...

  • [ˈputus] always surfaces as [puˈtusan] (∼28/28=100%)
  • [ˈpetus] likely surfaces as [puˈtesan] (∼32/40=80%)

Note: [o] appears to be marginal in the lexicon.

SLIDE 43

Predictability from stems: final consonant alternations

  • Due to final consonant neutralization, final [t͡s, k, n, ŋ] show the following alternations:

    stem     suffixed
    [t͡s]  ∼  [t, d, t͡s]
    [k]   ∼  [p, b, k]
    [n]   ∼  [l, n]
    [ŋ]   ∼  [m, ŋ]

  • Final consonants tend to almost always or almost never alternate.
  • Given a novel stem, (non-)alternation is relatively predictable.

SLIDE 45

Predictability from stems: final consonant alternations

For example...

[ˈpatiŋ] → [puˈtiŋan] (32/35, 91%)
[ˈpatit͡s] → [puˈtitan] (16/21, 76%)
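The percentages on this slide follow from picking the most frequent alternant for each stem-final consonant. The sketch below reuses the n= counts given on the two final-consonant neutralization slides; the table name `COUNTS` and the function `most_likely_alternant` are hypothetical, and the calculation is a plain majority guess rather than the author's model.

```python
AFFRICATE = "t\u0361s"

# stem-final consonant -> {consonant before the suffix: number of verbs},
# using the n= values reported on the final-consonant neutralization slides.
COUNTS = {
    "k": {"k": 19, "p": 6, "b": 1},
    AFFRICATE: {AFFRICATE: 1, "t": 16, "d": 4},
    "ŋ": {"ŋ": 32, "m": 3},
    "n": {"n": 3, "l": 19},
}


def most_likely_alternant(stem_final: str) -> tuple[str, int, int]:
    """Return (best alternant, its count, total count) for a stem-final consonant."""
    options = COUNTS[stem_final]
    best = max(options, key=options.get)
    return best, options[best], sum(options.values())


for final in COUNTS:
    best, hits, total = most_likely_alternant(final)
    print(f"[{final}] -> [{best}]  {hits}/{total} = {round(100 * hits / total)}%")
```

For [ŋ] and [t͡s] this reproduces the 32/35 and 16/21 figures shown on the slide.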

SLIDE 46

Predictability from stems: summary

  • Given a non-suffixed stem, it is impossible to perfectly predict the alternation of (i) [u] in post-tonic closed syllables, (ii) stem-final vowels and consonants ([t͡s, n, k, ŋ, g]).
  • However, these alternations are highly predictable from just the stem form due to statistical regularities.
  • How about the other direction: will suffixed forms be a good base?

SLIDE 51

Predictability from suffixed forms

Given the suffixed form of a verb...

  • Final consonants and vowels are completely predictable.
  • However, the antepenultimate vowel of the stem is always neutralized due to pretonic VR:

    [puˈtim-an] → {ˈpatiŋ, ˈpitiŋ, ˈpetiŋ, ˈpotiŋ, ˈputiŋ}

Compared to the neutralizing processes discussed so far, the patterns of predictability that would allow speakers to ‘undo’ pretonic VR are much weaker.
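The one-to-many mapping in this example can be enumerated directly. A minimal sketch, assuming a CuCVC input (the suffixed form minus its suffix) whose pretonic vowel surfaces as reduced [u]; deleted or assimilated pretonic vowels are not handled, and the name `candidate_stems` is hypothetical.

```python
VOWELS = set("aeiou")
AFFRICATE = "t\u0361s"
FINAL_NEUTRALIZATION = {
    "p": "k", "b": "k", "m": "ŋ", "l": "n", "t": AFFRICATE, "d": AFFRICATE,
}


def candidate_stems(suffixed_stem: str) -> list[str]:
    """List the bare stems compatible with a suffixed form such as 'putim' (+ -an)."""
    segs = list(suffixed_stem)
    idx = [i for i, ch in enumerate(segs) if ch in VOWELS]
    if len(idx) != 2 or segs[idx[0]] != "u":
        raise ValueError("expected a CuCVC-shaped input with a reduced pretonic [u]")
    pretonic, stressed = idx
    # In the bare stem this vowel is post-tonic: /e, o/ reduce in closed syllables.
    if segs[stressed] in "eo" and stressed < len(segs) - 1:
        segs[stressed] = "u"
    # The final consonant is word-final in the bare stem and neutralizes.
    segs[-1] = FINAL_NEUTRALIZATION.get(segs[-1], segs[-1])
    # The reduced pretonic [u] could correspond to any of the five vowels.
    return ["".join(segs[:pretonic] + [v] + segs[pretonic + 1:]) for v in "aieou"]


print(candidate_stems("putim"))
```

The final consonant and stressed vowel map deterministically, so all of the uncertainty comes from the single pretonic vowel.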

SLIDE 52

Predictability from suffixed forms

Figure 2: Distribution of stressed vowels in non-monosyllabic suffixed forms

SLIDE 53

Predictability from suffixed forms

Figure 3: Distribution of stressed vowels in non-monosyllabic suffixed forms

For example...

  • Given the form [puˈtasan], the most likely stem form is [ˈpatas]. However, this is correct only 38% of the time (44/115).

SLIDE 54

Predictability from suffixed forms, cont.

  • To undo pretonic VR, even picking the ‘most likely’ option based on statistical distributions would only result in correct predictions for 181/316 relevant forms (49%).
  • Pretonic VR also affects more forms than the processes which cause loss of contrasts in the stem (336/340).

SLIDE 56

Interim summary

Asymmetry in informativeness of stem vs. suffixed forms, where stem forms are much more informative.

How does this asymmetry support the single surface base hypothesis?

SLIDE 58

Statistical asymmetries as evidence for base-driven reanalysis

Select one cell in paradigm as base
  → verb paradigms whose other cells are poorly predicted by the base are gradually leveled
  → ‘base’ will become more informative than the other cells

⇒ If one cell in a paradigm is much more informative than the others, and this asymmetry cannot be attributed just to phonological neutralization processes, restructuring from a single base form has likely happened.

SLIDE 59

Modeling

  • Rule-based model confirms the stem/suffix asymmetry.
  • Evaluation of models against a simulated lexicon provides more indirect evidence for base-driven restructuring of paradigms.

SLIDE 60

Model implementation

  • Takes surface forms as input, and attempts to derive the other slots of the paradigm using phonological rules.
  • Based off of the Minimal Generalization Learner (Albright and Hayes, 2003).
  • Explicit algorithm for quantifying the informativeness of bases.
  • Each stem-suffix pair in the lexicon is assigned a score, reflecting how well the model predicts the output.
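To give a feel for how a rule-based learner can score a mapping, the sketch below computes a raw reliability value (hits divided by scope) for a single stem-final rule. This is an assumption-laden simplification and not the model used in the talk: it does not induce rules or environments and applies no confidence adjustment. The function `rule_reliability` is hypothetical, and the four pairs are stem and suffixed forms quoted on the earlier slides, used only as a toy lexicon.

```python
def rule_reliability(pairs, stem_final, suffixed_final, suffix="an"):
    """Reliability of 'stem-final X surfaces as Y before the suffix' = hits / scope."""
    # Scope: every pair whose stem meets the rule's structural description.
    scope = [(stem, form) for stem, form in pairs if stem.endswith(stem_final)]
    # Hits: pairs in the scope whose suffixed form shows the predicted consonant.
    hits = [(stem, form) for stem, form in scope
            if form.endswith(suffixed_final + suffix)]
    return len(hits) / len(scope) if scope else 0.0


# Toy lexicon: four (stem, suffixed) pairs from the slides, stress marks omitted.
TOY_PAIRS = [
    ("durun", "durunan"),
    ("dudun", "dudulan"),
    ("gilaŋ", "gulaŋan"),
    ("talaŋ", "tulaman"),
]

print(rule_reliability(TOY_PAIRS, "n", "l"))  # 0.5 on this toy lexicon
print(rule_reliability(TOY_PAIRS, "ŋ", "ŋ"))  # 0.5 on this toy lexicon
```

On the full corpus the corresponding rates are far from 0.5, which is what makes the stem an informative base.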

SLIDE 62

Model evaluation

Compare how two models, stem-base vs. suffix-base, perform on the Seediq corpus.

The model (stem- vs. suffix-base) which assigns higher scores...

  • better captures the lexicon.
  • has the more informative base.

SLIDE 63

Stem vs. suffix base model

Figure 4: Performance of stem vs. suffix-base models

SLIDE 65

Indirect evidence for historical reanalysis

  • Model results confirm the stem-suffix asymmetry.
  • The stem form is a good base in part because neutralised segments either almost always or almost never alternate.
  • Notably, this could be due to either:
      • historical reanalysis exaggerating patterns of predictability
      • an accidental effect of baseline phonotactic preferences

e.g. final [t͡s] strongly prefers to alternate with [t]; this may be because there’s a strong baseline phonotactic preference for [t] (relative to [t͡s]).

SLIDE 66

Indirect evidence for historical reanalysis

To account for this, the two surface-base models were tested against a simulated lexicon:

  • Rates of alternation are determined by relative frequencies of sounds in the Seediq lexicon.
  • If the stem-base model performs equally well on the real and simulated data, then the stem-suffix asymmetry can be attributed to phonotactic preferences.
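One way to build such a baseline is to sample each neutralized consonant's alternant in proportion to how common the candidate sounds are overall. The sketch below illustrates that idea only; how the simulated lexicon was actually constructed is not described on the slide, the frequencies in `TOY_FREQ` are hypothetical placeholders rather than Seediq corpus counts, and both function names are assumptions.

```python
import random

AFFRICATE = "t\u0361s"


def alternation_probs(candidates, segment_freq):
    """Turn baseline segment frequencies into alternation probabilities."""
    total = sum(segment_freq[c] for c in candidates)
    return {c: segment_freq[c] / total for c in candidates}


def simulate_alternants(n_stems, candidates, segment_freq, seed=0):
    """Sample a suffixed-form consonant for each of n_stems neutralized stems."""
    rng = random.Random(seed)  # seeded so that a simulated lexicon is reproducible
    probs = alternation_probs(candidates, segment_freq)
    return rng.choices(list(probs), weights=list(probs.values()), k=n_stems)


# Hypothetical baseline frequencies, NOT taken from the Seediq corpus.
TOY_FREQ = {"t": 60, "d": 30, AFFRICATE: 10}

candidates = ["t", "d", AFFRICATE]
print(alternation_probs(candidates, TOY_FREQ))
print(simulate_alternants(21, candidates, TOY_FREQ))
```

If the real lexicon shows more skewed alternation rates than samples drawn this way, phonotactic frequency alone does not account for the skew.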

SLIDE 67

Indirect evidence for historical reanalysis

Figure 5: Model performance using real vs. simulated lexicon

SLIDE 68

Discussion

SLIDE 69

Conclusion

Non-suffixed forms of a Seediq paradigm are much more informative than the suffixed forms. Modeling results suggest that this asymmetry cannot be explained by baseline phonotactic preferences.

These results are...

  • puzzling under the cobbled UR approach, which makes no predictions about the direction of restructuring
  • expected under the single surface base approach, where restructuring from a base exaggerates asymmetries in the data.

SLIDE 72

Further testing

What other sources of evidence could there be for base-driven alternations in Seediq?

  • Extensive historical evidence
  • Productivity testing, to see if speakers apply (or don’t apply) alternations as predicted by the surface base hypothesis.

⇒ Work in progress

slide-73
SLIDE 73

Thank you!

First, thank you to my three Seediq consultants, 黃美玉, 陳玉妹, and 謝芸薇, for their time and invaluable knowledge. Many thanks to Bruce Hayes, Kie Zuraw, and Claire Moore-Cantwell for guidance on all aspects of this project. Thanks also to the UCLA Phonology seminar for much helpful discussion.

slide-74
SLIDE 74

References i

Adam Albright. A restricted model of UR discovery: Evidence from Lakhota. Ms., University of California at Santa Cruz, 2002a.

Adam Albright. Base-driven leveling in Yiddish verb paradigms. Natural Language & Linguistic Theory, 28(3):475–537, 2010.

Adam Albright and Bruce Hayes. Rules vs. analogy in English past tenses: A computational/experimental study. Cognition, 90(2):119–161, 2003.

Adam C. Albright. The identification of bases in morphological paradigms. PhD thesis, University of California, Los Angeles, 2002b.

Simon J. Greenhill, Robert Blust, and Russell D. Gray. The Austronesian Basic Vocabulary Database: From bioinformatics to lexomics. Evolutionary Bioinformatics, 4:271–283, 2008.

slide-75
SLIDE 75

References ii

Arthur Holmer. A parametric grammar of Seediq. PhD thesis, Lund University, 1996.

Yoonjung Kang. Neutralizations and variations in Korean verbal paradigms. Harvard Studies in Korean Linguistics, 11:183–196, 2006.

Michael Kenstowicz and Larry M. Kisseberth. Topics in phonological theory, 1977.

Paul Jen-kui Li. Reconstruction of Proto-Atayalic phonology. Bulletin of the Institute of History and Philology, 52, 1981.

Huang Mei-jin, Yu-yang Liu, and Xin-sheng Wu. Taiwan aboriginal language e-dictionary. Taiwan Journal of Indigenous Studies, 7(2):73–118, 2014.

Andrei Mikheev. Automatic rule induction for unknown-word guessing. Computational Linguistics, 23(3):405–423, 1997.

slide-76
SLIDE 76

References iii

Naomi Tsukida. Seediq. In The Austronesian Languages of Asia and Madagascar, ed. by Alexander Adelaar and Nikolaus P. Himmelmann, 291–325, 2005.

Hsiu-fang Yang. The phonological structure of the Paran dialect of Sediq. Bulletin of the Institute of History and Philology Academia Sinica, 47(4):611–706, 1976.

slide-77
SLIDE 77

Irregular alternations i

1. Irregular vowel alternations (n=11)

stem       suffixed     gloss               expected suffixed
ˈhuruc     huˈridan     ‘come to a stop’    (huˈrudan, huˈredan)
ˈtebas     tuˈbesan     ‘sieve grains’      (tuˈbasan)

2. Irregular final vowel deletion (n=5)

stem       suffixed     gloss               expected suffixed
ˈhado      ˈhadan       ‘deliver’           (huˈdawan)
ˈqene      ˈqenan       ‘extend’            (quˈneyan)

3. Non-alternating pairs (n=2)

stem       suffixed     gloss               expected suffixed
ˈt͡saman    ˈt͡saman      ‘pass the night’    (t͡suˈman-an, t͡suˈmalan)

4. [n]-insertion (n=3)

stem       suffixed     gloss               expected suffixed
ˈqeya      quˈyan-an    ‘hang’              (quˈya-an)

slide-78
SLIDE 78

Comparing the two approaches

Cobbled UR approach

  ✓ Phonotactically motivated markedness constraints or rules, which are (nearly) exceptionless.
  ✓ Empirical predictions about the range of possible alternations.
  ✗ UR learning relatively difficult.

Surface-base approach

  ✗ Some alternations can’t be explained by general markedness; many exceptions.
  ✓ Evidence from historical change and child speech errors (e.g. Kang, 2006; Albright, 2010).
  ✓ UR learning relatively easy.

slide-81
SLIDE 81

Base-driven restructuring: a Seediq example

Statistical patterns in the modern Seediq lexicon reflect a strong dispreference for the stem-final [N]-[m] alternation.

  • Older system of Seediq with relatively more symmetrical distribution of segments.
  • Dispreference for alternation → weaker statistical tendency.
  • Paradigms which showed the dispreferred [N]-[m] alternation would gradually have been restructured, resulting in the very skewed rates of alternation that we see today.

One example (elicited) suggesting this type of reanalysis: ˈlauN∼luˈuNan (< *l-um-aum) ‘to burn’ (Li, 1981; Greenhill et al., 2008)

slide-82
SLIDE 82

Model evaluation

Examples of rules in the stem-base model

      Name           Rule                              Example             p (H/S)          p̂
(a)   Pret. VR       [+syl, −stress] → [u] / #C __     ˈpatuk→puˈtukan     1.0 (265/265)    0.99
(b)   Pret. V-del.   [+syl, −stress] → ∅ / # __        ˈawak→ˈwakan        1.0 (36/36)      0.95
(c)   N-to-m         [N] → [m] / __ ]stem V            ˈgeruN→guˈreman     0.06 (2/34)      0.02
(d)   ruy-to-rig     [ruy] → [rig] / __ ]stem V        ˈbaruy→buˈrigan     1.0 (3/3)        0.6

  • Rules vary in scope (number of input forms that meet the structural description) and hits (forms where application results in the correct output).
  • Confidence (p) is Hits/Scope.
  • Based on Mikheev (1997), rules are evaluated on adjusted confidence (p̂), i.e. penalized for less evidence (AKA low scope).
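The relationship between hits, scope, and the two confidence values can be sketched in a few lines of Python. This is a minimal illustration only: `raw_confidence` follows the Hits/Scope definition on the slide, but `adjusted_confidence` uses a generic lower-bound adjustment (a smoothed estimate minus `z` standard errors) that is assumed here for illustration. The slides do not give the exact formula or parameter settings, so this sketch is not expected to reproduce the adjusted values in the rule table.

```python
import math


def raw_confidence(hits: int, scope: int) -> float:
    """Confidence p = hits / scope, as defined on the slide."""
    if scope <= 0:
        raise ValueError("scope must be positive")
    return hits / scope


def adjusted_confidence(hits: int, scope: int, z: float = 1.0) -> float:
    """Confidence penalized for low scope.

    ASSUMPTION: a generic lower-bound adjustment, not necessarily the
    formula used in the talk. The estimate is smoothed, then reduced by
    z standard errors, so rules with little evidence are penalized more.
    """
    if scope <= 0:
        raise ValueError("scope must be positive")
    p_smooth = (hits + 0.5) / (scope + 1)
    std_err = math.sqrt(p_smooth * (1 - p_smooth) / scope)
    return max(0.0, p_smooth - z * std_err)


# Hits and scope taken from the rule table; z=1.0 is a hypothetical setting.
for name, hits, scope in [("Pret. VR", 265, 265), ("N-to-m", 2, 34), ("ruy-to-rig", 3, 3)]:
    print(name, round(raw_confidence(hits, scope), 2), round(adjusted_confidence(hits, scope), 2))
```

The point of the adjustment is visible in the ordering: a rule that is correct 3 times out of 3 ends up with a lower adjusted value than one that is correct 265 times out of 265, even though both have a raw confidence of 1.0.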

slide-83
SLIDE 83

Model evaluation

Each rule has a confidence value, reflecting how accurate it is. The model assigns a score to each stem/suffix pair in the input data:

  • Score: product of the confidence values of all the rules needed to derive the correct output form.

The model (stem- vs. suffix-base) which assigns higher scores...

  • better captures the lexicon.
  • has the more informative base.
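The scoring step can be sketched as follows. The function names, the example confidence values, and the use of a mean to compare the two models are all assumptions made for illustration; the slides only state that a score is a product of rule confidences and that the model assigning higher scores is preferred.

```python
from math import prod


def derivation_score(rule_confidences):
    """Score of one stem/suffix pair: product of the confidences of the rules used."""
    return prod(rule_confidences)


def mean_score(scores):
    """Average score over a set of paradigms (ASSUMED aggregation method)."""
    return sum(scores) / len(scores)


def better_model(scores_by_model):
    """Name of the model whose paradigms receive the higher mean score."""
    return max(scores_by_model, key=lambda name: mean_score(scores_by_model[name]))


# Hypothetical derivation that needs one reliable rule (0.99) and one unreliable rule (0.02).
print(derivation_score([0.99, 0.02]))

# Hypothetical per-paradigm scores for the two surface-base models.
print(better_model({"stem-base": [0.9, 0.8], "suffix-base": [0.5, 0.4]}))
```

Because the score is a product, a single low-confidence rule in a derivation pulls the whole score down, which is why paradigms requiring dispreferred alternations are scored poorly.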

slide-84
SLIDE 84

Simulated lexicon

To account for this, the two surface-base models were tested against a simulated lexicon:

  • 700 verb paradigms
  • Rates of alternation are determined by the relative frequencies of sounds in the Seediq lexicon (regardless of which position in a word they occur in).
  • E.g. across the corpus of 340 paradigms, [N] (n=104) is around 2.1 times more frequent than [m] (n=49). Correspondingly, the [N]-final forms in the simulated lexicon are 2.1 times more likely not to alternate than to alternate with [m].
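The [N]/[m] example can be made concrete with a short sketch. The counts come from the slide; the sampling procedure, the function names, the seed, and the number of [N]-final forms drawn are hypothetical, since the slides do not describe how the simulated lexicon was generated in detail.

```python
import random

N_COUNT = 104  # occurrences of [N] in the corpus (from the slide)
M_COUNT = 49   # occurrences of [m] in the corpus (from the slide)


def non_alternation_probability(n_count: int, m_count: int) -> float:
    """Probability that an [N]-final stem keeps [N], based on segment frequency alone."""
    return n_count / (n_count + m_count)


def simulate_n_final_forms(num_forms: int, p_no_alternation: float, seed: int = 0):
    """Draw the suffixed-form consonant for each [N]-final stem: 'N' (no alternation) or 'm'."""
    rng = random.Random(seed)
    return ["N" if rng.random() < p_no_alternation else "m" for _ in range(num_forms)]


p_keep = non_alternation_probability(N_COUNT, M_COUNT)
print(round(N_COUNT / M_COUNT, 1))  # frequency ratio of [N] to [m]
print(round(p_keep, 2))             # probability of non-alternation

# Hypothetical sample of 50 [N]-final stems.
sample = simulate_n_final_forms(50, p_keep)
print(sample.count("N"), sample.count("m"))
```

A ratio of about 2.1 to 1 corresponds to a non-alternation probability of roughly 0.68, far below the rate implied by the 2/34 hit count for the N-to-m rule in the real lexicon.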

slide-85
SLIDE 85

Selection of non-suffixed form as base

Why was the non-suffixed form, rather than the suffixed form, designated as the base form?

  • Albright (2002b): the base should be the “most informative” form, i.e. the one that (i) has the fewest lexical items affected by neutralization, and (ii) suffers from the fewest neutralizations.

slide-86
SLIDE 86

Selection of non-suffixed form as base

(i) neutralizing processes affect the fewest lexical items

  • True: 336/340 suffixed forms are affected by pretonic VR, while 287/340 non-suffixed forms are affected by post-tonic VR and/or other final neutralization processes.
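As a quick check of criterion (i), the two counts can be turned into proportions and compared. The counts are from the slide; the variable and function names are illustrative only.

```python
TOTAL_PARADIGMS = 340

# Number of forms affected by neutralization in each slot of the paradigm (from the slide).
affected_counts = {"suffixed": 336, "non-suffixed": 287}


def proportion_affected(count: int, total: int) -> float:
    """Share of lexical items in which this form is affected by a neutralizing process."""
    return count / total


rates = {form: proportion_affected(count, TOTAL_PARADIGMS) for form, count in affected_counts.items()}

# Under criterion (i), the candidate base is the form with the fewest affected items.
least_affected = min(rates, key=rates.get)

for form, rate in rates.items():
    print(form, round(rate, 3))
print("candidate base:", least_affected)
```

On these figures, neutralization affects about 98.8% of suffixed forms but about 84.4% of non-suffixed forms, so the non-suffixed form satisfies criterion (i).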

slide-87
SLIDE 87

Selection of non-suffixed form as base

(ii) suffers from the fewest phonological neutralization processes

  • Not intuitively true; non-suffixed forms are affected by more neutralizing processes (post-tonic VR and final consonant neutralization).
  • Historical evidence suggests that pretonic VR occurred prior to all of the post-tonic neutralization processes (Li, 1981, p. 239).
  • It is likely that at some point after pretonic neutralization, the non-suffixed forms of the Seediq verb paradigm had become much more informative than the suffixed forms.
  • ‘Tipping point’ for restructuring of paradigms.

slide-88
SLIDE 88

Productivity of base-driven alternations

Results predict that speakers will be able to productively apply statistically preferred alternations when given novel stem forms.

novel stem    expected suffix form
ˈpetus        puˈtesan    (vowel matching)
ˈpatac        puˈtatan    ([t͡s]-[t] alternation)
ˈpataN        puˈtaNan    (no [N]-[m] alternation)

Is this the case?

  • Tentative support from pilot ‘paradigm-gap’ tests.
  • Current work in progress: more extensive testing.
