Class 7: Learning and learnability
Adam Albright (albright@mit.edu)
LSA 2017 Phonology, University of Kentucky

Announcements
For those taking this class for credit:
▶ Option 1: assignment 2 comments are posted; assignment 3 due next Monday
▶ Option 2: short paper/squib due next Monday 7/31
▶ Questions?
▶ Next Monday: phonological typology
   Adjective    Emphatic     Gloss
a. hade         hande        ‘showy’
b. …            …            ‘terrible’
c. jowai        joɴwai       ‘weak’
d. hajai        haɴjai       ‘fast’
e. kaɾai        kaɴɾai       ‘spicy’
f. nagai        naŋgai       ‘long’
g. kanaʃiː      kanːaʃiː     ‘sad’
h. amai         amːai        ‘sweet’
i. katai        katːai       ‘hard’
j. …            …            ‘slow’
k. takai        takːai       ‘high’
l. atsui        atːsui       ‘hot’
m. kitanai      kitːanai     ‘dirty’
n. kusai        kusːai       ‘stinky’
o. ikai         ikːai        ‘big’
p. zonzai       zoːnzai      ‘impolite’
q. kandaɾui     kaːndaɾui    ‘languid’
r. …            …            ‘ugly’
s. supːai       suːpːai      ‘sour’
t. …            …            ‘scary’
u. …            …            ‘delicious’
v. kiːɾoi       kiːɴɾoi      ‘yellow’
w. toːtoi       toːtːoi      ‘respectable’
▶ *non-intervocalic Cː and *bad geminate outrank all of the above
▶ That is, don’t create bad sequences merely in order to employ a particular emphatic pattern
▶ M: evaluates the phonological form of the output (incl. hidden structure)
▶ F: evaluates the relation between input and output forms
▶ Augments underlying forms with additional structure
▶ Modifies structures of the input in various ways to generate competing candidates
▶ Phonological strings, in the same featural representation as the output
▶ Possibly additional structure, not present in the output?
▶ This is an unachievable goal, in the general case!
▶ P(attested forms) > 0
▶ P(unattested forms) = 0
▶ P(acceptable forms) > 0
▶ P(unacceptable forms) = 0
▶ Assuming that attested forms are all acceptable…
▶ Inferring underlying forms, with hidden structure, on the basis of overt forms
▶ Given those URs, construct a set of informative competing candidates
▶ Given URs and candidates, find a consistent ranking
▶ This guarantees that the grammar generates a consistent output
▶ I.e., no inconsistencies that would lead to ranking paradoxes!
/ulod/        Ons   *Coda   Dep(V)   Max   Dep(C)
a.  ulod       *      *
b.  lo                                **
c.  lodə                      *       *
d.  ʔulo                              *      *
▶ For each winner/loser pair, compare violations for each constraint
▶ If both violate a constraint C an equal number of times, these marks cancel
▶ Identify C’s that assess uncancelled marks (either winner-preferring or loser-preferring)
1. Data:

   /ulod/        Ons   *Coda   Dep(V)   Max   Dep(C)
   a.  ulod       *      *
   b.  lo                                **
   c.  lodə                      *       *
   d.  ʔulo                              *      *

2. mdp’s: /ulod/ → ʔulo

   /ulod/ → ʔulo   Ons   *Coda   Dep(V)   Max   Dep(C)
   ʔulo, *ulod      W      W               L      L
   ʔulo, *lo                               W      L
   ʔulo, *lodə                     W              L

▶ All L’s are now covered by higher-ranked W’s
▶ I.e., all mdp’s are eliminated
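The demotion steps can be sketched as a short loop over comparative rows like the ones above (again illustrative Python, reusing the row format from compare_pair; a sketch, not the course's own code):

def rcd(rows, constraints):
    """Recursive Constraint Demotion: rows map constraints to
    'W', 'L', or None.  Returns a stratified hierarchy (a list of
    strata, highest first); raises if the data are inconsistent."""
    strata = []
    remaining = set(constraints)
    while rows:
        # Install every not-yet-ranked constraint that prefers no loser.
        stratum = {c for c in remaining
                   if not any(row.get(c) == 'L' for row in rows)}
        if not stratum:
            raise ValueError('inconsistent data: no consistent ranking')
        strata.append(stratum)
        remaining -= stratum
        # An mdp is accounted for once some installed constraint prefers
        # its winner; discard those rows and recurse on the rest.
        rows = [row for row in rows
                if not any(row.get(c) == 'W' for c in stratum)]
    if remaining:
        strata.append(remaining)   # constraints never needed: bottom stratum
    return strata

Run on the three rows above, this returns [{Ons, *Coda, Dep(V)}, {Max}, {Dep(C)}]: the top stratum accounts for the first and third mdp’s, Max then covers ʔulo, *lo, and Dep(C) lands at the bottom.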
▶ Maximally K strata → maximally K demotion steps
▶ At each step, we have maximally K constraints to consider
▶ Total: at most K × K = K² comparisons, i.e., polynomial in the number of constraints
▶ “Efficient” (Achilles heel: how many losing candidates?)
▶ If there are multiple consistent rankings, it will find one of them
▶ RCD learns stratified hierarchies, but cannot (in the general case) determine a unique total ranking
▶ Simply assume a total ranking is imposed later, by independent means
▶ Constraints with the same violation patterns have to be placed in the same stratum
▶ Constraints with just W’s and ties are installed as early as possible
▶ This point is taken up below
▶ Once an mdp has a ‘W-preferrer’ installed, it’s guaranteed to be accounted for
▶ E.g., [sa] ← /sap/, [mata] ← /mat/
▶ Hypothesize URs, in relatively unconstrained fashion (/UR/ → [SR] mappings)
▶ Construct mdp’s
▶ Attempt to learn a consistent ranking
▶ If learning terminates with no consistent ranking, randomly modify the URs and try again
▶ Robust interpretive parsing: augment the overt form with any hidden structure
▶ E.g., parse prosodic structure, etc.
▶ We’ll leave this aside for the moment
¹Tesar (2013, 2017) proves that this holds true for OT-consistent languages, as long as all interactions of processes are ‘transparent’ (there is no opacity).
▶ Tied or winner-preferring for all mdp’s (why?)
▶ Unintended consequence: the grammar typically allows quite a bit more than what is attested
▶ Data is ambiguous: consistent with grammars that produce many additional, unobserved forms
▶ Claim: we want a grammar that produces the smallest possible language consistent with the data
▶ I.e., they solve the subset problem, somehow
▶ At a first pass, this is probably more right than it is wrong, but not entirely
▶ We’ll come back to this, but for now we’ll stick to the way the problem is standardly framed
▶ Premise: a more permissive hypothesis can never be falsified based on positive evidence alone
▶ So, once you adopt a more permissive analysis, you’re stuck there
▶ Here too, one might question the premise, but we’ll grant it for now
▶ The faithful mapping has a subset of the violations of the unfaithful mapping
▶ Once again, a caveat: counterfeeding opacity
▶ However, it’s a tough one to calculate…
▶ /prat/ → pat
▶ /prat/ → pərat
▶ Favor the correct set of unfaithful mappings, even in the absence of direct evidence
▶ If the input contains marked structures, prefer candidates that repair them
▶ Allow forms to pass through the grammar unmodified
▶ M ≫ F: neutralization
▶ F ≫ M: contrast
▶ E.g., F([±voi]) ≫ *[+voi]: /ba/ → [ba]
▶ *si is the only unranked constraint that doesn’t prefer a loser
▶ F prefers all winners, but starts out ranked below stratum 1
▶ *ʃ and F([±ant]) both prefer the winner
▶ *s is demoted to the next stratum
▶ Give M ‘first crack’ at ambiguous data
▶ Deploy F only where truly needed (see the sketch below)
▶ Ito and Mester (1999); Prince and Tesar (2004)
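A simplified sketch of that bias in the same illustrative Python: like rcd above, but each stratum installs rankable markedness constraints alone whenever any are available, falling back to faithfulness only when forced. Real Biased Constraint Demotion, as in Prince and Tesar, chooses among faithfulness constraints more carefully; this only shows the M-over-F preference.

def bcd(rows, markedness, faithfulness):
    """RCD with an M >> F bias: prefer to install markedness constraints;
    install faithfulness only when nothing else is rankable."""
    strata = []
    remaining = set(markedness) | set(faithfulness)
    while remaining:
        rankable = {c for c in remaining
                    if not any(row.get(c) == 'L' for row in rows)}
        if not rankable:
            raise ValueError('inconsistent data')
        m_only = rankable & set(markedness)
        stratum = m_only or rankable   # use F only when no M is rankable
        strata.append(stratum)
        remaining -= stratum
        rows = [row for row in rows
                if not any(row.get(c) == 'W' for c in stratum)]
    return strata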
▶ Constraints that assign L’s are demoted as far as necessary to place them below a W-assigning constraint
▶ Does not distinguish ‘useful’ (W-assigning) from ‘harmless’ (tie) constraints
▶ Algorithm fails to converge if there are inconsistencies in the data
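One error-driven demotion step of this kind (in the style of Tesar and Smolensky's error-driven constraint demotion) can be sketched as follows; the hierarchy representation is my own, a list of strata as in the rcd sketch above.

def edcd_step(hierarchy, row):
    """hierarchy: list of strata (sets), highest first.  Demote each
    loser-preferring constraint in `row` to the stratum just below the
    highest-ranked winner-preferring one.  Raises ValueError if the row
    has no winner-preferrer at all (inconsistent data)."""
    # Index of the highest stratum containing a W-preferring constraint.
    w_idx = min(i for i, s in enumerate(hierarchy)
                if any(row.get(c) == 'W' for c in s))
    if w_idx + 1 == len(hierarchy):
        hierarchy.append(set())          # need a new bottom stratum
    for stratum in hierarchy[:w_idx + 1]:
        movers = {c for c in stratum if row.get(c) == 'L'}
        stratum -= movers                # demote only as far as necessary:
        hierarchy[w_idx + 1] |= movers   # just below the W-preferrer
    return [s for s in hierarchy if s]   # drop any emptied strata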
▶ *Coda([pa]) = 0, *Coda([pak]) = 1, *Coda([pak.pak]) = 2
▶ Max([pa]|/pa/) = 0, Max([pa]|/pak/) = 1
▶ Harmony is the (negatively) weighted sum of violations:
  H(y) = −∑_{c ∈ CON} w_c · c(y)
▶ Compare ‘higher-ranked takes all’ mark cancellation in OT
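A small illustrative computation of this in Python; the weights are made-up example values, not fitted to anything.

def harmony(violations, weights):
    """H(y) = -sum of w_c * c(y); the candidate with highest harmony wins."""
    return -sum(weights[c] * v for c, v in violations.items())

weights = {'*Coda': 2.0, 'Max': 1.5}     # assumed example weights
candidates = {
    'pak': {'*Coda': 1, 'Max': 0},       # faithful: keeps the coda
    'pa':  {'*Coda': 0, 'Max': 1},       # deletes the final consonant
}
winner = max(candidates, key=lambda y: harmony(candidates[y], weights))
print(winner)   # 'pa': H = -1.5 beats H = -2.0, so deletion wins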
▶ ori ‘fold’ + kami ‘paper’ → origami, vs. hitori ‘one person’ + tabi ‘journey’ → hitoritabi
▶ Violated in loanwords: bagɯ ‘bug’, giga ‘giga’
▶ E.g., don’t have both a coda and a voiced stop in the same word
▶ Illustration of a weighting paradox (p. 11)
▶ Trading off of markedness and faithfulness
▶ Locality of violations (e.g., McCarthy)
▶ A set of structural descriptions (constraints)
▶ A set of output forms
▶ A probability distribution over a set of output forms
▶ Goal: weights that will generate the observed distribution
▶ Each time you encounter an input/output pair for which you choose the wrong output, adjust the weights (see the sketch below)
▶ See Magri (2012) and Boersma and Pater (2016) for details and convergence results
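A sketch of that error-driven update (perceptron-style, in the spirit of Boersma and Pater's gradual learning algorithm for HG; illustrative Python, with the learning rate as an assumed free parameter):

def gla_update(weights, observed_viols, predicted_viols, rate=0.1):
    """On an error, nudge each weight by the violation difference:
    constraints violated more by the learner's (wrong) output go up,
    constraints violated more by the observed output go down."""
    for c in weights:
        weights[c] += rate * (predicted_viols.get(c, 0)
                              - observed_viols.get(c, 0))
        weights[c] = max(weights[c], 0.0)   # keep weights nonnegative
    return weights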
▶ Very close values: almost a tie; both rankings have some probability
▶ Very far apart: C1 consistently outranks C2
▶ “Noisy evaluation”: ranking values for each constraint vary a little at each evaluation
▶ Probability distribution determined by weighted violations:
  P(y) = exp(−∑_{c ∈ CON} w_c · c(y)) / ∑_{y′ ∈ Y} exp(−∑_{c ∈ CON} w_c · c(y′))
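The same /pak/ toy example run through this formula (illustrative Python; the weights are the assumed values from the harmony sketch above):

import math

def maxent_probs(candidates, weights):
    """candidates: {form: violation dict}.  P(y) is proportional to
    exp(-sum of weighted violations), normalized over the candidate set."""
    scores = {y: math.exp(-sum(weights[c] * v for c, v in viols.items()))
              for y, viols in candidates.items()}
    z = sum(scores.values())               # normalizing constant
    return {y: s / z for y, s in scores.items()}

weights = {'*Coda': 2.0, 'Max': 1.5}
candidates = {'pak': {'*Coda': 1, 'Max': 0}, 'pa': {'*Coda': 0, 'Max': 1}}
print(maxent_probs(candidates, weights))
# approximately {'pak': 0.38, 'pa': 0.62}: both candidates receive
# some probability, unlike under a strict ranking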
References

Boersma, P. and J. Pater (2016). Convergence properties of a gradual learning algorithm for Harmonic Grammar. In J. McCarthy and J. Pater (Eds.), Harmonic Serialism and Harmonic Grammar, pp. 389–434. Sheffield: Equinox.

Hayes, B. (2004). Phonological acquisition in Optimality Theory: The early stages. In R. Kager, J. Pater, and W. Zonneveld (Eds.), Constraints in Phonological Acquisition, pp. 158–203. Cambridge: Cambridge University Press.

Heinz, J. and J. Riggle (2011). Learnability. In M. van Oostendorp, C. Ewen, B. Hume, and K. Rice (Eds.), Blackwell Companion to Phonology, pp. 54–78. Wiley-Blackwell.

Jarosz, G. (2007). Restrictiveness in phonological grammar and lexicon learning. In Annual Meeting of the Chicago Linguistic Society, pp. 125–139. Chicago Linguistic Society.

Magri, G. (2012). Convergence of error-driven ranking algorithms. Phonology 29(2), 213–269.

Pater, J. (2008). Gradual learning and convergence. Linguistic Inquiry 39, 334–345.

Prince, A. and B. Tesar (2004). Learning phonotactic distributions. In R. Kager, J. Pater, and W. Zonneveld (Eds.), Constraints in Phonological Acquisition. Cambridge: Cambridge University Press.

Tesar, B. (2013). Output-Driven Phonology: Theory and Learning. Cambridge: Cambridge University Press.
Tesar, B. (2017). Phonological learning with output-driven maps. Language Acquisition 24, 148–167.

Tesar, B. and P. Smolensky (1996). Learnability in Optimality Theory (short version). Technical Report JHU-CogSci-96-2, Johns Hopkins University.