

SLIDE 1

Class 7: Learning and learnability

Adam Albright (albright@mit.edu)

LSA 2017 Phonology University of Kentucky

SLIDE 2

Announcements

▶ For those taking this class for credit
  ▶ Option 1: assignment 2 comments are posted, assignment 3 due next Monday 7/31
  ▶ Option 2: short paper/squib due next Monday 7/31
▶ The home stretch
  ▶ Questions?
  ▶ Today: learning constraint rankings
  ▶ Next Monday: phonological typology

SLIDE 3

A question from last time: phonemes

▶ Phones
▶ Allophones
▶ Phonemes

SLIDE 4

Shizuoka Japanese (from assignment 2)

   Adjective    Emphatic     Gloss
a. hade         hande        'showy'
b. ozoi         onzoi        'terrible'
c. jowai        joɴwai       'weak'
d. hajai        haɴjai       'fast'
e. kaɾai        kaɴɾai       'spicy'
f. nagai        naŋgai       'long'
g. kanaʃiː      kanːaʃiː     'sad'
h. amai         amːai        'sweet'
i. katai        katːai       'hard'
j. osoi         osːoi        'slow'
k. takai        takːai       'high'
l. atsui        atːsui       'hot'
m. kitanai      kitːanai     'dirty'
n. kusai        kusːai       'stinky'
o. ikai         ikːai        'big'
p. zonzai       zoːnzai      'impolite'
q. kandaɾui     kaːndaɾui    'languid'
r. onzokutai    oːnzokutai   'ugly'
s. supːai       suːpːai      'sour'
t. okːanai      oːkːanai     'scary'
u. oiʃiː        oːiʃiː       'delicious'
v. kiːɾoi       kiːɴɾoi      'yellow'
w. toːtoi       toːtːoi      'respectable'

SLIDE 5

Shizuoka Japanese (from assignment 2)

▶ The hierarchy of preferences that we observed
  1. Lengthen the V if there's already NC or CC (i.e., if lengthening a C would result in a non-intervocalic Cː)
  2. Else, insert an N if lengthening the C would result in a 'bad geminate' (voiced stop, glide, etc.)
  3. Else, lengthen the consonant
▶ Preferences: lengthen C > insert N > lengthen V
▶ General constraint ranking:
  some constraint that penalizes lengthening V (*Vː, or Ident/V)
    ≫ some constraint that penalizes inserting N (*NC, or Dep(N))
    ≫ some constraint that penalizes lengthening C (*Cː, or Ident/C)
▶ Forcing violations of higher-ranked constraints
  ▶ *non-intervocalic Cː and *bad geminate outrank all of the above
  ▶ That is, don't create bad sequences merely in order to employ a preferred change

SLIDE 6

The architecture of Optimality Theory

▶ Con (universal, but must have particular form)
  ▶ M: evaluate phonological form of the output (incl. hidden structure)
  ▶ F: evaluate the relation between input and output forms
▶ Eval (the ranking is language-particular; the procedure is universal)
▶ Gen (universal, and relatively generic)
  ▶ Augments underlying forms with additional structure (syllabification, prosodic structure, etc.)
  ▶ Modifies structures of the input in various ways to generate competing candidates
▶ Lexicon (language-particular, must be learned)
  ▶ Phonological strings, in the same featural representation as output representations
  ▶ Possibly additional structure, not present in the output?

SLIDE 7

What is data used for in OT?

▶ Infer the grammar and lexicon that were used to generate forms
  ▶ This is an unachievable goal, in the general case!
▶ Less ambitious: infer a grammar that can generate the data
  ▶ Minimally: P(attested forms) > 0
  ▶ Ideally…
    ▶ P(attested forms) > 0
    ▶ P(unattested forms) = 0
▶ Modeling humans
  ▶ P(acceptable forms) > 0
  ▶ P(unacceptable forms) = 0
  ▶ Assuming that attested forms are all acceptable…
    ▶ P(attested forms) > 0
    ▶ P(accidentally unattested, but acceptable forms) > 0
    ▶ P(unattested and unacceptable forms) = 0

SLIDE 8

Decomposing the problem

▶ Really hard part: inferring underlying forms, with hidden structure, on the basis of overt evidence
▶ Somewhat hard part: given those URs, constructing a set of informative competing candidates
▶ The easier part: given URs and candidates, finding a consistent ranking

SLIDE 9

The easy part

Ensure P(attested data) > 0 with Recursive Constraint Demotion

▶ Crucial assumption: the input data is generated by a grammar with a total ranking
  ▶ This guarantees that the grammar generates a consistent output for all possible inputs
  ▶ I.e., no inconsistencies that would lead to ranking paradoxes!
▶ Mark cancellation: reduce the tableau to W's and L's (comparative format)
▶ At each step, find the set of constraints with L's for any active mark–data pairs (mdp's), and demote them
▶ Remove mdp's with W's for undemoted constraints ("in current stratum")
▶ Recursion: rank the remaining constraints in similar fashion, for the remaining mdp's

SLIDE 10

An example

Tesar and Smolensky (1996, p. 14)

/ulod/       Ons   *Coda   Dep(V)   Max   Dep(C)
a. ulod       *      *
b. lo                                **
c. lodə                      *        *
d. ☞ ʔulo                             *      *

▶ Assume we're given inputs, the attested (winning) output, and losing candidates

SLIDE 11

The procedure: an overview

1. Construct mark–data pairs
   ▶ For each winner/loser pair, compare violations for each constraint
   ▶ If both violate a constraint C an equal number of times, these marks 'cancel each other out'
   ▶ Identify C's that assess uncancelled marks (either the winner or the loser has more violations)
   (i.e., make a comparative tableau!)
2. Start with all constraints in a single stratum (no crucial rankings)
3. Look for constraints C that assign uncancelled marks to winners (that is, all constraints with L). Demote any such C, unless it is already dominated by another constraint C′ that has uncancelled loser marks (that is, a higher W)
4. Continue, creating subsequent strata, until there are no uncancelled winner marks without higher-ranked uncancelled loser marks
5. Refine: given the partial ranking, pick a total ranking
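A minimal sketch of steps 1–4 in Python, assuming mark–data pairs are already in comparative format (dicts from constraint names to 'W'/'L', with cancelled marks and ties simply omitted). The constraint names and data are the sa/ʃi practice pattern from a later slide, written in ASCII ('*sh' for *ʃ):

```python
def rcd(constraints, mdps):
    """Return a stratified hierarchy (a list of strata) consistent with the
    mark-data pairs, or raise ValueError if the data are inconsistent."""
    strata = []
    remaining = set(constraints)
    pairs = [dict(p) for p in mdps]
    while remaining:
        # A constraint is rankable now if it prefers no loser, i.e., it
        # assigns no uncancelled 'L' in any active mark-data pair.
        stratum = {c for c in remaining
                   if not any(p.get(c) == 'L' for p in pairs)}
        if not stratum:
            raise ValueError('Inconsistent data: no rankable constraint')
        strata.append(sorted(stratum))
        remaining -= stratum
        # Remove mdp's explained by a W in the newly installed stratum.
        pairs = [p for p in pairs
                 if not any(p.get(c) == 'W' for c in stratum)]
    return strata

constraints = ['*si', '*s', '*sh', 'Ident(ant)']
mdps = [
    {'*s': 'L', '*sh': 'W', 'Ident(ant)': 'W'},              # /sa/: sa ~ *sha
    {'*s': 'L', '*sh': 'W', 'Ident(ant)': 'L'},              # /sha/: sa ~ *sha
    {'*si': 'W', '*s': 'W', '*sh': 'L', 'Ident(ant)': 'L'},  # /si/: shi ~ *si
    {'*si': 'W', '*s': 'W', '*sh': 'L', 'Ident(ant)': 'W'},  # /shi/: shi ~ *si
]
print(rcd(constraints, mdps))  # [['*si'], ['*sh'], ['*s', 'Ident(ant)']]
```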

SLIDE 12

An example

Applying RCD

1. Data:

   /ulod/       Ons   *Coda   Dep(V)   Max   Dep(C)
   a. ulod       *      *
   b. lo                                **
   c. lodə                      *        *
   d. ☞ ʔulo                             *      *

2. mdp's:

   /ulod/ → ʔulo   Ons   *Coda   Dep(V)   Max   Dep(C)
   ʔulo ~ *ulod     W      W               L      L
   ʔulo ~ *lo                              W      L
   ʔulo ~ *lodə                    W              L

3. Demote constraints with L's: Max, Dep(C)
   ▶ All L's are now covered by higher-ranked W's
   ▶ I.e., all mdp's are eliminated
4. Refine

SLIDE 13

Another example

For practice: Nupe

Input   Intended   *si   *s   *ʃ   Ident([±ant])
/sa/    ☞ sa              *
           ʃa                   *        *
/ʃa/       ʃa                   *
        ☞ sa              *              *
/si/       si        *    *
        ☞ ʃi                    *        *
/ʃi/    ☞ ʃi                    *
           si        *    *             *
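The comparative rows on the next slides can be computed mechanically. A small sketch of mark cancellation, under the same assumptions as the rcd() code above (violation profiles as dicts from constraint names to counts; names in ASCII):

```python
def mark_data_pair(winner, loser):
    """Compare two violation profiles; a constraint gets 'W' if the loser
    has more marks, 'L' if the winner does; equal counts cancel."""
    pair = {}
    for c in set(winner) | set(loser):
        delta = winner.get(c, 0) - loser.get(c, 0)
        if delta < 0:
            pair[c] = 'W'
        elif delta > 0:
            pair[c] = 'L'
    return pair

# /si/: intended winner [shi] vs. loser [si]
print(mark_data_pair({'*sh': 1, 'Ident(ant)': 1}, {'*si': 1, '*s': 1}))
# {'*si': 'W', '*s': 'W', '*sh': 'L', 'Ident(ant)': 'L'} (key order may vary)
```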

SLIDE 14

RCD with unfaithful mappings

Step 1: construct mark–data pairs

Input   Winner   Loser   *si   *s   *ʃ   F([±ant])
/sa/      sa      ʃa            L    W       W
/ʃa/      sa      ʃa            L    W       L
/si/      ʃi      si      W     W    L       L
/ʃi/      ʃi      si      W     W    L       W

▶ For each constraint, calculate ∆(winner violations, loser violations)
▶ Constraint preference: W if ∆(w, l) < 0; L if ∆(w, l) > 0; tie otherwise

SLIDE 15

RCD with unfaithful mappings

Step 2: demote

Input   Winner   Loser   *si   *s   *ʃ   F([±ant])
/sa/      sa      ʃa            L    W       W
/ʃa/      sa      ʃa            L    W       L
/si/      ʃi      si      W     W    L       L
/ʃi/      ʃi      si      W     W    L       W

▶ Constraints that prefer only winners are placed in the current stratum
▶ Constraints that prefer losers are demoted

SLIDE 16

RCD with unfaithful mappings

Step 3: remove explained pairs

Input   Winner   Loser   *si   *s   *ʃ   F([±ant])
/sa/      sa      ʃa            L    W       W
/ʃa/      sa      ʃa            L    W       L
/si/      ʃi      si      W     W    L       L
/ʃi/      ʃi      si      W     W    L       W

▶ Rows with a ranked W are removed as explained
▶ Crucial: this relies on strict domination (no lower violations can "de-explain" these mdp's)

SLIDE 17

RCD with unfaithful mappings

Step 3: remove explained pairs

Input   Winner   Loser   *si   *s   *ʃ   F([±ant])
/sa/      sa      ʃa            L    W       W
/ʃa/      sa      ʃa            L    W       L

▶ Rows with a ranked W are removed as explained
▶ Crucial: this relies on strict domination (no lower violations can "de-explain" these mdp's)

SLIDE 18

RCD with unfaithful mappings

Repeat: demote and remove

Input   Winner   Loser   *si   *s   *ʃ   F([±ant])
/sa/      sa      ʃa            L    W       W
/ʃa/      sa      ʃa            L    W       L

SLIDE 19

RCD with unfaithful mappings

Repeat: demote and remove

Input   Winner   Loser   *si   *ʃ   *s   F([±ant])
/sa/      sa      ʃa            W    L       W
/ʃa/      sa      ʃa            W    L       L

SLIDE 20

RCD with unfaithful mappings

Repeat: demote and remove

(no mark–data pairs remain)

*si ≫ *ʃ ≫ *s, F(±ant)

SLIDE 21

Exercise

Find a ranking consistent with the following tableau:

          C1   C2   C3   C4   C5
Datum 1:  W    L    W    W    L
Datum 2:  L    W
Datum 3:  W    L    W
Datum 4:  L    L    W
Datum 5:  W    L    W

SLIDE 22

Virtues of RCD

▶ Converges correctly and efficiently: for N data pairs and K constraints
  ▶ Maximally K strata → maximally K demotion steps
  ▶ At each step, we have maximally K constraints to consider demoting (really, maximally K minus the current stratum)
  ▶ To see if a constraint needs to be demoted, we must check maximally N as-yet-unexplained mdp's
  ▶ Total: K²·N
  ▶ "Efficient" (Achilles heel: how many losing candidates?)
▶ That is, if there's a consistent ranking, it will find it
▶ If there are multiple consistent rankings, it will find one of them (or more than one: a partial hierarchy via strata)

SLIDE 23

Efficiency of error-driven learning

▶ Demotion happens only in response to an error
▶ An error occurs only when an L is not strictly dominated by a W
▶ A constraint that ends up in stratum k can only generate errors/be demoted k − 1 times

SLIDE 24

Consistent ranking, but not necessarily identical

Ways in which RCD fails to arrive at the grammar that generated the input data

▶ Total hierarchy vs. partial hierarchy (strata)
  ▶ RCD learns stratified hierarchies, but cannot (in the general case) learn languages generated by stratified hierarchies (why?)
  ▶ Simply assume that a total ranking is imposed later, by independent means (to guarantee that 'parents' will speak learnable languages)
▶ Ambiguities
  ▶ Constraints with the same violation patterns (have to be placed together by RCD)
  ▶ Constraints with just W's and ties
  ▶ This point is taken up below

SLIDE 25

The RCD and strict domination

▶ RCD is efficient because of strict domination:
  ▶ Once an mdp has a 'W-preferrer' installed, it's guaranteed to be explained, so it can be removed from consideration
▶ Interesting to note that under Tesar's formulation, there are some types of consistent ranking that RCD will never find (why?)

/UR/ C1 C2 C3: cand1 *; cand2 *!; cand3 * *!
/UR/ C1 C2 C3: cand1 *; cand2 *!; cand3 *

▶ Do human learners ever favor such rankings? (How would we know?)

SLIDE 26

Where do input/output pairs come from?

▶ Assumption so far: both are given by an omniscient being
▶ Relaxing this: receive just the SR, infer the UR and the ranking
▶ This can be tricky! Wrong choices could lead to dead ends

SLIDE 27

Unfortuitous choice of URs

Example: given SR [sa]…

▶ Candidate URs: /sa/, /ʃa/, /sap/, etc.
▶ In principle, given just this datum, any of these is possible (given the appropriate ranking)
▶ Danger: an incorrect selection of URs creates ranking paradoxes
  ▶ E.g., [sa] ← /sap/, [mata] ← /mat/
▶ A possible, but inefficient approach:
  ▶ Hypothesize URs in relatively unconstrained fashion (/UR/ → [SR] must not be harmonically bounded, but free to select among possible mappings)
  ▶ Construct mdp's
  ▶ Attempt to learn a consistent ranking
  ▶ If learning terminates with no consistent ranking, randomly modify the hypothesized URs until something works

SLIDE 28

A more conservative approach

Breaking into the system

▶ On hearing [sa], a good first guess is that the grammar must be able to map /sa/ to [sa]¹
▶ Less certain: does the grammar also map /sap/, /ʃa/ → [sa]?
▶ Start modestly: assume /sa/ (IN = OUT)
▶ Robust interpretive parsing: augment with any hidden structure needed for candidate generation/constraint evaluation
  ▶ E.g., parse prosodic structure, etc.
  ▶ We'll leave this aside for the moment

¹Tesar (2013, 2017) proves that this holds true for OT-consistent languages, as long as all interactions of processes are 'transparent' (there is no opacity).

SLIDE 29

What this buys us

How does the sa/ʃi learning scenario change when we are restricted to learning from (IN = OUT) pairs?

Input   Intended   *si   *s   *ʃ   Ident([±ant])
/sa/    ☞ sa              *
           ʃa                   *        *
/ʃa/       ʃa                   *
        ☞ sa              *              *
/si/       si        *    *
        ☞ ʃi                    *        *
/ʃi/    ☞ ʃi                    *
           si        *    *             *

SLIDE 30

What this buys us

How does the sa/ʃi learning scenario change when we are restricted to learning from (IN = OUT) pairs?

Input   Intended   *si   *s   *ʃ   Ident([±ant])
/sa/    ☞ sa              *
           ʃa                   *        *
/ʃi/    ☞ ʃi                    *
           si        *    *             *

▶ Learning converges efficiently
▶ But arrives at a different answer! (why?)

SLIDE 31

The challenge of positive evidence

▶ Function of a grammar is to distinguish between what maps

faithfully (grammatical) and what maps unfaithfully (ungrammatical)

▶ Positive evidence tells us only what maps faithfully ▶ Extremely ambiguous! Faithful mappings obey all F

▶ Tied or winner-preferring for all mdp’s (why?)

▶ Applying RCD based on positive evidence will never have reason

to demote F

▶ Unintended consequence: grammar typically allows quite a bit

more than was seen in input data ▶ The subset problem (e.g., Angluin 1980)

▶ Data is ambiguous: consistent with grammars that produce many

different languages

▶ Claim: we want a grammar that produces the smallest possible

language (allow attested forms, as little else as possible)

SLIDE 32

The subset principle

▶ Assumption 1: human learners do indeed prefer maximally restrictive analyses
  ▶ I.e., they solve the subset problem, somehow
  ▶ At a first pass, this is probably more right than it is wrong, but worth evaluating
  ▶ We'll come back to this, but for now we'll stick to the way the problem is framed in this literature
▶ Assumption 2: the way to solve the subset problem is by having the learner prefer the most restrictive analysis at each stage of learning
  ▶ Premise: a more permissive hypothesis can never be falsified based on positive evidence
  ▶ So, once you adopt a more permissive analysis, you're stuck there (no counterevidence)
  ▶ Here too, one might question the premise, but we'll grant it for now in order to see the kinds of approaches that have been proposed

SLIDE 33

Positive evidence as fixed points

▶ If a form can surface unfaithfully, it should be able to surface faithfully as well

/ba/    *ɓ   Ident(ɓ)   Ident([±voi])   *[+voi]
☞ ba                                       *
  pa                         *!

/ɓa/    *ɓ   Ident(ɓ)   Ident([±voi])   *[+voi]
  ba           *                           *
  pa           *             *
  ɓa    *!

▶ The faithful mapping has a subset of the violations of the unfaithful mapping
▶ Once again, a caveat: counterfeeding opacity
▶ The number of forms that can surface faithfully is a metric of the permissiveness of a grammar
  ▶ However, it's a tough one to calculate…

SLIDE 34

Something to keep in mind at the outset

▶ Even among languages that have the same set of fixed points, there are many possible grammars/mappings
  ▶ E.g., among languages that allow CV, CVC, CVCV, CVCVC:
    ▶ /prat/ → pat
    ▶ /prat/ → pərat
▶ Hayes and Prince & Tesar assume that knowledge of unfaithful mappings will be refined while learning alternations
▶ A plausible further criterion to keep in mind:
  ▶ Favor the correct set of unfaithful mappings, even in the absence of explicit evidence from alternations (seen, for example, in loanword phonology)

SLIDE 35

The logic of markedness and faithfulness

▶ Markedness constraints want to ban structures
  ▶ If the input contains marked structures, prefer candidates that eliminate them
▶ Faithfulness constraints license structures
  ▶ Allow forms to pass through the grammar unmodified
▶ The ranking of M and F determines the restrictiveness of the grammar
  ▶ M ≫ F: neutralization
  ▶ F ≫ M: contrast

SLIDE 36

The M ≫ F bias

▶ Each F ≫ M ranking allows a particular structure to surface faithfully
  ▶ E.g., F([±voi]) ≫ *[+voi]: /ba/ → [ba]
▶ The fewer F ≫ M rankings we have, the fewer structures the grammar will allow
▶ Bias: M ≫ F

SLIDE 37

Initial M ≫ F bias is not enough

Going back to the sa/ʃi allophonic language: try the M ≫ F bias

Input   Winner ~ Loser   *si   *s   *ʃ   F([±ant])
/sa/      sa ~ *ʃa              L    W       W
/ʃi/      ʃi ~ *si        W     W    L       W
Stratum                   1     1    1       2

▶ Stage 1: *si, *s, *ʃ ≫ F([±ant])
  ▶ *si is the only unranked constraint that doesn't prefer a loser
  ▶ F prefers all winners, but starts out ranked below stratum 1

SLIDE 38

Initial M ≫ F bias is not enough

Going back to the sa/ʃi allophonic language: try the M ≫ F bias

Input   Winner ~ Loser   *si   *s   *ʃ   F([±ant])
/sa/      sa ~ *ʃa              L    W       W
/ʃi/      ʃi ~ *si        W     W    L       W
Stratum                   1     2    2       2

▶ Stage 1: *si, *s, *ʃ ≫ F([±ant])
  ▶ *si is the only unranked constraint that doesn't prefer a loser
  ▶ F prefers all winners, but starts out ranked below stratum 1
▶ Stage 2: *si ≫ *s, *ʃ, F([±ant])
  ▶ *ʃ and F([±ant]) both prefer winners
  ▶ *s is demoted to the next stratum

SLIDE 39

Initial M ≫ F bias is not enough

Going back to the sa/ʃi allophonic language: try the M ≫ F bias

Input   Winner ~ Loser   *si   *ʃ   F([±ant])   *s
/sa/      sa ~ *ʃa              W       W        L
/ʃi/      ʃi ~ *si        W     L       W        W
Stratum                   1     2       2        3

SLIDE 40

M ≫ F as a persistent bias

▶ In the previous example, we need to maintain *ʃ ≫ F([±ant])
▶ That is, given the ambiguous datum /sa/ → [sa], prefer the markedness-based explanation
▶ Persistent bias: wherever possible, rank M ≫ F
  ▶ Give M 'first crack' at ambiguous data
  ▶ Deploy F only where truly needed
▶ Similar observations by many authors
  ▶ Ito and Mester (1999); Prince and Tesar (2004); Hayes (2004)

SLIDE 41

Favoring restrictive grammars

▶ Prince and Tesar (2004): augment RCD to help maximize the number of M ≫ F rankings in the grammar
▶ Hayes (2004): different refinements to RCD, with heuristics to minimize the number of additional surface forms that the resulting grammar will allow
▶ Jarosz (2007): a likelihood maximization approach, to directly reward grammars that generate 'smaller' languages
▶ See also Heinz and Riggle (2011) for some discussion
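A minimal sketch of the first idea, in the spirit of Prince and Tesar's Biased Constraint Demotion: relative to the rcd() sketch above, only the stratum-selection step changes, installing rankable markedness constraints alone whenever any exist. (Their full proposal also chooses among faithfulness constraints by how many markedness constraints each frees up; that refinement is omitted here.)

```python
def biased_rcd(constraints, kind, mdps):
    """Like rcd(), but with an M >> F bias: faithfulness constraints are
    installed only when no markedness constraint is rankable.
    `kind` maps each constraint name to 'M' or 'F'."""
    strata, remaining = [], set(constraints)
    pairs = [dict(p) for p in mdps]
    while remaining:
        rankable = {c for c in remaining
                    if not any(p.get(c) == 'L' for p in pairs)}
        if not rankable:
            raise ValueError('Inconsistent data')
        markedness = {c for c in rankable if kind[c] == 'M'}
        stratum = markedness or rankable   # the bias: prefer M constraints
        strata.append(sorted(stratum))
        remaining -= stratum
        pairs = [p for p in pairs
                 if not any(p.get(c) == 'W' for c in stratum)]
    return strata

kind = {'*si': 'M', '*s': 'M', '*sh': 'M', 'Ident(ant)': 'F'}
# Only the (IN = OUT) pairs, as on the preceding slides:
mdps = [
    {'*s': 'L', '*sh': 'W', 'Ident(ant)': 'W'},              # /sa/: sa ~ *sha
    {'*si': 'W', '*s': 'W', '*sh': 'L', 'Ident(ant)': 'W'},  # /shi/: shi ~ *si
]
print(biased_rcd(list(kind), kind, mdps))
# [['*si'], ['*sh'], ['*s'], ['Ident(ant)']] -- keeps *sh >> Ident(ant)
```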

SLIDE 42

Some features of RCD

▶ Learning is 'instantaneous'
  ▶ Constraints that assign L's are demoted as far as necessary to ensure they are 'covered' by W's
▶ Demotion by demerit: no credit for explaining mdp's
  ▶ Does not distinguish 'useful' (W-assigning) from 'harmless' (tie) constraints
▶ Non-robust
  ▶ The algorithm fails to converge if there are inconsistencies due to errors, variation, etc., and the resulting grammar is not guaranteed to 'resemble' the training data

SLIDE 43

The Gradual Learning Algorithm

The Gradual Learning Algorithm (GLA; Boersma 1997, Boersma and Hayes 2001)

▶ Reranking does not proceed stratum by stratum, creating strict rankings at each stage
▶ Rather, constraints are moved in small increments, getting closer and closer together and finally switching places

/pak/        *[CC   *Coda   Max(C)
pak ~ *pa            →L       W←

(*Coda, which prefers the loser, slides down the scale; Max(C), which prefers the winner, slides up)
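A minimal sketch of GLA-style updating with stochastic evaluation, under illustrative assumptions (the starting ranking values, noise, and plasticity are made up; candidates are given as violation-count dicts):

```python
import random

ranking_value = {'*[CC': 100.0, '*Coda': 100.0, 'Max(C)': 90.0}
PLASTICITY = 0.1   # size of each re-ranking step
NOISE = 2.0        # evaluation noise on the ranking scale

def sample_ranking():
    """Perturb each ranking value, then rank constraints by the noisy values."""
    noisy = {c: v + random.gauss(0, NOISE) for c, v in ranking_value.items()}
    return sorted(noisy, key=noisy.get, reverse=True)

def optimal(candidates, ranking):
    """Strict domination: compare violation vectors in ranking order."""
    return min(candidates,
               key=lambda c: [candidates[c].get(k, 0) for k in ranking])

def gla_update(candidates, observed):
    """On an error, nudge winner-preferring constraints up and
    loser-preferring constraints down by the plasticity."""
    predicted = optimal(candidates, sample_ranking())
    if predicted == observed:
        return
    for c in ranking_value:
        delta = candidates[observed].get(c, 0) - candidates[predicted].get(c, 0)
        if delta < 0:    # fewer marks on the observed winner: promote
            ranking_value[c] += PLASTICITY
        elif delta > 0:  # more marks on the observed winner: demote
            ranking_value[c] -= PLASTICITY

# /pak/: observed [pak] (violates *Coda) vs. learner-preferred [pa] (violates Max(C))
candidates = {'pak': {'*Coda': 1}, 'pa': {'Max(C)': 1}}
for _ in range(1000):
    gla_update(candidates, observed='pak')
print(ranking_value)   # Max(C) has gradually climbed past *Coda
```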

SLIDE 44

Gradualism and acquisition

▶ Incremental updates provide one way of modeling the time course of acquisition
  ▶ Initial state unlike target ⇒ systematic errors
  ▶ Gradual reranking to achieve the necessary ranking conditions ⇒ milestones in mastering adult structures
▶ An important issue not addressed here: the comprehension/production asymmetry

SLIDE 45

Implementing small steps

Encoding gradient rankings

▶ Instead of constraint strata (a partial hierarchy), we need something that will let us express degrees of distance
▶ Boersma proposes a ranking scale: constraints are given numerical values corresponding to their importance
▶ Boersma's formulation turns out not to converge in some cases (Pater 2008), but this is easily fixed by being careful about the amount that you promote/demote by (Magri 2012)

(1) [Figure: categorical ranking of constraints C1, C2, C3 along a continuous scale, from strict (high-ranked) to lax (low-ranked)] (Boersma and Hayes 2001, p. 47)

SLIDE 46

Weighted constraint models for phonology

The formulation of the grammar in terms of numerical ranking values mirrors the immediate predecessor of OT: Harmonic Grammar

▶ Constraints assign numerical violations to output forms
  ▶ *Coda([pa]) = 0, *Coda([pak]) = 1, *Coda([pak.pak]) = 2
▶ May be conditioned on inputs
  ▶ Max([pa] | /pa/) = 0, Max([pa] | /pak/) = 1
▶ Constraints are weighted—e.g.,

  No.   Constraint   Weight
   1    *Coda        2.5
   2    Max          3.0

▶ Constraint interaction: a weighted sum of violations

  H(output) = ∑_{c ∈ Con} w_c × violations_c(output)

▶ Compare 'higher-ranked takes all' mark cancellation in OT
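A minimal sketch of evaluation under this scheme, using the slide's illustrative weights (the candidates and violation counts are assumptions for the example):

```python
weights = {'*Coda': 2.5, 'Max': 3.0}

def harmony(violations):
    """Harmony is the negative weighted sum of violations; higher is better."""
    return -sum(weights[c] * n for c, n in violations.items())

# /pak/: keep the coda, or delete the final C?
candidates = {'pak': {'*Coda': 1}, 'pa': {'Max': 1}}
for cand, viols in candidates.items():
    print(cand, harmony(viols))            # pak: -2.5, pa: -3.0
print(max(candidates, key=lambda c: harmony(candidates[c])))  # 'pak'
```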

SLIDE 47

Additive interaction (Pater 2009)

Possible example: Japanese loanwords (Kawahara 2006)

▶ Lyman's Law: at most one voiced obstruent in native morphemes
  ▶ ori 'fold' + kami 'paper' → origami, vs. hitori 'one person' + tabi 'travel' → hitoritabi, *hitoridabi 'alone'
  ▶ Violated in loanwords: bagɯ 'bug', giga 'giga'
▶ Coda stops from English are often borrowed as geminates
  sɯtoppɯ 'stop'      sɯnobbɯ 'snob'
  kitto 'kit'         kiddo 'kid'
  autoretto 'outlet'  reddo 'red'
▶ But not so willingly when the word contains another voiced obstruent
  beddo ~ betto 'bed'    doɡɡɯ ~ dokkɯ 'dog'

SLIDE 48

Isn't this dangerous?

Pater, Bhatt and Potts (2007), Pater (2009):

▶ Consider the potential dangers of ganging up
  ▶ E.g., don't have both a coda and a voiced stop in the same word
  ▶ Illustration of a weighting paradox (p. 11)
▶ Two interesting aspects of OT constraints
  ▶ The trading off of markedness and faithfulness
  ▶ The locality of violations (e.g., McCarthy)

SLIDE 49

The learning task

▶ Given:
  ▶ A set of structural descriptions (constraints)
  ▶ A set of output forms
  ▶ A probability distribution over a set of output forms
▶ Learn:
  ▶ Weights that will generate the observed distribution
▶ Procedure:
  ▶ Each time you encounter an input/output pair that you choose the wrong output for, slightly increment the weight of 'W' constraints, and decrement the weight of 'L' constraints (cf. 'perceptron learning')
▶ See Magri (2012) and Boersma and Pater (2016) for details and discussion
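A minimal sketch of this perceptron-style update for weighted constraints, reusing the illustrative weights and candidates from the Harmonic Grammar sketch above (the learning rate is an assumption):

```python
RATE = 0.1

def hg_update(weights, candidates, observed):
    """On an error, move each weight by RATE in the direction that
    favors the observed winner over the learner's own output."""
    predicted = max(candidates, key=lambda c: -sum(
        weights[k] * n for k, n in candidates[c].items()))
    if predicted == observed:
        return
    for c in weights:
        # 'W' constraints (more marks on the error) are incremented;
        # 'L' constraints (more marks on the winner) are decremented.
        diff = candidates[predicted].get(c, 0) - candidates[observed].get(c, 0)
        weights[c] += RATE * diff

weights = {'*Coda': 2.5, 'Max': 3.0}
candidates = {'pak': {'*Coda': 1}, 'pa': {'Max': 1}}
for _ in range(20):
    hg_update(weights, candidates, observed='pa')  # suppose [pa] is observed
print(weights)   # *Coda has risen and Max fallen, just enough for [pa] to win
```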

SLIDE 50

An advantage of numerical weights or ranking values

▶ Distances on a scale give us a way of thinking about ranking probabilities
  ▶ Very close values: almost a tie, both rankings have some probability
  ▶ Very far apart: C1 consistently outranks C2
▶ This allows us to model variability!
▶ Two main approaches
  ▶ "Noisy evaluation": the ranking values for each constraint vary a little each time the grammar is invoked (GLA = 'Noisy OT', Noisy HG)
  ▶ A probability distribution determined by the weighted violations (Maximum Entropy models):

    P(yᵢ | x) = exp(−∑_c w_c × c(yᵢ)) / ∑_{y ∈ Y} exp(−∑_c w_c × c(y))
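A minimal sketch of the Maximum Entropy computation, again with the illustrative weights and candidates used above:

```python
import math

weights = {'*Coda': 2.5, 'Max': 3.0}

def maxent_probs(candidates):
    """P(y) is proportional to exp(-weighted violations of y)."""
    scores = {c: math.exp(-sum(weights[k] * n for k, n in v.items()))
              for c, v in candidates.items()}
    z = sum(scores.values())           # the normalizing denominator
    return {c: s / z for c, s in scores.items()}

candidates = {'pak': {'*Coda': 1}, 'pa': {'Max': 1}}
print(maxent_probs(candidates))
# {'pak': 0.62..., 'pa': 0.37...}: nearby weights yield variable outputs
```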

SLIDE 51

References

Boersma, P. and J. Pater (2016). Convergence properties of a gradual learning algorithm for Harmonic Grammar. In J. McCarthy and J. Pater (Eds.), Harmonic Serialism and Harmonic Grammar, pp. 389–434. Sheffield: Equinox.

Hayes, B. (2004). Phonological acquisition in Optimality Theory: The early stages. In R. Kager, J. Pater, and W. Zonneveld (Eds.), Constraints in Phonological Acquisition, pp. 158–203. Cambridge: Cambridge University Press.

Heinz, J. and J. Riggle (2011). Learnability. In M. van Oostendorp, C. Ewen, B. Hume, and K. Rice (Eds.), The Blackwell Companion to Phonology, pp. 54–78. Wiley-Blackwell.

Jarosz, G. (2007). Restrictiveness in phonological grammar and lexicon learning. In M. Elliott, J. Kirby, O. Sawada, E. Staraki, and S. Yoon (Eds.), Proceedings of the 43rd Annual Meeting of the Chicago Linguistic Society, pp. 125–139. Chicago Linguistic Society.

Magri, G. (2012). Convergence of error-driven ranking algorithms. Phonology 29(2), 213–269.

Pater, J. (2008). Gradual learning and convergence. Linguistic Inquiry 39, 334–345.

Prince, A. and B. Tesar (2004). Learning phonotactic distributions. In R. Kager, J. Pater, and W. Zonneveld (Eds.), Constraints in Phonological Acquisition, pp. 245–291. Cambridge: Cambridge University Press.

Tesar, B. (2013). Output-Driven Phonology: Theory and Learning. Cambridge: Cambridge University Press.

SLIDE 52

References

Tesar, B. (2017). Phonological learning with output-driven maps. Language Acquisition 24, 148–167.

Tesar, B. and P. Smolensky (1996). Learnability in Optimality Theory (short version). Technical Report JHU-CogSci-96-2, Johns Hopkins University.
