Phonological trends in the lexicon: Practicum
Michael Becker (PowerPoint presentation)



  1. Phonological trends in the lexicon: Practicum
     Michael Becker, University of Massachusetts Amherst, michael.becker@phonologist.org
     EVELIN 2012, MIT / UNICAMP, Campinas, Brazil

  2. Practicum overview
     • Formulating a falsifiable hypothesis
     • Lexicon study
       ◦ Building a lexicon
       ◦ Data exploration with regular expressions
       ◦ Regression modeling
     • Working with audio materials
       ◦ Recording
       ◦ Praat work
       ◦ Scripting and automation
     • Experimental design
       ◦ Formulating a testable hypothesis
       ◦ Online experiments / web interface
       ◦ Regression modeling
     • Comparing the lexicon and the experiment

  3. Lexicon study
     • Building a lexicon
     • Example: Portuguese plurals
     • Dealing with text files
     • Lexical statistics

  4. Building a lexicon
     • List of paradigms
     • Word list
     • Custom list
     • Opportunistic data collection

  5. Building a lexicon: list of paradigms
     • List of paradigms
       ◦ Turkish: TELL (Inkelas et al. 2000)
       ◦ Hebrew: LLHN (Bolozky & Becker 2006)
       ◦ Russian: Usachev (2004), based on Zaliznjak (1977)
       ◦ Others?
       Not very common, but very useful. Why?

  6. Building a lexicon: word list
     • Word list
       ◦ English: CMU (http://www.speech.cs.cmu.edu/cgi-bin/cmudict), CELEX (Baayen et al. 1995, not free)
       ◦ French: Lexique (http://www.lexique.org/)
       ◦ Portuguese: LABEL-LEX (http://label.ist.utl.pt/en/labellex_en.php)
       ◦ Many others. Googling for, e.g., "Kabardian word list" usually helps. Asking around is a good idea too.
       You can use the word list to prepare a list of stems, and then add the forms of the other morphological category by hand. It's a lot of work, but it can help generate ideas.

  7. Building a lexicon: custom list
     • Custom list
       ◦ If available, use a paper dictionary. Scanning + OCR can save a lot of work. Hire research assistants to help.
       ◦ Use corpora and/or search engines to expand your empirical scope. In recent years, Google has become less useful for such searches.

  8. Example: Portuguese plurals
     How did we get from the LABEL-LEX word list to a corpus of Portuguese plurals?
     • Transform from spelling to IPA
     • Extract the [w]-final words
     • Collect judgments
     • Coding

  9. Example: Portuguese plurals (transform from spelling to IPA)
     • Transform from spelling to IPA
       ◦ The original file
       ◦ A series of regular-expression substitutions
       ◦ Result: spelling + IPA (mostly)

  10. Example: Portuguese plurals (the original file)
     ◦ The original file, one entry per line:
         N a
         N aacheniano
         N aal
         N aaleniano
         N aba
         N ababá
         N ababalhamento
         N ababosamento
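The excerpt suggests one entry per line: a category tag (N, presumably "noun") followed by the spelling. A minimal Python sketch of reading such a file into (category, spelling) pairs, assuming whitespace-separated fields; the function name read_wordlist is hypothetical and is not part of LABEL-LEX or of the original Perl script.

```python
from typing import Iterable


def read_wordlist(lines: Iterable[str]) -> list[tuple[str, str]]:
    """Parse lines like 'N aacheniano' into (category, spelling) pairs."""
    entries = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        category, spelling = fields[0], fields[1]
        entries.append((category, spelling))
    return entries


sample = ["N a", "N aacheniano", "N aal", "", "N ababá"]
print(read_wordlist(sample))
# [('N', 'a'), ('N', 'aacheniano'), ('N', 'aal'), ('N', 'ababá')]
```

With a real file, open it with an explicit encoding, e.g. `with open("labellex.txt", encoding="utf-8") as f: entries = read_wordlist(f)`; the file name and the UTF-8 encoding are assumptions about your copy of the list.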

  11. Example: Portuguese plurals (regular-expression substitutions)
     ◦ A series of regular-expression substitutions, for example (intervocalic s is pronounced [z]; ss is pronounced [s]):
         ([aeiouáéíóúãẽõ])s([aeiouáéíóúãẽõ]) → $1z$2
         ss → s
       Learn more about regular expressions! http://pt.wikipedia.org/wiki/Expressão_regular
       We automated the substitutions with a Perl script.
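A Python sketch of the same idea: an ordered list of substitutions applied one after another to every spelling. It is a stand-in for the Perl script, not a copy of it; the rule list, the function name apply_rules, and our reading of the garbled vowel "ẽ" in the character class are assumptions. Python writes the Perl backreferences $1 and $2 as \1 and \2, but here the first rule uses lookarounds instead: with the capture-group form, a global substitution consumes the vowel after the s, so in a word like gasosa the second s would be skipped.

```python
import re

# Vowel letters from the slide's character class; "ẽ" is our reading of a
# garbled character in the extracted slide text (an assumption).
VOWELS = "aeiouáéíóúãẽõ"

# Ordered (pattern, replacement) rules, applied top to bottom.
RULES = [
    # Intervocalic s -> z. The slide writes this as ([V])s([V]) -> $1z$2;
    # lookarounds do the same job without consuming the neighbouring vowels,
    # so both s's in a word like "gasosa" are rewritten.
    (rf"(?<=[{VOWELS}])s(?=[{VOWELS}])", "z"),
    # Then simplify ss -> s.
    (r"ss", "s"),
]


def apply_rules(word: str) -> str:
    """Apply each substitution globally, in order, to one spelling."""
    for pattern, replacement in RULES:
        word = re.sub(pattern, replacement, word)
    return word


for w in ["casa", "passa", "gasosa", "ababosamento"]:
    print(w, apply_rules(w))
# casa caza
# passa pasa
# gasosa gazoza
# ababosamento ababozamento
```

Order matters: running ss → s before the intervocalic rule would turn passa into paza.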

  12. Example: Portuguese plurals (result)
     ◦ Result: spelling + IPA (mostly)
         N  a               ˈa
         N  aacheniano      aaʃeniˈano
         N  aal             aˈaw
         N  aaleniano       aaleniˈano
         N  aba             ˈaba
         N  ababá           ababˈa
         N  ababalhamento   ababaʎamˈẽto
         N  ababosamento    ababozamˈẽto

  13. Example: Portuguese plurals (extract the [w]-final words)
     • Extract the [w]-final words
       ◦ Again, a regular expression: w$
       ◦ No need for programming: a text editor with regular-expression support works too, e.g., Notepad++ (Windows) or TextWrangler (Mac), plus OpenOffice/LibreOffice
       ◦ We got a list of 5742 words, mostly nouns and adjectives.
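In Python, the same filter is a one-line regular-expression test on the transcription column. A minimal sketch, assuming the entries are (spelling, IPA) pairs like those in the result file; w_final is a hypothetical helper name.

```python
import re

# The slide's pattern: [w] as the last segment of the transcription.
W_FINAL = re.compile(r"w$")


def w_final(entries: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep the (spelling, ipa) pairs whose IPA transcription ends in [w]."""
    return [(spelling, ipa) for spelling, ipa in entries if W_FINAL.search(ipa)]


entries = [("aal", "aˈaw"), ("aba", "ˈaba"), ("ababá", "ababˈa")]
print(w_final(entries))  # [('aal', 'aˈaw')]
```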

  14. Example: Portuguese plurals (collect judgments)
     • Collect judgments
       ◦ Do we really need ALL the adjectives that end in [aw], [ew], ...?
       ◦ The monosyllables are manageable, so we take all of them. We asked three people to supply plurals for them.
       ◦ We want a good sample of polysyllables.
       ◦ Randomize, and choose a sizable portion. Excel trick: put the items in one column and =RAND() in a second column, then sort by the random number. We asked one person to supply plurals for our sample of polysyllables.
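The sampling step can also be scripted. A sketch under stated assumptions: syllables are counted as vowel symbols in the IPA string (so a glide like [w] adds no syllable, but a diphthong transcribed with a vowel letter such as i would be overcounted), and the vowel inventory in VOWELS is a guess at the transcription's symbols. random.Random.sample draws without replacement, which amounts to sorting by =RAND() and keeping the top rows. The names choose_items and syllable_count and the example transcriptions are illustrative only.

```python
import random

# Assumed vowel symbols of the IPA column; adjust to the actual transcription.
VOWELS = set("aeiouɛɔáéíóúãẽõ")


def syllable_count(ipa: str) -> int:
    """Rough count: one syllable per vowel symbol; glides such as w add nothing."""
    return sum(ch in VOWELS for ch in ipa)


def choose_items(words: list[str], sample_size: int, seed: int = 1):
    """Keep every monosyllable and draw a random sample of the polysyllables."""
    monos = [w for w in words if syllable_count(w) == 1]
    polys = [w for w in words if syllable_count(w) > 1]
    rng = random.Random(seed)  # fixed seed, so the same sample can be redrawn
    sample = rng.sample(polys, min(sample_size, len(polys)))
    return monos, sample


# Illustrative transcriptions only.
words = ["ˈsaw", "ˈmɛw", "kaˈnaw", "paˈpɛw", "fuˈniw"]
monos, sample = choose_items(words, sample_size=2)
print(monos)  # ['ˈsaw', 'ˈmɛw']
print(sample)
```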

  15. Example: Portuguese plurals (coding)
     • Coding
       ◦ Some words didn't have a plural → excluded
       ◦ 0 = faithful, 1 = alternating, .5 = optional
       ◦ [malis], [abɾilis] coded as faithful
       ◦ Items with more than one rating → averaged
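The coding scheme translates directly into a small function. A sketch, assuming each consultant's judgment has already been labeled faithful, alternating, or optional, and that None marks "no plural supplied". The slide does not say what happens when only some consultants lacked a plural; this sketch averages whatever usable labels remain and excludes a word only when none remain. code_items and the example words are hypothetical.

```python
from statistics import mean

# Codes from the slide: 0 = faithful, 1 = alternating, .5 = optional.
CODES = {"faithful": 0.0, "alternating": 1.0, "optional": 0.5}


def code_items(ratings: dict[str, list]) -> dict[str, float]:
    """Map each word to its mean code.

    ratings maps a word to one label per consultant; None means that
    consultant supplied no plural. Words with no usable label are excluded.
    """
    coded = {}
    for word, labels in ratings.items():
        usable = [CODES[label] for label in labels if label is not None]
        if usable:
            coded[word] = mean(usable)
    return coded


# Hypothetical ratings, not data from the study.
ratings = {
    "word1": ["alternating", "optional"],
    "word2": ["faithful", "faithful"],
    "word3": [None, None],
}
print(code_items(ratings))  # {'word1': 0.75, 'word2': 0.0}
```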

  16. Dealing with text files
     Text is the bread and butter of computing.
     • Text file vs. binary file
     • Plain text editors
     • Unicode
     • Regular expressions
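One Unicode pitfall matters directly for Portuguese: a letter like ã can be stored as one precomposed code point or as a plus a combining tilde. The two look identical but compare unequal, and a regular-expression class such as [ãõ] matches only the precomposed form. Normalizing to NFC with Python's standard unicodedata module before matching is one way to avoid this; the helper name nfc is our own.

```python
import re
import unicodedata


def nfc(text: str) -> str:
    """Normalize to NFC, so precomposed and decomposed letters compare equal."""
    return unicodedata.normalize("NFC", text)


composed = "\u00e3"      # ã as a single code point
decomposed = "a\u0303"   # a + COMBINING TILDE; looks the same on screen
nasal = re.compile("[\u00e3\u00f5]")  # the class [ãõ], written with escapes

print(composed == decomposed)               # False
print(nfc(composed) == nfc(decomposed))     # True
print(bool(nasal.search(decomposed)))       # False
print(bool(nasal.search(nfc(decomposed))))  # True
```

When reading files, passing an explicit encoding, e.g. `open(path, encoding="utf-8")`, avoids depending on the platform default.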

  17. Lexical statistics
     • Descriptive statistics
     • Inferential statistics
     • Limits of logistic regression
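As an illustration of descriptive statistics on coded data, the sketch below computes the mean alternation score grouped by the vowel before the final [w]. The grouping factor, the function name rate_by_vowel, and the scores in the example are invented for illustration; they are not results from the Portuguese study. It assumes the vowel is a single code point immediately before w.

```python
from collections import defaultdict
from statistics import mean


def rate_by_vowel(coded: dict[str, float]) -> dict[str, float]:
    """Mean alternation score grouped by the segment before the final w.

    coded maps a [w]-final IPA transcription to a score between 0 and 1.
    """
    groups = defaultdict(list)
    for ipa, score in coded.items():
        groups[ipa[-2]].append(score)  # assumes a one-code-point vowel before w
    return {vowel: mean(scores) for vowel, scores in sorted(groups.items())}


# Invented scores, for illustration only.
coded = {"kaˈnaw": 1.0, "ˈmaw": 0.0, "paˈpɛw": 1.0, "ˈsɛw": 0.5}
print(rate_by_vowel(coded))  # {'a': 0.5, 'ɛ': 0.75}
```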
