Phonological trends in the lexicon
Practicum
Michael Becker
University of Massachusetts Amherst
michael.becker@phonologist.org
EVELIN 2012, MIT / UNICAMP, Campinas, Brazil
Practicum overview
- Formulating a falsifiable hypothesis
- Lexicon study
  - Building a lexicon
  - Data exploration with regular expressions
  - Regression modeling
- Working with audio materials
  - Recording
  - Praat work
  - Scripting and automation
- Experimental design
  - Formulating a testable hypothesis
  - Online experiments / web interface
  - Regression modeling
- Comparing the lexicon and the experiment
Lexicon study
- Building a lexicon
- Example: Portuguese plurals
- Dealing with text files
- Lexical statistics
Building a lexicon
- List of paradigms
  - Turkish: TELL (Inkelas et al. 2000)
  - Hebrew: LLHN (Bolozky & Becker 2006)
  - Russian: Usachev (2004), based on Zaliznjak (1977)
  - Others?
  Not very common, but very useful. Why?
- Word list
  - English: CMU (http://www.speech.cs.cmu.edu/cgi-bin/cmudict), CELEX (Baayen et al. 1995, not free)
  - French: Lexique (http://www.lexique.org/)
  - Portuguese: LABEL-LEX (http://label.ist.utl.pt/en/labellex_en.php)
  - Many others.
  Googling for, e.g., "Kabardian word list" usually helps. Asking around is a good idea too.
  You can use the word list to prepare a list of stems, and then add the other morphological category manually. It’s a lot of work, but it can help generate ideas.
- Custom list
  - If available, use a paper dictionary. Scanning + OCR can save a lot of work. Hire research assistants to help.
  - Use corpora and/or search engines to expand your empirical scope. In recent years, Google has become less useful for such searches.
- Opportunistic data collection
Example: Portuguese plurals
How did we get from the word-list of LABEL-LEX to a corpus of Portuguese plurals?
- Transform from spelling to IPA
  - The original file
      N a
      N aacheniano
      N aal
      N aaleniano
      N aba
      N ababá
      N ababalhamento
      N ababosamento
  - A series of regular expression substitutions
    For example:
      ([aeiouáéíóúãẽõ])s([aeiouáéíóúãẽõ]) → $1z$2
      ss → s
    Learn more about regular expressions!
    http://pt.wikipedia.org/wiki/Expressão_regular
    We automated the substitutions with a Perl script.
  - Result: spelling + IPA (mostly)
      N a              ˈa
      N aacheniano     aaʃeniˈano
      N aal            aˈaw
      N aaleniano      aaleniˈano
      N aba            ˈaba
      N ababá          ababˈa
      N ababalhamento  ababaʎamˈẽto
      N ababosamento   ababozamˈẽto
- Extract the [w]-final words
  - Again, a regular expression: w$
  - No need for programming: a text editor with support for regular expressions is good too, e.g. Notepad++ (Windows), TextWrangler (Mac) + OpenOffice/LibreOffice
  - We got a list of 5742 words, mostly nouns and adjectives.
- Collect judgments
  - Do we really need ALL the adjectives that end in [aw], [ew]...?
  - The monosyllables are manageable, so we take all of them. We asked three people to supply plurals for them.
  - We want a good sample of polysyllables.
  - Randomize, and choose a sizable portion.
    Excel trick: items in one column, =rand() in a second column, and sort by the random number.
    We asked one person to supply plurals for our sample of polysyllables.
- Coding
  - Some words didn’t have a plural → excluded
  - 0 = faithful, 1 = alternating, 0.5 = optional
  - [malis], [abɾilis] coded as faithful
  - Items with more than one rating → averaged
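The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Perl script used for the study: the rule list contains just the two substitutions shown in the example, the first rule is written with lookarounds (a variant of the capture-group form, so that overlapping contexts are all rewritten), and the function names `to_ipa`, `w_final`, `random_sample`, and `code_item` are invented for this sketch.

```python
import random
import re
from statistics import mean

VOWELS = "aeiouáéíóúãẽõ"

# Ordered rewrite rules; only the two rules from the example are included.
RULES = [
    (re.compile(f"(?<=[{VOWELS}])s(?=[{VOWELS}])"), "z"),  # V s V -> V z V
    (re.compile("ss"), "s"),                                # ss -> s
]

def to_ipa(spelling):
    """Apply each substitution in order; the order matters (see 'massa')."""
    form = spelling
    for pattern, replacement in RULES:
        form = pattern.sub(replacement, form)
    return form

def w_final(entries):
    """Keep the (spelling, ipa) pairs whose transcription matches w$."""
    return [(s, ipa) for s, ipa in entries if re.search(r"w$", ipa)]

def random_sample(items, size, seed=0):
    """Excel-style sampling: pair items with random numbers, sort, cut."""
    rng = random.Random(seed)
    keyed = sorted((rng.random(), item) for item in items)
    return [item for _, item in keyed[:size]]

# Numeric codes from the coding step.
CODES = {"faithful": 0.0, "optional": 0.5, "alternating": 1.0}

def code_item(ratings):
    """Average the coded ratings that one item received."""
    return mean(CODES[r] for r in ratings)

print(to_ipa("ababosamento"))                 # ababozamento
print(to_ipa("massa"))                        # masa
print(code_item(["faithful", "alternating"]))  # 0.5
```

Applying `ss → s` before the intervocalic rule would turn "massa" into "maza", which is why the rules are kept in an ordered list rather than a set.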
Dealing with text files
Text is the bread and butter of computing.
- Text file vs. binary file
- Plain text editors
- Unicode
- Regular expressions
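One place where Unicode and regular expressions interact is worth a small demonstration: an accented letter such as ã can be stored either as one code point or as a plain letter plus a combining mark, and the two look identical on screen but do not match each other. The sketch below uses only Python's standard library; the helper names `nfc` and `read_words` are made up for this example, and the choice of NFC as the target form is an assumption (any single, consistent form would do).

```python
import unicodedata

composed = "\u00e3"      # ã stored as a single code point
decomposed = "a\u0303"   # a followed by a combining tilde

def nfc(text):
    """Normalize to NFC so visually identical strings compare equal."""
    return unicodedata.normalize("NFC", text)

def read_words(path):
    """Read a UTF-8 text file, one word per line, normalized to NFC."""
    with open(path, encoding="utf-8") as handle:
        return [nfc(line.strip()) for line in handle if line.strip()]

print(composed == decomposed)        # False
print(composed == nfc(decomposed))   # True
print(len(composed), len(decomposed))  # 1 2
print(composed.encode("utf-8"))      # b'\xc3\xa3'
```

The last line also illustrates the text/binary distinction: the same character is one unit of text but two bytes on disk in UTF-8.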
Lexical statistics
- Descriptive statistics
  - What is the average alternation rate for each size? For each final vowel? For combinations?
  - Can be done in Excel/OpenOffice (pivot tables, subtotals)
  - Even better: R (xtabs, aggregate)
  - Commercial programs: SPSS, Stata
  - Become an expert in Excel and/or R
    - Harald Baayen: Analyzing Linguistic Data
    - Keith Johnson: Quantitative Methods in Linguistics
    - Andrew Gelman & Jennifer Hill: Data Analysis Using Regression and Multilevel/Hierarchical Models
    Can be found online.
  - Visualization
    [Figure: stacked bars for mono, iamb, and trochee showing the percentage (0% to 100%) of faithful, intermediate, and alternating plurals; counts printed on the bars: 7, 265, 45, 8, 6, 17, 37, 2]
- Inferential statistics
  Regression:
  - making predictions
  - confidence in predictions
  Example: logistic regression for the Portuguese [w]-final words

                      β      SE(β)   z      p(>|z|)
  (Intercept)         0.10   0.38    0.27
  mono vs. iamb       3.21   0.49    6.60   <.0001
  iamb vs. trochee    3.52   1.24    2.85   <.005
  lax                 2.67   0.57    4.64   <.0001
  high                0.20   0.29    0.69   >.1
- Limits of logistic regressions
  A logistic regression cannot be fit when some predictor leads to a categorical distinction (e.g., some alternation always happens/never happens when...): the estimated coefficient for that predictor grows without bound.
  - In R: bayesglm in the arm package
  - Decision trees (in R or other programs)
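The descriptive question (average alternation rate per size, per final vowel, per combination) and the warning about categorical distinctions can be checked with the same small table of means. The sketch below is a plain-Python stand-in for what a pivot table or R's `xtabs`/`aggregate` would give; the row format, the column names `shape`, `vowel`, and `rate`, and the sample data are all hypothetical.

```python
from collections import defaultdict

def alternation_rates(rows, *factors):
    """Mean alternation rate and item count per combination of factors."""
    cells = defaultdict(list)
    for row in rows:
        key = tuple(row[f] for f in factors)
        cells[key].append(row["rate"])
    return {key: (sum(v) / len(v), len(v)) for key, v in cells.items()}

def categorical_cells(rates):
    """Cells where the alternation never or always happens.

    These are the cases that make an ordinary logistic regression
    break down for the corresponding predictor.
    """
    return [key for key, (rate, _) in rates.items() if rate in (0.0, 1.0)]

# Hypothetical coded items: 0 = faithful, 0.5 = optional, 1 = alternating.
rows = [
    {"shape": "mono", "vowel": "a", "rate": 0.0},
    {"shape": "mono", "vowel": "a", "rate": 0.5},
    {"shape": "iamb", "vowel": "a", "rate": 1.0},
    {"shape": "iamb", "vowel": "e", "rate": 1.0},
    {"shape": "trochee", "vowel": "e", "rate": 1.0},
]

by_shape = alternation_rates(rows, "shape")
print(by_shape[("mono",)])           # (0.25, 2)
print(categorical_cells(by_shape))   # [('iamb',), ('trochee',)]
print(alternation_rates(rows, "shape", "vowel"))
```

Passing more than one factor name gives the combinations; a cell flagged by `categorical_cells` is a hint to try one of the alternatives listed above rather than a plain logistic regression.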
Experimental design
- Wug-test
- Artificial grammars
- Other kinds of experiments
Wug-test
- Creating items
  - The hypothesis guides the creation of the items, e.g. monosyllables and polysyllables, words with [e] and words with [ɛ].
  - Try to factor out the things you are not interested in, like consonants, e.g. compare [dew] and [dɛw] to test a vowel effect.
  - More = better. 50 people responding to 50 different items (one each) is much better than 50 people responding to the same item. (Why?)
- Choosing a task
  - Binary forced choice
  - Scalar forced choice
  - Binary judgment
  - Scalar judgment
  - Production task (orthographic)
  - Production task (auditory)
- Recruiting participants
  - On campus, online, in the field, in class...
  - Amazon’s Mechanical Turk
  - Each participant must respond independently to get valid results.
- Randomization
  - Randomizing the order of items
  - In a forced choice task, random order of choices
  - Why?
- Fillers/distractors
  - Need to make sense given the task
  - How many?
  - Randomized with the target stimuli
- Instructions
  - Make them short. People don’t read them anyway.
  - Give them in the same language as the experiment.
  - Maybe the ideal experiment doesn’t have instructions at all...
  - One or two practice items.
- Respect, demographic questions, feedback
  - Consult your institution’s policy on experiments with humans.
  - Never lie to your participants.
  - Don’t give participants any motivation to lie.
  - Suspicious participants → run, then throw the data out.
  - Ask for feedback at the end.
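The randomization points (item order, order of choices, fillers mixed in with the targets) can be made concrete with a short sketch. It assumes a forced-choice task and builds a separate trial list for each participant; the function name `build_trials`, the item labels, and the idea of seeding the generator with a participant number are illustrative assumptions, not part of any particular experiment software.

```python
import random

def build_trials(targets, fillers, choices, participant_id):
    """Return one participant's trial list.

    Targets and fillers are shuffled together, and the order of the
    response choices is shuffled independently on every trial.
    """
    rng = random.Random(participant_id)  # reproducible per participant
    items = [(w, "target") for w in targets] + [(w, "filler") for w in fillers]
    rng.shuffle(items)
    trials = []
    for word, kind in items:
        order = list(choices)
        rng.shuffle(order)
        trials.append({"item": word, "kind": kind, "choices": order})
    return trials

# Hypothetical items and response options.
for trial in build_trials(["dew", "gaw"], ["filler1", "filler2"],
                          ("faithful plural", "alternating plural"), 1):
    print(trial)
```

Seeding with the participant number means each participant gets a different order, while any one participant's order can be regenerated later when analyzing the results.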
Artificial grammars
- Reminder: artificial grammar inspired by English voicing

              monosyllabic training       iambic training
  Training    10 stop-final monos         10 stop-final iambs
              ˈmip ˈmibni                 təˈgep təˈgebni
              ˈstut ˈstudni               gəˈʃut gəˈʃudni
              5 sonorant-finals: ˈmuŋ-ni, nəˈdʒol-ni
  Testing     10 stop-final monos         10 stop-final monos
              ˈgaɪp                       ˈgaɪp
              ˈklet                       ˈklet
              10 stop-final iambs         10 stop-final iambs
              fəˈtʃop                     fəˈtʃop
              bəˈgit                      bəˈgit
              10 sonorant-finals: ˈpler, ʒəˈtaɪm
- Task: training and testing
  - Train: give the participant examples of the pattern to learn
  - Test: did the participant apply the pattern to items they haven’t seen before?
  - Holdout condition: did the participant apply the pattern to a kind of item they haven’t seen before?
  - Compare two groups of participants
- Presentation of novel concepts
  - Easiest concepts to present: concrete objects
    - Affixes: plural, dual vs. plural, feminine(?)
  - How do you present a novel adjective? Novel verb?
    - Affixes: comparative? perfective?...
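Training items for an artificial grammar of this kind are easy to generate by script once the pattern is stated explicitly. The sketch below assumes the pattern visible in the training examples: a stem-final voiceless stop is voiced before the suffix -ni, and sonorant-final stems take the suffix unchanged. Only the p/b and t/d pairs attested in the examples are included, stress marks are left out, and the function name `suffixed` is invented for illustration.

```python
# Voiceless stop -> voiced counterpart; only the pairs attested in the
# training examples (a fuller table would be needed for other stops).
VOICED = {"p": "b", "t": "d"}

def suffixed(stem, suffix="ni"):
    """Add the suffix, voicing a stem-final voiceless stop."""
    final = stem[-1]
    return stem[:-1] + VOICED.get(final, final) + suffix

for stem in ["mip", "stut", "muN"]:
    print(stem, suffixed(stem))
# mip mibni
# stut studni
# muN muNni
```

Generating the forms from a rule, rather than typing them by hand, makes it harder for an inconsistent item to slip into the training set.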
Other kinds of experiments
- Tasks
  - Lexical decision
  - Confusability
  - Morphological knowledge
  - etc.
  - Make your own (you are allowed!)
- Modalities
  - On paper
  - Keyboard/mouse
  - Microphone
  - Button box
  - Eye tracking, ERP, fMRI, ultrasound...
Building and running an experiment
- Audio stimuli
- Online experiments
- Paper experiments
- Results
- Lexicon vs. experiment
Audio stimuli
- Preparing materials
  - Presentation of materials to the consultant
  - Randomizations
  - Repetitions
- Recording
  - Sound booth ≻ quiet room ≻ outdoors
  - Decent equipment
- Chopping up: Praat scripting
  - Mark pauses, add labels, save labeled intervals
  - Find Praat scripts and help online
- Converting to mp3 (http://www.macroplant.com/adapter/)
Online experiments
- Hard to set up the first time, easy to run every time
- Provides access to remote participants
- Programs that can help:
  - Experigen (Becker & Levine 2010)
  - Ibex Farm
  - Webexp
Paper experiments
Slower and more labor-intensive, but lower barrier to entry
- Just prepare a document, print it out, and give it to the participant.
- Items still have to be randomized (manually?)
- More advanced option: a script that creates a LaTeX file.
- Responses need to be typed up.
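The "more advanced option" could look roughly like the following: a script that shuffles the items separately for each participant and writes out a LaTeX questionnaire. This is a minimal sketch under assumptions of its own: the function name `latex_questionnaire`, the fill-in-the-blank layout, and the output file name are all hypothetical, and items containing LaTeX special characters would need escaping first.

```python
import random

def latex_questionnaire(items, participant_id):
    """Return LaTeX source for one participant's randomized item list."""
    rng = random.Random(participant_id)  # a different order per participant
    order = list(items)
    rng.shuffle(order)
    lines = [
        r"\documentclass{article}",
        r"\begin{document}",
        r"\begin{enumerate}",
    ]
    for word in order:
        # Each item is followed by a blank for the handwritten response.
        lines.append(rf"\item {word}: \underline{{\hspace{{4cm}}}}")
    lines += [r"\end{enumerate}", r"\end{document}"]
    return "\n".join(lines) + "\n"

source = latex_questionnaire(["dew", "gaw", "sol"], participant_id=7)
with open("participant_07.tex", "w", encoding="utf-8") as handle:
    handle.write(source)
```

Because the order depends only on the participant number, the same script can later regenerate each participant's item order when the responses are typed up.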
Results
- Descriptive statistics
  - For the population, not for individual participants
- Inferential statistics
  - The regression is your friend.
  - ANOVA: an older, more restricted kind of regression. Not necessary anymore, but still used by many people.
  - Florian Jaeger’s slides, lab blog.
- Portuguese nonce words again
  (Linear) regression for the experimental results

                             β       SE(β)   t       p-value
  (Intercept)                4.47    0.10    42.91
  mono vs. trochee           0.35    0.09    3.80    <.0005
  mono & trochee vs. iamb    0.55    0.11    4.77    <.0001
  lax                        0.43    0.09    4.63    <.0001
  low                        −0.40   0.26    −1.53   >.1
  mono vs. trochee:low       0.72    0.31    2.35    <.05
Lexicon vs. experiment
- Predictions from the lexicon
- Correlation
  [Figure: mean participant response plotted against lexicon model predictions, with items grouped as mono, iamb, and trochee]
- Model comparison
  - Which factors improve the fit?
  - Measured with a χ² test
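Both comparisons can be computed by hand, which helps in understanding what the statistics software reports. The sketch below assumes that the correlation is a Pearson correlation between the lexicon model's predictions and the mean participant responses, and that the χ² test is a likelihood-ratio test between two nested models that differ by one parameter; both are assumptions about the intended method, and the function names are invented. The p-value formula is only valid for one degree of freedom.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def likelihood_ratio_test(loglik_small, loglik_big):
    """Chi-square test for ONE added parameter (df = 1).

    Takes the log-likelihoods of the simpler and the richer model and
    returns the test statistic and its p-value.
    """
    stat = max(0.0, 2 * (loglik_big - loglik_small))
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square survival, df = 1
    return stat, p_value

# Hypothetical numbers, for illustration only.
print(pearson_r([1, 2, 3, 4], [2, 4, 5, 9]))
print(likelihood_ratio_test(-102.0, -100.0))
```

A small p-value means the added factor improves the fit by more than chance would predict; for models that differ by more than one parameter, a statistics package should be used instead of this shortcut.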
Where to go next
- Resources
- References
Resources
- Skills to learn
  - Working with text files
  - Working with sound files
  - Experimental design
  - Descriptive statistics
  - Inferential statistics
  - Automation: Praat scripting, some other scripting language (Perl, Python, Javascript, etc.)
- Theory: John McCarthy’s books (Doing OT, reader), Bybee
- Quantitative methods: Johnson, Baayen, Gelman & Hill, Jaeger’s slides, journal articles
- Statistics class in the psychology department
- Internship with a cognitive psychologist or experimental linguist (design, run experiments, stats)
References