Everything will be online
• Phonatics blackboard site; if you need access, email me.
• These slides
• Two example datasets
• Papers are available as well:
  – Continuous measures: http://dx.doi.org/10.1016/j.jml.2007.12.005
  – Categorical measures: http://dx.doi.org/10.1016/j.jml.2007.11.007
  – Comprehensive review in Baayen’s book: http://www.ualberta.ca/~baayen/publications/baayenCUPstats.pdf

Parametric Statistical Analyses
• Dependent variable: What you’re measuring
  – RT, accuracy, VOT
  – Can be categorical (accuracy) or continuous (RT)
• Factors: Things that influence this dependent variable
  – Frequency of word, person who said it, lexical identity of the word
  – Can be categorical (place of articulation) or continuous (frequency)
• (Statistical) Model: Uses factors to make predictions about the dependent variable.
  – Focus today: linear models
  – Model is a linear sum of predictors
  – Note that predictors can be single factors (frequency) as well as interactions (frequency * grammatical category)

Why ANOVA?
• Pairwise comparisons introduce errors. Lots of factors = lots of conditions = lots of comparisons = lots of opportunities for false positives.
  – ANOVA avoids this by testing a single measure (variance) across groups.
• Factors are not all fixed.
  – Sometimes we examine the influence of a factor that has a finite number of possible levels, all of which we sample (e.g., there are 3 places of articulation for English oral stops; our experiment includes b, d, g).
  – In other cases, we have a random sample from among an infinite set of possible levels (e.g., we run 10 NU undergraduates; the space of possible undergraduate humans is infinite).
  – ANOVA allows us to treat one factor as random.
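The false-positive point above can be made concrete with a little arithmetic. As a rough sketch, assuming the pairwise tests are independent and each is run at alpha = .05 (both are illustrative assumptions, not taken from the slides), the chance of at least one false positive among k comparisons is 1 - (1 - alpha)^k:

```r
# Sketch: familywise false-positive rate for k independent tests at level alpha.
# Independence and alpha = .05 are illustrative assumptions.
familywise_rate <- function(k, alpha = 0.05) {
  1 - (1 - alpha)^k
}

# Number of pairwise comparisons among n conditions is "n choose 2".
n_conditions  <- c(2, 3, 6)
n_comparisons <- choose(n_conditions, 2)
rates         <- familywise_rate(n_comparisons)

print(data.frame(n_conditions, n_comparisons, rate = round(rates, 3)))
```

With 6 conditions (for example, 2 voicing categories by 3 places of articulation) there are 15 pairwise comparisons, and under these assumptions the chance of at least one false positive is above 50%.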

Issues with ANOVA
• Dependent variables are not always continuous. Converting or transforming an inherently categorical measure like accuracy into a semi-continuous measure (e.g., % correct) introduces errors, even with the RAU (rationalized arcsine unit) transform.
• Factors are not always categorical. ANOVA functions best with categorical factors; techniques such as ANCOVA have had only limited success with continuous ones.
• Sometimes you have multiple random factors. Both subjects and items can be random, but there is no way to incorporate both into a single ANOVA.

Mixed Effects Regression
• Dependent variables are not always continuous.
  – Output of the statistical model can either be directly related to a continuous dependent variable, or linked to it by a logistic function [= probability of one categorical output].
  – Output can be RT or Pr(correct).
• Factors are not always categorical.
  – Mixed effects models allow both continuous and categorical predictors.
• Sometimes you have multiple random factors.
  – Random factors are just additional terms in the regression; you can incorporate more than one.
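To illustrate the logistic link mentioned above, here is a minimal sketch in base R. The coefficient values and the SNR predictor are invented for illustration; the point is only that a linear sum of predictors (on the log-odds scale) is converted into Pr(correct) by the logistic function.

```r
# Sketch: a linear predictor (log-odds) mapped to Pr(correct) by the logistic function.
# Both coefficients below are made-up values, not estimates from any dataset.
intercept <- 0.5   # hypothetical log-odds of a correct response at SNR = 0
slope_snr <- 0.4   # hypothetical change in log-odds per unit of SNR

snr        <- c(-3, 0, 3)
log_odds   <- intercept + slope_snr * snr  # linear sum of predictors
pr_correct <- plogis(log_odds)             # logistic function: 1 / (1 + exp(-x))

print(round(pr_correct, 3))
```

Whatever the value of the linear predictor, the output stays between 0 and 1, which is what makes it usable as a probability.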

Prerequisites
• A current version of R
• The languageR library
  – In R, go to the Packages menu
  – Find languageR; install it, and it will also install all the other libraries it needs.
• Some data (factors + observations along a dependent measure)

Formatting Your Data
• Each line in the file should be a SINGLE observation.
  – Only ONE column should hold the dependent measure.
  – This should be an observation, not an average or total.
  – Save as a tab-delimited txt file.
• Example: VOT data.
    Subject  Word  POA     Voicing  Freq  VOT
    S001     bat   labial  voiced   10.5  13.2
  Note: Categorical variables are text; don’t use 001 for Subject 1.
• Example: Accuracy data.
    Subject  Lang  SNR  Trial  Accuracy
    S001     Eng   3    1      Correct
  Note: Enter every trial, coded as correct vs. incorrect!
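The layout described above can be checked quickly in R. In this sketch the two data rows are invented and supplied inline through the `text` argument; a real analysis would pass a file name to `read.delim()` instead. The second part shows why a purely numeric label such as 001 is risky for a categorical variable.

```r
# Sketch: read tab-delimited data with one observation per row.
# The rows below are invented for illustration.
raw <- "Subject\tWord\tPOA\tVoicing\tFreq\tVOT
S001\tbat\tlabial\tvoiced\t10.5\t13.2
S001\tpat\tlabial\tvoiceless\t8.1\t61.0"
dat <- read.delim(text = raw)
str(dat)

# A label made only of digits is parsed as a number, not as a category.
bad <- read.delim(text = "Subject\tVOT\n001\t13.2")
print(class(bad$Subject))
```

Text labels such as S001 stay categorical, while 001 is silently read as the number 1.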

Data set 1: VOT
• Correct productions from a tongue twister experiment.
• Effect of voicing, place; no interaction

Analysis in R
• Load the library that provides lmer
    > library(lme4)
• Read in data
    > dat = read.delim("vot.txt")
• Build your model
    > dat.lmer = lmer(VOT~Voicing*POA+(1|Subject),data=dat)
• lmer = “Linear mixed effects regression”
• Model is specified by referring to column headers (= factor names)
  – VOT ~ : Predict VOT
  – Voicing * POA: Use a full factorial model combining Voicing and place
  – +(1|Subject): Allow each subject to have their own ‘baseline’ VOT (= intercept)
  – data = dat: Variable where data is stored
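To see what “full factorial” means for the fixed-effects part of the formula, base R can expand it for you. This sketch leaves out the `(1|Subject)` term, which is interpreted by `lmer` itself.

```r
# Sketch: how R expands the fixed-effects part of the model formula.
f        <- VOT ~ Voicing * POA
expanded <- attr(terms(f), "term.labels")
print(expanded)
```

So `Voicing * POA` is shorthand for both main effects plus their interaction, `Voicing + POA + Voicing:POA`.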

Analysis in R
• Results
    > dat.lmer
    Linear mixed model fit by REML
    Formula: VOT ~ Voicing * POA + (1 | Subject)
       Data: dat
       AIC   BIC logLik deviance REMLdev
     -4888 -4849   2452    -4969   -4904
    Random effects:
     Groups   Name        Variance   Std.Dev.
     Subject  (Intercept) 0.00004044 0.0063593
     Residual             0.00025566 0.0159894
    Number of obs: 918, groups: Subject, 10
• Formula: Reminder of what the model is
• AIC, BIC, logLik, deviance, REMLdev: Characterization of how well it fits the data
• Random effects: Contribution of random effects; in addition, the overall model has some degree of intrinsic error (Residual), assumed to be normally distributed

Fixed effects: estimate (with error) of the direction and degree of each factor’s influence on the dependent measure. t ≈ Student’s t.
    Fixed effects:
                                  Estimate Std. Error t value
    (Intercept)                   0.019582   0.002753   7.114
    Voicingvoiceless              0.025920   0.002384  10.872
    POAbilabial                  -0.004416   0.002091  -2.112
    POAvelar                      0.009327   0.002442   3.819
    Voicingvoiceless:POAbilabial  0.002404   0.002773   0.867
    Voicingvoiceless:POAvelar     0.004845   0.003220   1.505
• Categorical variables are ‘dummy’ coded as 1/0. R sorts the levels alphabetically and assigns 0 to the first one (the reference level).
• If a factor has more than two levels, it is split into several 0/1 variables (the three POA levels become two: POAbilabial and POAvelar).
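Dummy coding can be inspected directly with `model.matrix()` in base R. The level labels below are assumptions chosen for illustration; the slides do not say how the POA column was labelled or whether the reference level was set by hand. `relevel()` is one standard way to choose a reference level other than the alphabetically first one.

```r
# Sketch: default dummy (treatment) coding and how to change the reference level.
# The level labels are hypothetical.
poa <- factor(c("bilabial", "coronal", "velar", "coronal"))

print(levels(poa))             # alphabetical order; the first level is the reference
default_cols <- colnames(model.matrix(~ poa))
print(default_cols)

# Make "coronal" the reference level instead.
poa2 <- relevel(poa, ref = "coronal")
releveled_cols <- colnames(model.matrix(~ poa2))
print(releveled_cols)
```

Each non-reference level gets its own 0/1 column, and its coefficient is read as a difference from the reference level.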

Side note: Random effects
• Each level of random effect gets its own intercept
    > ranef(dat.lmer)
    $Subject
          (Intercept)
    s1  -5.989105e-03
    s10 -7.125361e-03
    s2   3.499354e-03
    s3   4.140038e-03
    s4   6.205415e-05
    s5  -2.179526e-03
    s6  -1.000678e-02
    s7   6.203679e-03
    s8   5.616277e-03
    s9   5.779373e-03
• “Random” component: assume there is an independent source of error on these factors
  – Other individuals you might test would also have different intercepts; their intercepts are assumed to be randomly distributed in the same way as those of the current sample of participants
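As a worked example of what these numbers mean, each subject's own baseline is the fixed intercept plus that subject's adjustment. The sketch below copies the values from the slides (fixed intercept 0.019582 and the `ranef` output); the variable names are mine.

```r
# Sketch: per-subject baseline = fixed intercept + subject-specific adjustment.
# Values are copied from the model output shown in the slides.
fixed_intercept <- 0.019582
subject_adjust <- c(
  s1 = -5.989105e-03, s10 = -7.125361e-03, s2 = 3.499354e-03,
  s3 =  4.140038e-03, s4  =  6.205415e-05, s5 = -2.179526e-03,
  s6 = -1.000678e-02, s7  =  6.203679e-03, s8 = 5.616277e-03,
  s9 =  5.779373e-03
)

subject_baseline <- fixed_intercept + subject_adjust
print(round(subject_baseline, 6))

# The adjustments are deviations from the fixed intercept, so they sum to about zero.
print(sum(subject_adjust))
```

Subject s6 has the lowest baseline and s7 the highest; the fixed intercept sits in the middle of the sample.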

Fixed effects: estimate (with error) of the direction and degree of each factor’s influence on the dependent measure. t ≈ Student’s t.
    Fixed effects:
                      Estimate Std. Error t value
    (Intercept)       0.019582   0.002753   7.114
    Voicingvoiceless  0.025920   0.002384  10.872
• Positive coefficient estimate: Relative to voiced (reference level), voiceless consonants have longer VOTs

Fixed effects: estimate (with error) of the direction and degree of each factor’s influence on the dependent measure. t ≈ Student’s t.
    Fixed effects:
                 Estimate Std. Error t value
    (Intercept)  0.019582   0.002753   7.114
    POAbilabial -0.004416   0.002091  -2.112
    POAvelar     0.009327   0.002442   3.819
• What is the reference level for POA? Coronal.
• What do the positive coefficient for velar and the negative coefficient for bilabial mean? Bilabials have shorter VOTs than coronals; velars have longer VOTs than coronals.
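The coefficients can be turned into a predicted VOT for each condition by adding up the terms that apply. This sketch uses the estimates from the full fixed-effects table in the slides; the function and variable names are mine, and the predictions ignore the subject-specific intercepts.

```r
# Sketch: predicted VOT per condition from the fixed-effects estimates in the slides.
b <- c(intercept = 0.019582, voiceless = 0.025920,
       bilabial = -0.004416, velar = 0.009327,
       voiceless_bilabial = 0.002404, voiceless_velar = 0.004845)

predict_vot <- function(voiceless, poa) {
  est <- b[["intercept"]]                      # voiced coronal (reference levels)
  if (voiceless)         est <- est + b[["voiceless"]]
  if (poa == "bilabial") est <- est + b[["bilabial"]]
  if (poa == "velar")    est <- est + b[["velar"]]
  if (voiceless && poa == "bilabial") est <- est + b[["voiceless_bilabial"]]
  if (voiceless && poa == "velar")    est <- est + b[["voiceless_velar"]]
  est
}

for (v in c(FALSE, TRUE)) {
  for (p in c("coronal", "bilabial", "velar")) {
    cat(if (v) "voiceless" else "voiced", p, predict_vot(v, p), "\n")
  }
}
```

For example, the voiceless velar prediction is 0.019582 + 0.025920 + 0.009327 + 0.004845 = 0.059674, while the voiced coronal prediction is just the intercept.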

Last bit of output: Table summarizing correlation of predictors.
    Correlation of Fixed Effects:
                   (Intr) Vcngvc POAblb POAvlr Vcngvclss:POAb
    Voicngvclss    -0.496
    POAbilabial    -0.580  0.648
    POAvelar       -0.484  0.555  0.640
    Vcngvclss:POAb  0.429 -0.857 -0.744 -0.478
    Vcngvclss:POAv  0.360 -0.729 -0.482 -0.750  0.630
• If the predictors are highly correlated, the model might not be accurately characterizing the data.

Significance
• Method 1: Significance of a factor = coefficient of the factor is not equal to 0.
• We have t-values but not p-values, what’s up with that?
• Calculating degrees of freedom for these t-values is extremely complicated.
• Baayen’s approach: Use Markov Chain Monte Carlo (MCMC) sampling to estimate the distribution of the parameters of the statistical model.
  – MCMC is just a means of randomly sampling from probability distributions.
  – Here, we randomly sample from the probability distribution of model estimates.

Significance
• Command
    > pvals.fnc(dat.lmer)
• Will generate a set of figures giving you the distribution of each parameter.
• Critical output: Proportion of MCMC samples where estimate > or < 0.
  – Voicing: < .0001
  – POA bilabial: .0368
  – POA velar: .0004
  – Voicing * Bilabial: .3874
  – Voicing * Velar: .1384
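The idea behind “proportion of samples where estimate > or < 0” can be sketched in base R. This is not the output of `pvals.fnc`: draws from a normal distribution, centred on the POA bilabial estimate with its standard error, stand in for real MCMC samples, and doubling the smaller tail proportion is one common way to get a two-tailed value, assumed here for illustration.

```r
# Sketch: a two-tailed p-value from the proportion of samples on either side of 0.
# Normal draws are a stand-in for MCMC samples (an assumption for illustration).
set.seed(42)
draws <- rnorm(10000, mean = -0.004416, sd = 0.002091)

prop_above   <- mean(draws > 0)  # share of samples with a positive estimate
prop_below   <- mean(draws < 0)  # share of samples with a negative estimate
p_two_tailed <- 2 * min(prop_above, prop_below)

print(p_two_tailed)
```

If nearly all samples fall on one side of 0, the smaller proportion is tiny and the coefficient is unlikely to be 0.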

Significance
• Method 2: Model comparison
• Construct a simple model, then a more complex model.
• If the more complex model has a significantly better fit to the data, the additional factor makes a significant contribution.
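Model comparison can be sketched with `lmer` and `anova()` from the lme4 package. The data here are simulated (the effect sizes loosely echo the VOT output but are assumptions), and both models are fit with `REML = FALSE` so that their likelihoods are comparable when the fixed effects differ.

```r
# Sketch: compare a model without Voicing to one with Voicing.
# Simulated data; all generating values are illustrative assumptions.
library(lme4)
set.seed(1)

subject <- factor(rep(paste0("S", 1:10), each = 40))
voicing <- factor(rep(c("voiced", "voiceless"), times = 200))
subject_shift <- rnorm(10, mean = 0, sd = 0.006)[as.integer(subject)]
vot <- 0.02 + 0.026 * (voicing == "voiceless") + subject_shift +
  rnorm(400, mean = 0, sd = 0.016)
sim <- data.frame(Subject = subject, Voicing = voicing, VOT = vot)

simple_model  <- lmer(VOT ~ 1 + (1 | Subject), data = sim, REML = FALSE)
complex_model <- lmer(VOT ~ Voicing + (1 | Subject), data = sim, REML = FALSE)

# Likelihood-ratio test: does adding Voicing significantly improve the fit?
comparison <- anova(simple_model, complex_model)
print(comparison)
```

A small p-value in the `Pr(>Chisq)` column indicates that the more complex model fits significantly better, that is, Voicing makes a significant contribution.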
