Issues and Solutions in Fitting, Evaluating, and Interpreting Regression Models

Florian Jaeger, Victor Kuperman

March 24, 2009

Hypothesis testing in psycholinguistic research

◮ Typically, we make predictions not just about the existence, but also the direction of effects.
◮ Sometimes, we're also interested in effect shapes (non-linearities, etc.).
◮ Unlike in ANOVA, regression analyses reliably test hypotheses about effect direction and shape without requiring post-hoc analyses if (a) the predictors in the model are coded appropriately and (b) the model can be trusted.
◮ Today: Provide an overview of (a) and (b).

Overview

◮ Introduce sample data and simple models
◮ Towards a model with interpretable coefficients:
  ◮ outlier removal
  ◮ transformation
  ◮ coding, centering, ...
  ◮ collinearity
◮ Model evaluation:
  ◮ fitted vs. observed values
  ◮ model validation
  ◮ investigation of residuals
  ◮ case influence, outliers
◮ Model comparison
◮ Reporting the model:
  ◮ comparing effect sizes
  ◮ back-transformation of predictors
  ◮ visualization

Sample Data and Simple Models
Building an interpretable model
  Data exploration
  Transformation
  Coding
  Centering
  Interactions and modeling of non-linearities
  Collinearity
    What is collinearity?
    Detecting collinearity
    Dealing with collinearity
Model Evaluation
  Beware overfitting
  Detect overfitting: Validation
  Goodness-of-fit
  Aside: Model Comparison
Reporting the model
  Describing Predictors
  What to report
  Back-transforming coefficients
  Comparing effect sizes
  Visualizing effects
  Interpreting and reporting interactions
Discussion

Data 1: Lexical decision RTs

◮ Outcome: log lexical decision latency RT
◮ Inputs:
  ◮ factors Subject (21 levels) and Word (79 levels),
  ◮ factor NativeLanguage (English and Other)
  ◮ continuous predictors Frequency (log word frequency), and Trial (rank in the experimental list).

      Subject       RT Trial NativeLanguage       Word Frequency
    1      A1 6.340359    23        English        owl  4.859812
    2      A1 6.308098    27        English       mole  4.605170
    3      A1 6.349139    29        English     cherry  4.997212
    4      A1 6.186209    30        English       pear  4.727388
    5      A1 6.025866    32        English        dog  7.667626
    6      A1 6.180017    33        English blackberry  4.060443

Linear model of RTs

    > lin.lmer = lmer(RT ~ NativeLanguage +
    +                 Frequency + Trial +
    +                 (1 | Word) + (1 | Subject),
    +                 data = lexdec)
    <...>
    Random effects:
     Groups   Name        Variance  Std.Dev.
     Word     (Intercept) 0.0029448 0.054266
     Subject  (Intercept) 0.0184082 0.135677
     Residual             0.0297268 0.172415
    Number of obs: 1659, groups: Word, 79; Subject, 21

    Fixed effects:
                          Estimate Std. Error t value
    (Intercept)          6.548e+00  4.963e-02  131.94
    NativeLanguageOther  1.555e-01  6.043e-02    2.57
    Frequency           -4.290e-02  5.829e-03   -7.36
    Trial               -2.418e-04  9.122e-05   -2.65
    <...>

◮ estimates for random effects of Subject and Word and for the residual error of the model: standard deviation and variance.
◮ estimates for regression coefficients, standard errors → t-values
◮ An effect is significant if the interval estimate ± 2*SE does not include zero (i.e. if the absolute t-value exceeds 2).
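The ± 2*SE rule can be checked by hand from the reported output. The following is a minimal base-R sketch, not part of the original slides: the estimates and standard errors are copied from the fixed-effects table above, and the data frame name `fixef_tab` and its column names are illustrative assumptions.

```r
# Fixed-effect estimates and standard errors copied from the lmer output above.
fixef_tab <- data.frame(
  term     = c("(Intercept)", "NativeLanguageOther", "Frequency", "Trial"),
  estimate = c(6.548, 0.1555, -0.04290, -0.0002418),
  se       = c(0.04963, 0.06043, 0.005829, 0.00009122)
)

# t-value = estimate / standard error
fixef_tab$t <- fixef_tab$estimate / fixef_tab$se

# Approximate interval: estimate +/- 2 * SE
fixef_tab$lower <- fixef_tab$estimate - 2 * fixef_tab$se
fixef_tab$upper <- fixef_tab$estimate + 2 * fixef_tab$se

# The interval excludes zero exactly when |t| > 2
fixef_tab$excludes_zero <- fixef_tab$lower > 0 | fixef_tab$upper < 0

print(fixef_tab, digits = 3)
```

The recomputed t-values should agree with the `t value` column of the model output up to rounding, since they are derived from the same rounded estimates.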

Linear model of RTs (cont'd)

◮ t-value anti-conservative
  → MCMC-sampling of coefficients to obtain non anti-conservative estimates

    > pvals.fnc(lin.lmer, nsim = 10000)
    $fixed
                        Estimate MCMCmean HPD95lower HPD95upper  pMCMC Pr(>|t|)
    (Intercept)           6.5476   6.5482     6.4653     6.6325 0.0001   0.0000
    NativeLanguageOther   0.1555   0.1551     0.0580     0.2496 0.0012   0.0001
    Frequency            -0.0429  -0.0429    -0.0542    -0.0323 0.0001   0.0000
    Trial                -0.0002  -0.0002    -0.0004    -0.0001 0.0068   0.0109

    $random
        Groups        Name Std.Dev. MCMCmedian MCMCmean HPD95lower HPD95upper
    1     Word (Intercept)   0.0564     0.0495   0.0497     0.0384     0.0619
    2  Subject (Intercept)   0.1410     0.1070   0.1083     0.0832     0.1379
    3 Residual               0.1792     0.1737   0.1737     0.1678     0.1799

Data 2: Lexical decision response

◮ Outcome: Correct or incorrect response (Correct)
◮ Inputs: same as in linear model

    > lmer(Correct == "correct" ~ NativeLanguage +
    +      Frequency + Trial +
    +      (1 | Subject) + (1 | Word),
    +      data = lexdec, family = "binomial")

    Random effects:
     Groups  Name        Variance Std.Dev.
     Word    (Intercept) 1.01820  1.00906
     Subject (Intercept) 0.63976  0.79985
    Number of obs: 1659, groups: Word, 79; Subject, 21

    Fixed effects:
                          Estimate Std. Error z value Pr(>|z|)
    (Intercept)         -1.746e+00  8.206e-01  -2.128 0.033344 *
    NativeLanguageOther  5.726e-01  4.639e-01   1.234 0.217104
    Frequency           -5.600e-01  1.570e-01  -3.567 0.000361 ***
    Trial                4.443e-06  2.965e-03   0.001 0.998804

◮ estimates for random effects of Subject and Word (no residuals).
◮ estimates for regression coefficients, standard errors → Z- and p-values

Interpretation of coefficients

◮ In theory, directionality and shape of effects can be tested and immediately interpreted.
◮ e.g. logit model

    Fixed effects:
                          Estimate Std. Error z value Pr(>|z|)
    (Intercept)         -1.746e+00  8.206e-01  -2.128 0.033344 *
    NativeLanguageOther  5.726e-01  4.639e-01   1.234 0.217104
    Frequency           -5.600e-01  1.570e-01  -3.567 0.000361 ***
    Trial               -5.725e-06  2.965e-03  -0.002 0.998460

◮ ... but can these coefficient estimates be trusted?
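Logit coefficients are on the log-odds scale, so reading off direction is immediate but reading off magnitude takes one more step. The following base-R sketch is an illustration rather than the authors' procedure: the two coefficients are copied from the table above, while the Frequency values in `log_freq` are hypothetical, and the prediction assumes NativeLanguage = English, Trial = 0, and random effects at zero.

```r
# Coefficients copied from the logit model output above (log-odds scale).
b_intercept <- -1.746
b_frequency <- -0.5600

# exp() of a logit coefficient gives the multiplicative change in the odds
# of the modeled outcome per one-unit increase of the predictor.
odds_ratio_frequency <- exp(b_frequency)

# plogis() is the inverse logit: it maps log-odds to probabilities.
# Hypothetical log-frequency values, chosen only for illustration.
log_freq <- c(2, 4, 6)
p <- plogis(b_intercept + b_frequency * log_freq)

print(odds_ratio_frequency)
print(data.frame(log_freq = log_freq, probability = p))
```

A negative coefficient yields an odds ratio below 1, so the predicted probability of the modeled outcome falls as Frequency rises. The change in probability per unit of Frequency is not constant, which is one reason back-transformed effects need to be reported at stated predictor values.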


Modeling schema (figure)

Data exploration

◮ Select and understand input variables and outcome based on a-priori theoretical considerations
◮ How many parameters does your data afford (→ overfitting)?
◮ Data exploration: Before fitting the model, explore inputs and outputs
  ◮ Outliers due to missing data or measurement error (e.g. RTs in SPR < 80 msecs).
  ◮ NB: postpone distribution-based outlier exclusion until after transformations
  ◮ Skewness in the distribution can affect the accuracy of the model's estimates (→ transformations).
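The recommended ordering (a priori exclusion first, then transformation, then distribution-based exclusion) can be sketched in base R. Everything here is assumed for illustration: the reading times are simulated, the `skewness` helper is a simple moment-based estimate written for this example, and the 2.5 SD cutoff is an arbitrary choice, not a recommendation from the slides.

```r
set.seed(1)

# Simulated reading times in msec (hypothetical, right-skewed by construction).
# The last three values mimic measurement errors.
rt <- c(rlnorm(500, meanlog = 6, sdlog = 0.4), 12, 35, 60)

# Step 1: exclude implausible values on a priori grounds (here: RT < 80 msec).
rt_valid <- rt[rt >= 80]

# Step 2: transform to reduce skew.
log_rt <- log(rt_valid)

# Simple moment-based skewness estimate (helper defined for this sketch).
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3
cat("skewness raw:", skewness(rt_valid), " log:", skewness(log_rt), "\n")

# Step 3: only now apply a distribution-based criterion
# (here: 2.5 SD from the mean of the transformed values).
keep <- abs(log_rt - mean(log_rt)) <= 2.5 * sd(log_rt)
log_rt_clean <- log_rt[keep]

cat("kept", length(log_rt_clean), "of", length(rt), "observations\n")
```

Applying the SD-based criterion to the raw, right-skewed values instead would tend to trim mostly long RTs, which is why the exclusion is postponed until after the transformation.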
