Experiment Design and Statistical Data Analysis Dr. Pradipta Biswas

Structure for today

Introduction
• Experiment Design
• Data Preparation
• Test Selection

Discussion
• Your projects
• Data analysis

Details on tests
• T-Test
• ANOVA

Engineering Design Centre

Histogram

Central Tendency

Box plot

Normal Distribution & Standard Deviation

How to test
• Sampling: Select a set of participants
• Method: Design a study to collect data
• Material: Get instruments
• Procedure: Collect data from participants
• Result: Analyze data

Sampling
• Size
  – No straightforward answer
  – Can be estimated statistically
  – The bigger the better: more representative of the population
  – Often limited by availability
• Quality
  – Random sampling
  – Group-based sampling
  – Purpose-based sampling

Variables
• Variables are things that change.
• The independent variable is the variable that is purposely changed. It is the manipulated variable.
• The dependent variable changes in response to the independent variable. It is the responding variable.

Constant Variables
• Factors that are kept the same and not allowed to change.
• It is important to control all but one variable at a time so that the data can be interpreted.

Hypothesis
• Your best thinking about how the change you make might affect another factor.
• A tentative or trial solution to the question.
• An "if ... then ..." statement.
• Should be expressed in measurable terms.

Experiment
• Random assignment
  – Factorial design: more than one independent variable (IV)
  – Parametric design: the IV has more than two levels
• Matched pair
• Repeated measure
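Random assignment can be sketched in a few lines. This is only an illustration under stated assumptions: the participant IDs, the condition names and the helper `random_assignment` are hypothetical and do not come from the slides.

```python
import random


def random_assignment(participants, conditions, seed=None):
    """Shuffle participants, then deal them round-robin into conditions.

    Hypothetical helper for illustration; group sizes differ by at most one.
    """
    rng = random.Random(seed)          # seeded so the assignment can be reproduced
    shuffled = list(participants)      # copy, so the caller's list is not modified
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for index, person in enumerate(shuffled):
        groups[conditions[index % len(conditions)]].append(person)
    return groups


# Hypothetical study: 12 participants, 3 levels of one independent variable.
participants = [f"P{n:02d}" for n in range(1, 13)]
groups = random_assignment(participants, ["A", "B", "Placebo"], seed=1)
for condition, members in groups.items():
    print(condition, members)
```

Shuffling first and then dealing round-robin keeps the group sizes balanced, which also helps with the equal-sample-size case of the variance check during data screening.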

Data Screening
• Skewing
  – In opposite direction
• Unequal Variance
  – Equal number of samples: $\sigma^2_{\max} / \sigma^2_{\min} < 4$
  – Unequal number of samples: $\sigma^2_{\max} / \sigma^2_{\min} < 2$
• Random Error
• Missing Values
• Data Transformation

Normality Check
We should check for normality using:
• assumptions about the population
• histograms for each group
• a normal quantile plot for each group
With such small data sets there is no really good way to check normality from the data, but we make the common assumption that physical measurements of people tend to be normally distributed.
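The points of a normal quantile plot can be computed with the standard library alone. The sketch below is an illustration, not the procedure used in the lecture: the plotting positions (i - 0.5)/n are one common convention, and the sample values and the helper `normal_quantile_pairs` are hypothetical.

```python
from statistics import NormalDist


def normal_quantile_pairs(data):
    """Pair each sorted observation with its standard normal quantile.

    Uses plotting positions (i - 0.5) / n, one common convention.
    """
    observed = sorted(data)
    n = len(observed)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    pairs = []
    for i, value in enumerate(observed, start=1):
        p = (i - 0.5) / n
        pairs.append((standard_normal.inv_cdf(p), value))
    return pairs


# Hypothetical measurements (e.g. heights in cm) for one group.
heights = [162, 171, 168, 175, 159, 180, 166, 173]
for theoretical, observed in normal_quantile_pairs(heights):
    print(f"{theoretical:6.2f}  {observed}")
```

If the group is roughly normal, plotting the observed values against the theoretical quantiles should give points close to a straight line.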

Test selection
• Data normally distributed
  – Parametric / Non-parametric
• Relationship between two columns of data
  – Correlation (Pearson’s r / Spearman’s ρ)
• Comparing means between two columns of data
  – T-test / U-test / Wilcoxon signed rank test
• More than two columns
  – ANOVA / Kruskal-Wallis H test

Scatter plot & Correlation
[Scatter plot: predicted vs. actual task completion time (in msec)]
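The two correlation coefficients named above can be sketched as follows. Pearson's r works on the raw values and Spearman's ρ is Pearson's r computed on ranks. The helper names and the task-time values are hypothetical and are not the data behind the scatter plot.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r: covariance scaled by both standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical actual vs. predicted task completion times (msec).
actual = [1200, 3400, 5100, 7600, 9800, 15000]
predicted = [1500, 3000, 5600, 7000, 11000, 14000]
print("Pearson r:", round(pearson_r(actual, predicted), 3))
print("Spearman rho:", round(spearman_rho(actual, predicted), 3))
```

Because Spearman's ρ only uses ranks, it is the non-parametric choice and is less affected by a single extreme point than Pearson's r.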

Correlation: outliers
[Two plots; axis labels not recoverable]

Error plot
[Plot: Relative Error in Prediction; x-axis: % Error, y-axis: % Data]

Important terms
• Degrees of freedom (df)
• One-tail and two-tail tests
  – Better/worse or just different
• Type I (α) and Type II (β) error
• Sphericity assumption (for ANOVA)

Comparing means – t-test
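As a rough illustration of how a t statistic and its degrees of freedom are obtained, the sketch below computes the pooled two-sample t statistic. It assumes two independent groups with roughly equal variances; the slides do not say which t-test variant was shown, and the data and the helper `pooled_t_statistic` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Return (t, df) for an independent two-sample t-test with pooled variance."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2  # degrees of freedom
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / standard_error, df


# Hypothetical task completion times (sec) under two conditions.
condition_1 = [12.1, 10.4, 11.8, 13.0, 9.9, 12.5]
condition_2 = [14.2, 13.1, 15.0, 12.8, 14.7, 13.9]
t, df = pooled_t_statistic(condition_1, condition_2)
print(f"t = {t:.3f}, df = {df}")
```

The statistic is then compared with a t distribution with `df` degrees of freedom: a two-tail test asks whether the groups are just different, while a one-tail test asks whether one is specifically better or worse.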

The basic ANOVA situation
Two variables: 1 categorical, 1 quantitative
Main question: Does the (mean of the) quantitative variable depend on which group (given by the categorical variable) the individual is in?
If the categorical variable has only 2 values:
• 2-sample t-test
ANOVA allows for 3 or more groups

An example ANOVA situation
Subjects: 25 patients with blisters
Treatments: Treatment A, Treatment B, Placebo
Measurement: # of days until blisters heal
Data [and means]:
• A: 5, 6, 6, 7, 7, 8, 9, 10 [7.25]
• B: 7, 7, 8, 9, 9, 10, 10, 11 [8.875]
• P: 7, 9, 9, 10, 10, 10, 11, 12, 13 [10.11]
Are these differences significant?
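The group means in brackets can be reproduced directly from the listed data. The dictionary name `days_to_heal` is just a label chosen for this sketch.

```python
from statistics import mean

# Days until blisters heal, copied from the example above.
days_to_heal = {
    "A": [5, 6, 6, 7, 7, 8, 9, 10],
    "B": [7, 7, 8, 9, 9, 10, 10, 11],
    "P": [7, 9, 9, 10, 10, 10, 11, 12, 13],
}

group_means = {group: mean(values) for group, values in days_to_heal.items()}
total_subjects = sum(len(values) for values in days_to_heal.values())

for group, values in days_to_heal.items():
    print(f"{group}: n = {len(values)}, mean = {group_means[group]:.3f}")
print("total subjects:", total_subjects)
```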

Informal Investigation
Graphical investigation:
• side-by-side box plots
• multiple histograms
Whether the differences between the groups are significant depends on
• the difference in the means
• the standard deviations of each group
• the sample sizes
ANOVA determines the P-value from the F statistic

Side by Side Boxplots
[Box plots of days (y-axis, 5 to 13) by treatment (x-axis: A, B, P)]

What does ANOVA do?
At its simplest (there are extensions) ANOVA tests the following hypotheses:
$H_0$: The means of all the groups are equal.
$H_a$: Not all the means are equal.
• Doesn’t say how or which ones differ.
• Can follow up with “multiple comparisons”.
Note: we usually refer to the sub-populations as “groups” when doing ANOVA.

Assumptions of ANOVA
• Each group is approximately normal
  – check this by looking at histograms and/or normal quantile plots, or use assumptions
  – can handle some non-normality, but not severe outliers
• Standard deviations of each group are approximately equal
  – rule of thumb: ratio of largest to smallest sample st. dev. must be less than 2:1

Standard Deviation Check
Compare largest and smallest standard deviations:
• largest: 1.764
• smallest: 1.458
• 1.458 × 2 = 2.916 > 1.764
Note: variance ratio of 4:1 is equivalent.

How ANOVA works (outline)
ANOVA measures two sources of variation in the data and compares their relative sizes
• variation BETWEEN groups
  – for each data value look at the difference between its group mean and the overall mean: $(\bar{x}_i - \bar{x})^2$
• variation WITHIN groups
  – for each data value we look at the difference between that value and the mean of its group: $(x_{ij} - \bar{x}_i)^2$
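Both the standard deviation check and the two sources of variation can be worked through on the blister data from the example. This is a sketch using only the standard library; the variable names (`ss_between`, `ss_within` and so on) are chosen here for illustration.

```python
from statistics import mean, stdev

# Days until blisters heal, from the example ANOVA situation.
groups = {
    "A": [5, 6, 6, 7, 7, 8, 9, 10],
    "B": [7, 7, 8, 9, 9, 10, 10, 11],
    "P": [7, 9, 9, 10, 10, 10, 11, 12, 13],
}

# Standard deviation check: largest sample st. dev. must be less than
# twice the smallest.
sample_sds = {name: stdev(values) for name, values in groups.items()}
largest_sd = max(sample_sds.values())
smallest_sd = min(sample_sds.values())
sd_rule_ok = largest_sd < 2 * smallest_sd

# Overall mean across every data value.
all_values = [value for values in groups.values() for value in values]
overall_mean = mean(all_values)

# BETWEEN: for each data value, (group mean - overall mean)^2.
ss_between = sum(
    len(values) * (mean(values) - overall_mean) ** 2 for values in groups.values()
)

# WITHIN: for each data value, (value - its group mean)^2.
ss_within = sum(
    (value - mean(values)) ** 2 for values in groups.values() for value in values
)

print(f"largest sd = {largest_sd:.3f}, smallest sd = {smallest_sd:.3f}, ok = {sd_rule_ok}")
print(f"SS between = {ss_between:.3f}, SS within = {ss_within:.3f}")
```

The two sums of squares add up to the total variation of all data values around the overall mean, which is a useful sanity check on the arithmetic.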

Result
→ The probability that the model is explaining variance by chance < 0.05

F-statistics
The ANOVA F-statistic is the ratio of the Between Group Variation to the Within Group Variation:
$F = \frac{\text{Between}}{\text{Within}} = \frac{MSG}{MSE}$
A large F is evidence against $H_0$, since it indicates that there is more difference between groups than within groups.
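For the blister data from the example, the F-statistic can be computed by hand as below. The sketch assumes the usual one-way ANOVA degrees of freedom (number of groups minus 1 for MSG, number of data values minus number of groups for MSE), which the slides do not spell out. The P-value itself needs the F distribution, which is not in the Python standard library, so it is left out here.

```python
from statistics import mean

# Days until blisters heal, from the example ANOVA situation.
groups = [
    [5, 6, 6, 7, 7, 8, 9, 10],          # Treatment A
    [7, 7, 8, 9, 9, 10, 10, 11],        # Treatment B
    [7, 9, 9, 10, 10, 10, 11, 12, 13],  # Placebo
]

n_total = sum(len(values) for values in groups)
k = len(groups)
overall_mean = mean(value for values in groups for value in values)

# Between-group and within-group sums of squares.
ssg = sum(len(values) * (mean(values) - overall_mean) ** 2 for values in groups)
sse = sum((value - mean(values)) ** 2 for values in groups for value in values)

# Mean squares: each sum of squares divided by its degrees of freedom.
df_between = k - 1
df_within = n_total - k
msg = ssg / df_between
mse = sse / df_within

f_statistic = msg / mse
print(f"MSG = {msg:.3f}, MSE = {mse:.3f}, F = {f_statistic:.3f}")
print(f"df = ({df_between}, {df_within})")
```

The resulting F is then compared with an F distribution with (`df_between`, `df_within`) degrees of freedom to obtain the P-value.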
