Descriptive Methods
707.031: Evaluation Methodology, Winter 2015/16
Eduardo Veas

What we do with the data depends on the measurement scale.

Measurement Scales

The complexity of measurements (from crude to sophisticated)
• Nominal (crude)
• Ordinal
• Interval
• Ratio (sophisticated)

Nominal data
• codes are assigned arbitrarily to a category or attribute: postal codes, job classifications, gender
• mathematical manipulations of the codes are meaningless
• categories are mutually exclusive
• each category is a level
• use: frequencies, counts

Ordinal data
• ranking of an attribute (e.g., military ranks)
• the intervals between points on the scale are not necessarily equal
• comparisons (< or >) are possible

Interval data
• equal distances between adjacent values, but no absolute zero
• e.g., temperature in °C or °F
• the mean can be computed
• Likert-scale data? (whether they can be treated as interval is debated)

Ratio data
• has an absolute (true) zero
• all arithmetic operations are meaningful, including ratios
• e.g., time to complete a task, distance or velocity of the cursor
• counts and normalized counts (count per unit, e.g., per minute)

Frequencies

Frequency tables
• library(descr)  # freq() comes from the descr package
• tab.courses <- as.data.frame(freq(ordered(courses), plot = FALSE))
• CumFreq <- cumsum(tab.courses[-nrow(tab.courses), ]$Frequency)  # exclude the Total row
• tab.courses$CumFreq <- c(CumFreq, NA)
• tab.courses

Interpreting frequency tables
         Frequency   Percent   CumPercent   CumFreq
1            2          20          20          2
2            3          30          50          5
3            4          40          90          9
4            1          10         100         10
Total       10         100          NA         NA
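The same table can be built with base R alone, without the add-on package that provides freq(). A minimal sketch, assuming a hypothetical `courses` vector whose values reproduce the frequencies above; the Total row is omitted:

```r
# Hypothetical data: number of courses taken by 10 students
courses <- c(1, 1, 2, 2, 2, 3, 3, 3, 3, 4)

counts   <- as.vector(table(factor(courses, levels = 1:4)))  # frequency per category
percents <- 100 * counts / sum(counts)                        # percent of all cases

tab.courses <- data.frame(
  Frequency  = counts,
  Percent    = percents,
  CumPercent = cumsum(percents),  # running total of percentages
  CumFreq    = cumsum(counts),    # running total of frequencies
  row.names  = as.character(1:4)
)
print(tab.courses)
```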

Contingency tables
             Right-handed   Left-handed   Total
Males             43              9          52
Females           44              4          48
Totals            87             13         100
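The handedness counts can be entered as a table in base R, with the totals added by addmargins(). A sketch using the counts from the slide; the object names are illustrative:

```r
# Counts taken from the handedness table above
handedness <- as.table(matrix(
  c(43, 9,
    44, 4),
  nrow = 2, byrow = TRUE,
  dimnames = list(Sex = c("Males", "Females"),
                  Hand = c("Right-handed", "Left-handed"))
))

# addmargins() appends row and column totals, labelled "Sum"
with_totals <- addmargins(handedness)
print(with_totals)

# Row proportions: share of right- and left-handers within each sex
print(round(prop.table(handedness, margin = 1), 3))
```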

Modelling

Statistical models
• A model has to represent the real-world phenomenon accurately.
• A model can be used to predict things about the real world.
• The degree to which a statistical model represents the data collected is called the fit of the model.

Frequency distributions
• plot the observed values on the x-axis and, for each value, a bar showing how often it occurs
• ideally, observations fall symmetrically around the center
• skew and kurtosis describe departures from this symmetric (normal) shape
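Skew and kurtosis can be quantified from the standardized third and fourth moments. Base R has no built-in functions for them, so this sketch defines simple, hypothetical helpers; add-on packages may use bias-corrected formulas, so their values can differ slightly:

```r
# Moment-based skewness: mean cubed deviation divided by SD^3 (population SD)
skewness <- function(x) {
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))
  mean((x - m)^3) / s^3
}

# Excess kurtosis: mean fourth-power deviation divided by SD^4, minus 3
# (minus 3 so that a normal distribution scores 0)
excess_kurtosis <- function(x) {
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))
  mean((x - m)^4) / s^4 - 3
}

symmetric    <- c(1, 2, 3, 4, 5)          # hypothetical data
right_skewed <- c(1, 1, 1, 2, 2, 3, 10)   # long tail to the right

skewness(symmetric)         # 0: values mirror around the mean
skewness(right_skewed)      # positive
excess_kurtosis(symmetric)  # negative: flatter than a normal curve
```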

Histogram / Frequency distributions

Center of a distribution
• Mode: the score that occurs most frequently in the dataset
  • there may be more than one mode
  • it can change dramatically when a single score is added
• Median: the middle score after ranking all scores
  • for an even number of scores, average the two middle scores
  • suitable for ordinal, interval and ratio data
• Mean: the average score
  • can be influenced by extreme scores
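A sketch of the three measures of center in R, on hypothetical scores. `stat_mode()` is a helper written for this example, because base R's `mode()` reports an object's type (e.g., "numeric") rather than the most frequent value:

```r
# Returns every value that occurs most often (there may be several modes)
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}

scores <- c(2, 3, 3, 5, 7, 7, 7, 9)  # hypothetical scores
stat_mode(scores)  # 7
median(scores)     # 6: average of the two middle scores (5 and 7)
mean(scores)       # 5.375

with_outlier <- c(scores, 100)  # add one extreme score
median(with_outlier)  # 7: the median moves only slightly
mean(with_outlier)    # about 15.9: the mean is pulled towards the extreme score
```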

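Dispersion of a distribution
• range: difference between the highest and the lowest score, e.g., 252 - 22 = 230; after removing the extreme score of 252, 121 - 22 = 99
• interquartile range: difference between the upper and the lower quartile (the quartiles, together with the median, split the ranked data into four equal parts)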
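Range and interquartile range in R, on hypothetical scores chosen so that the extremes match the figures above. `quantile()` uses R's default definition (type = 7); other definitions can give slightly different quartiles:

```r
# Hypothetical scores: lowest 22, second highest 121, extreme highest 252
scores <- c(22, 40, 53, 57, 93, 98, 103, 108, 116, 121, 252)

diff(range(scores))                  # 230: highest minus lowest
diff(range(scores[scores != 252]))   # 99: range without the extreme score

quantile(scores, probs = c(0.25, 0.75))  # lower and upper quartiles: 55 and 112
IQR(scores)                              # 57: upper minus lower quartile
```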

Fit of the mean
• deviance (deviation) of each score: x_i - mean
• sum of squared errors: SS = Σ (x_i - mean)²
• variance: s² = SS / (N - 1)
• standard deviation: s = √variance
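The fit-of-the-mean quantities can be computed step by step and checked against R's `var()` and `sd()`, which use the same N - 1 denominator. The scores are hypothetical:

```r
x <- c(1, 3, 4, 3, 2)  # hypothetical scores

deviations <- x - mean(x)          # deviance of each score from the mean
SS <- sum(deviations^2)            # sum of squared errors
variance <- SS / (length(x) - 1)   # divide by N - 1 for a sample
stddev <- sqrt(variance)

c(SS = SS, variance = variance, stddev = stddev)
all.equal(variance, var(x))  # TRUE: matches R's sample variance
all.equal(stddev, sd(x))     # TRUE
```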

Assumptions

Assumptions of parametric data
• normally distributed: the sampling distribution or the errors in the model
• homogeneity of variance:
  • correlational designs: the variance of one variable should be stable at all levels of the other variable
  • group designs: each sample comes from a population with the same variance
• interval data: data measured at least at the interval level
• independence: the behaviour of one participant does not influence that of another

[Figure: Distributions for DLF: density histograms of hygiene scores on days 1, 2 and 3, each with a Q-Q plot of sample against theoretical quantiles]

Quantifying normality

Different groups

[Figure: Exam histogram: density of exam scores]

[Figure: Exam histograms: density of exam scores for the whole sample and for each group]

Shapiro-Wilk test
• shapiro.test(rexam$exam)  # Shapiro-Wilk test on the whole sample
• # when comparing groups, what matters is normality within each group
• by(rexam$exam, rexam$uni, shapiro.test)
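Since the `rexam` data are not included here, this sketch runs the same checks on simulated stand-in data (`rexam_sim` and the group labels "A" and "B" are hypothetical). The p-values depend on the simulated values, so none are quoted:

```r
set.seed(1)  # reproducible simulated data
# Hypothetical stand-in for rexam: 50 scores from each of two groups
rexam_sim <- data.frame(
  exam = c(rnorm(50, mean = 40, sd = 10), rnorm(50, mean = 75, sd = 8)),
  uni  = factor(rep(c("A", "B"), each = 50))
)

# Whole sample: pooling groups with different means can distort the shape
overall <- shapiro.test(rexam_sim$exam)
print(overall)

# Within each group: the check that matters when groups are compared
group_tests <- by(rexam_sim$exam, rexam_sim$uni, shapiro.test)
print(group_tests)

# Per-group p-values collected into a named vector
group_p <- tapply(rexam_sim$exam, rexam_sim$uni,
                  function(v) shapiro.test(v)$p.value)
```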

Reporting Shapiro-Wilk
A Shapiro-Wilk test on the R exam scores, W = 0.96, p < .05, showed a significant deviation from normality.

Homogeneity of variance
• Levene's test (car package):
• car::leveneTest(rexam$exam, rexam$uni, center = mean)
• Reporting: for the percentage on the R exam, the variances were similar for KFU and TUG students, F(1, 98) = 2.09, n.s.

Homogeneity of variance
• in large datasets, Levene's test can be significant even for small differences in variance
• double-check with the variance ratio (Hartley's Fmax): the largest group variance divided by the smallest
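Where the car package is not available, Levene's test with center = mean can be reproduced in base R as a one-way ANOVA on the absolute deviations from each group mean; this should match `car::leveneTest(..., center = mean)`, but it is worth cross-checking on real data. Hartley's Fmax is the ratio of the largest to the smallest group variance. Function names and data are hypothetical:

```r
# Levene's test (center = mean): ANOVA on |score - own group mean|
levene_mean <- function(y, group) {
  group   <- factor(group)
  abs_dev <- abs(y - ave(y, group, FUN = mean))
  tab     <- anova(lm(abs_dev ~ group))
  c(F = tab[["F value"]][1], df1 = tab$Df[1], df2 = tab$Df[2],
    p = tab[["Pr(>F)"]][1])
}

# Hartley's Fmax: largest group variance divided by the smallest
hartley_fmax <- function(y, group) {
  v <- tapply(y, group, var)
  max(v) / min(v)
}

set.seed(2)  # hypothetical data: two groups of 50, as in F(1, 98)
y     <- c(rnorm(50, mean = 60, sd = 10), rnorm(50, mean = 65, sd = 12))
group <- rep(c("A", "B"), each = 50)

levene_mean(y, group)
hartley_fmax(y, group)
```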

Correlations

Everything is hard to begin with, but the more you practise the easier it gets

Relationships
• Everything is hard to begin with, but the more you practise the easier it gets
• increase in practice, increase in skill (positive relationship)
• increase in practice, but skill remains unchanged (no relationship)
• increase in practice, decrease in skill (negative relationship)

Correlations
• Bivariate: correlation between two variables
• Partial: correlation between two variables while controlling for the effect of one or more additional variables

Covariance
• are changes in one variable met with similar changes in the other variable?
• cross-product deviations (CPD): for each observation, multiply the deviations of the two variables from their means, then sum
• covariance = CPD / (N - 1)

Covariance II
• Positive: both variables vary in the same direction
• Negative: the variables vary in opposite directions
• Covariance depends on the units of measurement, so its values cannot be compared across variables measured on different scales
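The cross-product definition can be checked against R's `cov()`. A sketch on hypothetical data, which also shows the scale dependence: multiplying x by 10 multiplies the covariance by 10:

```r
x <- c(5, 4, 4, 6, 8)     # hypothetical variable 1
y <- c(8, 9, 10, 13, 15)  # hypothetical variable 2

# Cross-product deviations: multiply paired deviations from the means, then sum
cpd    <- sum((x - mean(x)) * (y - mean(y)))
cov_xy <- cpd / (length(x) - 1)

cov_xy                         # 4.25
all.equal(cov_xy, cov(x, y))   # TRUE
cov(x * 10, y)                 # 42.5: same relationship, different units
```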

Pearson correlation coefficient
• r = cov(x, y) / (s_x · s_y)
• data must be at least interval
• value between -1 and 1
• 1 -> perfect positive correlation
• 0 -> no linear relationship
• -1 -> perfect negative correlation
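Standardizing the covariance by the two standard deviations gives Pearson's r, which, unlike the covariance, does not change when a variable is rescaled. A sketch on hypothetical data:

```r
x <- c(5, 4, 4, 6, 8)     # hypothetical data
y <- c(8, 9, 10, 13, 15)

# Standardize the covariance by both standard deviations
r <- cov(x, y) / (sd(x) * sd(y))

r                              # about 0.87
all.equal(r, cor(x, y))        # TRUE
all.equal(r, cor(x * 10, y))   # TRUE: r does not depend on the units
```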

Dataset: Exams and Anxiety
• effects of exam stress and revision on exam performance
• questionnaire to assess anxiety relating to exams (EAQ)

Enter data
• examData <- read.delim("ExamAnxiety.dat", header = TRUE)
• examData2 <- examData[, c("Exam", "Anxiety", "Revise")]
• cor(examData2)

Pearson correlation
              Exam      Anxiety       Revise
Exam      1.0000000   -0.4409934    0.3967207
Anxiety  -0.4409934    1.0000000   -0.7092493
Revise    0.3967207   -0.7092493    1.0000000

Significance values (p-values)
• library(Hmisc)  # provides rcorr()
• rcorr(as.matrix(examData[, c("Exam", "Anxiety", "Revise")]))
P
          Exam   Anxiety   Revise
Exam              0         0
Anxiety   0                 0
Revise    0       0
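Base R's `cor.test()` tests one pair of variables at a time and, for Pearson's r, also returns a 95% confidence interval, which `rcorr()` does not report. A sketch on hypothetical anxiety and exam scores (not the ExamAnxiety data):

```r
# Hypothetical data for 8 students: anxiety score and exam percentage
anxiety <- c(86, 72, 90, 55, 61, 79, 45, 68)
exam    <- c(40, 60, 35, 70, 62, 50, 80, 58)

ct <- cor.test(anxiety, exam, method = "pearson")
ct$estimate   # Pearson's r
ct$p.value    # two-tailed p-value
ct$conf.int   # 95% confidence interval for r
```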

Reporting Pearson's correlation coefficient
A Pearson correlation coefficient indicated a significant negative correlation between exam anxiety and exam performance, r = -.44, p < .01.

Spearman's correlation coefficient
• non-parametric
• first rank the data, then apply Pearson's correlation coefficient to the ranks
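The definition above can be checked directly: Pearson's r computed on the ranks equals R's built-in Spearman coefficient. The data are hypothetical:

```r
position   <- 1:8                                # hypothetical contest ranks
creativity <- c(38, 41, 30, 35, 28, 33, 25, 27)  # hypothetical creativity scores

rho_manual  <- cor(rank(position), rank(creativity))   # Pearson r on the ranks
rho_builtin <- cor(position, creativity, method = "spearman")

all.equal(rho_manual, rho_builtin)  # TRUE
```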

Liar dataset
• contest for telling the biggest lie
• 68 participants: their ranking (position) in the contest and a creativity questionnaire score

Spearman test
• liarData <- read.delim("biggestLiar.dat", header = TRUE)
• rcorr(as.matrix(liarData[, c("Position", "Creativity")]), type = "spearman")  # rcorr() computes Pearson unless type = "spearman"
              Position   Creativity
Position        1.00        -0.31
Creativity     -0.31         1.00

Reporting Spearman
A Spearman non-parametric correlation test indicated a significant negative correlation between creativity and ranking in the World's Biggest Liar contest, r_s = -.37, p < .001.

Kendall's tau (non-parametric)
• preferred for small datasets with many tied ranks
• cor.test(liarData$Position, liarData$Creativity, alternative = "less", method = "kendall")
Output:
z = -3.2252, p-value = 0.0006294
alternative hypothesis: true tau is less than 0
sample estimates:
       tau
-0.3002413
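Kendall's tau compares every pair of observations: pairs ordered the same way on both variables are concordant, pairs ordered in opposite ways are discordant. A sketch of the basic (tau-a) formula, assuming no tied values, in which case it coincides with R's `cor(..., method = "kendall")`; with ties, R reports tau-b, which adjusts for them. The function name and data are hypothetical:

```r
# Tau-a: (concordant - discordant pairs) / number of pairs; assumes no ties
kendall_tau_a <- function(x, y) {
  n <- length(x)
  s <- 0
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      # +1 for a concordant pair, -1 for a discordant pair
      s <- s + sign(x[i] - x[j]) * sign(y[i] - y[j])
    }
  }
  s / (n * (n - 1) / 2)
}

x <- c(1, 2, 3, 4, 5, 6)  # hypothetical ranks
y <- c(3, 1, 2, 6, 4, 5)

kendall_tau_a(x, y)                 # 7/15, about 0.467
cor(x, y, method = "kendall")       # same value when there are no ties
cor.test(x, y, method = "kendall")  # adds a significance test
```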

Reporting Kendall's test
A Kendall tau correlation coefficient indicated a significant negative correlation between creativity and performance in the World's Biggest Liar contest, τ = -.30, p < .001.

Biserial and point-biserial correlations
• one variable is dichotomous (categorical with two categories)
• point-biserial: for a discrete dichotomy (e.g., dead vs. alive)
• biserial: for a continuous dichotomy, where a continuum underlies the two categories (e.g., passing vs. failing an exam)
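A point-biserial correlation is Pearson's r with the dichotomy coded 0/1. As a check, its square equals t²/(t² + df) from an equal-variance two-sample t-test. A sketch on hypothetical data; the biserial coefficient needs a further adjustment for the underlying continuum and is not shown:

```r
# Hypothetical data: 0/1 group membership and a continuous score
dichotomy <- c(0, 0, 0, 1, 1, 1, 0, 1)
score     <- c(12, 15, 11, 20, 18, 22, 14, 19)

r_pb <- cor(dichotomy, score)  # point-biserial = Pearson r with a 0/1 variable

# Equal-variance t-test comparing the two groups
tt    <- t.test(score ~ factor(dichotomy), var.equal = TRUE)
t_val <- unname(tt$statistic)
df    <- unname(tt$parameter)

all.equal(r_pb^2, t_val^2 / (t_val^2 + df))  # TRUE
```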

Readings
• Discovering Statistics Using R (Andy Field, Jeremy Miles, Zoë Field)

R

Setting the working directory
• setwd("/new/work/directory")
• getwd()  # print the current working directory
• ls()  # list the objects in the current workspace

Packages
• install.packages("package.name")  # install a package
• library(package.name)  # load a package
• package::function()  # call a function from a specific package (disambiguation)

Nominal and ordinal data
• mydata$v1 <- factor(mydata$v1, levels = c(1, 2, 3), labels = c("red", "blue", "green"))
• mydata$y <- ordered(mydata$y, levels = c(1, 3, 5), labels = c("Low", "Medium", "High"))
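A self-contained version of the code above, with a hypothetical `mydata`. It shows that counting is meaningful for a nominal factor, while an ordered factor also supports < and > comparisons:

```r
# Hypothetical data frame with numerically coded columns
mydata <- data.frame(v1 = c(1, 2, 3, 1, 2), y = c(1, 3, 5, 5, 3))

# Nominal: unordered factor; the codes only label categories
mydata$v1 <- factor(mydata$v1, levels = c(1, 2, 3),
                    labels = c("red", "blue", "green"))
# Ordinal: ordered factor; the levels have a ranking
mydata$y <- ordered(mydata$y, levels = c(1, 3, 5),
                    labels = c("Low", "Medium", "High"))

table(mydata$v1)   # frequencies per category
mydata$y > "Low"   # element-wise comparison, valid on an ordinal scale
```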
