Introduction to data display Useful questions to ask when - - PowerPoint PPT Presentation


Introduction to data display Useful questions to ask when considering how to display information What do you want to show? What methods are available for this? Is the method chosen the best? Would another have been better?


slide-1
SLIDE 1

Introduction to data display

slide-2
SLIDE 2
slide-3
SLIDE 3

Useful questions to ask when considering how to display information

  • What do you want to show?
  • What methods are available for this?
  • Is the method chosen the best? Would another have been better?

slide-4
SLIDE 4

Recommendations for the presentation of numbers

  • When summarizing categorical data, both frequencies and percentages can be used. However, if percentages are reported, it is important that the denominator (i.e. the total number of observations) is given.
  • To summarize continuous numerical data, one should use the mean and standard deviation or, if the data have a skewed distribution, the median and range or interquartile range.
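
As a small illustration of the first recommendation, the Python sketch below reports a percentage together with its numerator and denominator. The function name `count_with_percent` is hypothetical; the counts 22 of 219 and 36 of 714 are the pneumonia figures used in the chi-square slide later in this deck.

```python
def count_with_percent(count: int, total: int, digits: int = 1) -> str:
    """Format a frequency as 'count/total (percent%)' so the denominator stays visible."""
    if total <= 0:
        raise ValueError("total must be positive")
    percent = 100.0 * count / total
    return f"{count}/{total} ({percent:.{digits}f}%)"


print(count_with_percent(22, 219))
print(count_with_percent(36, 714))
```

Reporting the raw counts next to the percentage lets a reader judge how much weight a figure such as 10% deserves.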

slide-5
SLIDE 5

Recommendations when presenting data and results in tables

  • Tables, including column and row headings, should be clearly labeled, and a brief summary of the contents of a table should always be given in words, either as part of the title or in the main body of the text.
  • The amount of information should be maximized for the minimum amount of ink.

slide-6
SLIDE 6
slide-7
SLIDE 7

Recommendations for construction of graphs

  • The amount of information should be maximized for the minimum amount of ink.
  • Each graph should have a title explaining what is being displayed.
  • Axes should be clearly labeled.
  • Gridlines should be kept to a minimum.
  • Avoid three-dimensional graphs, as these can be difficult to read.
  • The number of observations should be included.
slide-8
SLIDE 8
slide-9
SLIDE 9

Table or graph?

slide-10
SLIDE 10

Examples for badly displayed data

slide-11
SLIDE 11
slide-12
SLIDE 12
slide-13
SLIDE 13
slide-14
SLIDE 14
slide-15
SLIDE 15

Describing categorical data

slide-16
SLIDE 16
slide-17
SLIDE 17
slide-18
SLIDE 18

Clustered data

slide-19
SLIDE 19
slide-20
SLIDE 20
slide-21
SLIDE 21

Displaying quantitative data

slide-22
SLIDE 22

Tables for multiple outcome measures

slide-23
SLIDE 23

Stem and leaf plot

slide-24
SLIDE 24

Histogram

slide-25
SLIDE 25

Showing distribution

slide-26
SLIDE 26

Skewed data

slide-27
SLIDE 27

Box–whisker plots

slide-28
SLIDE 28

Displaying the relationship between two continuous variables

slide-29
SLIDE 29
slide-30
SLIDE 30
slide-31
SLIDE 31

Regression

slide-32
SLIDE 32
slide-33
SLIDE 33

ROC curve

slide-34
SLIDE 34

Tabulating categorical outcomes

slide-35
SLIDE 35

Tabulating the results of logistic regression analysis

slide-36
SLIDE 36

Tabulating quantitative outcomes

slide-37
SLIDE 37

Tabulating the results of regression analyses

slide-38
SLIDE 38

Patient flow diagram

slide-39
SLIDE 39

Forest plots

slide-40
SLIDE 40

Funnel plots

slide-41
SLIDE 41

Survival

slide-42
SLIDE 42
slide-43
SLIDE 43
slide-44
SLIDE 44

Displaying results in presentations

  • Keep slides simple.
  • Text is meant to be read. Ensure that your slides are legible.
  • For slides, use light text on a dark background.
  • Keep information layout, colors, patterns, text styles, transitions, and build effects consistent across all slides in a presentation.
  • Use a maximum of six lines per slide and six words per line.
  • Use graphics and animation effects sparingly.
  • Sans serif fonts such as Arial are more legible on slides.
  • Use a minimum font size of 28 points for titles and 18 points for body text.

slide-45
SLIDE 45
slide-46
SLIDE 46

Thanks for your attention

slide-47
SLIDE 47

Statistical Methods

slide-48
SLIDE 48

Descriptive vs. Inferential

  • Descriptive statistics summarize your group.
– Average age 78.5, 89.3% white.
  • Inferential statistics use the theory of probability to make inferences about larger populations from your sample.
– White patients were significantly older than black and Hispanic patients, P<0.001.

slide-49
SLIDE 49

Enter your data with statistical analysis in mind.

  • For small projects, enter data into Microsoft Excel or directly into SPSS.
  • For large projects, create a database with Microsoft Access.
  • Keep variable names in the first row, with no more than 8 characters and no internal spaces.
  • Enter as little text as possible and use codes for categories, such as 1=male, 2=female.

slide-50
SLIDE 50

Screen your data thoroughly for errors and inconsistencies before doing ANY analyses.

  • Check the lowest and highest value for each variable.
– For example, age 1-777.
  • Look at histograms to detect typos.
  • Cross-check variables to detect impossible combinations.
– For example, pregnant males, survivors discharged to the morgue, patients in the ICU for 25 days with no complications.
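
The screening steps above can be sketched in a few lines of Python. This is only an illustration: the records, the field names, and the `max_age` cut-off of 120 are assumptions made up for the example, not part of the hip fracture dataset.

```python
# Hypothetical records; field names and values are invented for illustration.
records = [
    {"id": 1, "age": 81, "sex": "F", "pregnant": False, "survived": True, "disposition": "HOME"},
    {"id": 2, "age": 777, "sex": "M", "pregnant": False, "survived": True, "disposition": "HOME"},
    {"id": 3, "age": 64, "sex": "M", "pregnant": True, "survived": True, "disposition": "HOME"},
    {"id": 4, "age": 90, "sex": "F", "pregnant": False, "survived": True, "disposition": "MORGUE"},
]


def value_range(rows, field):
    """Return (lowest, highest) value of a field so the extremes can be inspected."""
    values = [row[field] for row in rows]
    return min(values), max(values)


def screen(rows, max_age=120):
    """Return (record id, problem) pairs for implausible or impossible entries."""
    problems = []
    for row in rows:
        if not 0 <= row["age"] <= max_age:
            problems.append((row["id"], "age out of range"))
        if row["sex"] == "M" and row["pregnant"]:
            problems.append((row["id"], "pregnant male"))
        if row["survived"] and row["disposition"] == "MORGUE":
            problems.append((row["id"], "survivor discharged to morgue"))
    return problems


print(value_range(records, "age"))
for record_id, problem in screen(records):
    print(record_id, problem)
```

Flagged records still need to be checked against the source documents; the code only points to where to look.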

slide-51
SLIDE 51

Analyze, Descriptive Statistics, Frequencies, select the variable

Statistics: AGE

N (Valid)         933
Mean              79.292
Median            81.300
Mode              90.0
Std. Deviation    26.537
Range             763.0
Minimum           14.0
Maximum           777.0

[Histogram of AGE: Std. Dev = 26.54, Mean = 79.3, N = 933]
slide-52
SLIDE 52

Analyze, Descriptive Statistics, Crosstabs

SURVIVAL * 48-DISPOSITION Crosstabulation (Count; - marks an empty cell)

48-DISPOSITION                          EXPIRED   SURVIVED   Total
HOME                                          -        224     224
REHABILITATION FACILITY                       -         56      56
OTHER HOSPITAL                                -         12      12
MORGUE                                       63          -      63
SKILLED NURSING FACILITY                      -        201     201
HOME WITH ASSISTANCE                          -        236     236
AMA DISCHARGE AGAINST MEDICAL ADVICE          -          3       3
8                                             -        138     138
Total                                        63        870     933
slide-53
SLIDE 53

Correct the data in the original database or spreadsheet and import a revised version into the statistical package.

  • The age of 777 should be checked and changed to the correct age.
  • Suspicious values, such as an age of 106, should be checked. In this case it is correct.

slide-54
SLIDE 54

Run descriptive statistics to summarize your data.

SURVIVAL

            Frequency   Percent   Valid Percent   Cumulative Percent
EXPIRED            63       6.8             6.8                  6.8
SURVIVED          870      93.2            93.2                100.0
Total             933     100.0           100.0

Statistics: 49-DAYS IN HOSPITAL

N (Valid)         933
Mean              23.34
Median            19.00
Mode              20
Std. Deviation    18.03
Range             236
Minimum           1
Maximum           237

[Histogram of 49-DAYS IN HOSPITAL: Std. Dev = 18.03, Mean = 23.3, N = 933]
slide-55
SLIDE 55

P Value

  • A P value is an estimate of the probability that results such as yours could have occurred by chance alone if there truly was no difference or association.
  • P < 0.05 means less than a 5% (1 in 20) chance.
  • P < 0.01 means less than a 1% (1 in 100) chance.
  • Alpha is the threshold. If P is below this threshold, you consider the result statistically significant.

slide-56
SLIDE 56

Univariate vs. Multivariate

  • Univariate analysis usually refers to one predictor variable and one outcome variable.
– Is gender a predictor of pneumonia?
  • Multivariate analysis usually refers to more than one predictor variable or more than one outcome variable being evaluated simultaneously.
– After adjusting for age, is gender a predictor of pneumonia?

slide-57
SLIDE 57

Difference vs. Association

  • Some tests are designed to assess whether there are statistically significant differences between groups.
– Is there a statistically significant difference between the age of patients with and without pneumonia?
  • Some tests are designed to assess whether there are statistically significant associations between variables.
– Is the age of the patient associated with the number of days in the hospital?

slide-58
SLIDE 58

Unmatched vs. Matched

  • Some statistical tests are designed to assess groups that are unmatched or independent.
– Is the admission systolic blood pressure different between men and women?
  • Some statistical tests are designed to assess groups that are matched or data that are paired.
– Is the systolic blood pressure different between admission and discharge?

slide-59
SLIDE 59

Level of Measurement

  • Categorical vs. continuous variables

– If you take the average of a continuous variable, it has meaning.

  • Average age, blood pressure, days in the hospital.

– If you take the average of a categorical variable, it has no meaning.

  • Average gender, race, smoker.
slide-60
SLIDE 60

Level of Measurement

  • Nominal - categorical

– gender, race, hypertensive

  • Ordinal - categories that can be ranked

– none, light, moderate, heavy smoker

  • Interval - continuous

– blood pressure, age, days in the hospital

slide-61
SLIDE 61

Examples of Normal and Skewed

[Histogram of 44-DAYS IN ICU (skewed): Std. Dev = 3.99, Mean = 0.9, N = 933]

[Histogram of 35-SYSTOLIC BLOOD PRESSURE FIRST ER (approximately normal): Std. Dev = 27.74, Mean = 146.9, N = 925]
slide-62
SLIDE 62

Commonly used statistical methods

  • 1. Chi-square
  • 2. Logistic regression
  • 3. Student's t-test
  • 4. Fisher's exact test
  • 5. Kaplan-Meier method
  • 7. Wilcoxon rank-sum test
  • 8. Log-rank test
  • 9. Linear regression analysis

slide-63
SLIDE 63

Commonly used statistical methods

  • 10. One-way analysis of variance (ANOVA)
  • 12. Mann-Whitney U test
  • 13. Kruskal-Wallis test
  • 14. Repeated-measures analysis of variance
  • 15. Paired t-test
  • 16. Wilcoxon signed-rank test
slide-64
SLIDE 64

Chi-square

  • The most commonly used statistical test.
  • Used to test if two or more percentages are different.
  • For example, suppose that in a study of 933 patients with a hip fracture, 10% of the men (22/219) develop pneumonia compared with 5% of the women (36/714).
  • What is the probability that this could happen by chance alone?
  • Univariate, difference, unmatched, nominal, ≥2 groups, n ≥ 20.
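
The pneumonia example can be worked through with a short Python sketch. It computes the Pearson chi-square statistic for a 2 by 2 table without a continuity correction, which is an assumption about the variant intended here; the function name `chi_square_2x2` is hypothetical. For 1 degree of freedom the P value follows from the complementary error function, so no statistics package is needed.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value). With 1 degree of freedom the upper-tail
    probability equals erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Men: 22 of 219 with pneumonia; women: 36 of 714 with pneumonia.
statistic, p_value = chi_square_2x2(22, 219 - 22, 36, 714 - 36)
print(f"chi-square = {statistic:.2f}, p = {p_value:.4f}")
```

With these counts the P value falls below 0.01, so a difference this large would be unlikely by chance alone.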

slide-65
SLIDE 65

Chi-square example

[SPSS output: crosstabulation with counts and percentages, followed by the chi-square tests table]

slide-66
SLIDE 66

Fisher’s Exact Test

  • This test can be used for 2 by 2 tables when the number of cases is too small to satisfy the assumptions of the chi-square.
– Total number of cases is <20, or
– The expected number of cases in any cell is <1, or
– More than 25% of the cells have expected frequencies <5.
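
A minimal Python sketch of a two-sided Fisher's exact test follows. It uses one common definition of "two-sided" (summing the probabilities of all tables no more likely than the observed one); other definitions exist. The function name and the small table are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same margins
    whose probability does not exceed that of the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denominator = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    low = max(0, col1 - row2)
    high = min(row1, col1)
    observed = prob(a)
    tolerance = 1e-12  # guards against floating-point ties
    return sum(
        p for p in (prob(x) for x in range(low, high + 1))
        if p <= observed * (1 + tolerance)
    )


# Hypothetical small table: 3 of 4 in one group and 1 of 4 in the other have the outcome.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))
```

With only 8 cases in total, even a 75% versus 25% split is well within what chance alone can produce.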

slide-67
SLIDE 67

[SPSS output: crosstabulation with counts, expected counts, and percentages, followed by the chi-square tests table]
slide-68
SLIDE 68

Student’s t-test

  • Used to compare the average (mean) in one group with the average in another group.
  • Is the average age of patients significantly different between those who developed pneumonia and those who did not?
  • Univariate, Difference, Unmatched, Interval, Normal, 2 groups.
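
The calculation behind the test can be sketched in Python as below. It returns only the t statistic and its degrees of freedom, assuming equal variances (the pooled form); the P value would then come from a t table or a statistics package. The ages are hypothetical, not taken from the hip fracture data.

```python
import math
import statistics


def student_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for Student's two-sample t-test with pooled variance."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / standard_error, df


# Hypothetical ages for patients with and without pneumonia.
with_pneumonia = [84, 79, 88, 91, 76]
without_pneumonia = [72, 80, 69, 75, 78]
t_value, df = student_t(with_pneumonia, without_pneumonia)
print(f"t = {t_value:.2f}, df = {df}")
```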

slide-69
SLIDE 69

[SPSS output: independent-samples t-test for AGE]

slide-70
SLIDE 70

Mann-Whitney U test

  • Same as the Wilcoxon rank-sum test.
  • Used in place of the Student’s t-test when the data are skewed.
  • A nonparametric test that uses the rank of the value rather than the actual value.
  • Univariate, Difference, Unmatched, Interval, Nonnormal, 2 groups.
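
To show what "uses the rank of the value" means, here is a Python sketch that ranks the pooled data and computes the U statistic for each group. The function names and the lengths of stay are hypothetical; obtaining a P value from U is left to tables or a statistics package.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2.0 + 1.0  # mean of the 1-based positions start..end
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b); the smaller one is usually compared with a critical value."""
    ranks = average_ranks(list(sample_a) + list(sample_b))
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2.0
    return u_a, n_a * n_b - u_a


# Hypothetical lengths of stay (days) for two independent groups.
print(mann_whitney_u([3, 5, 8, 40], [4, 9, 12, 15, 60]))
```

Because only ranks enter the calculation, replacing the 60-day stay with 600 days would leave U unchanged.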

slide-71
SLIDE 71

Paired t-test

  • Used to compare the average for measurements made twice within the same person (before vs. after).
  • Used to compare a treatment group and a matched control group.
  • For example, “Did the systolic blood pressure change significantly from the scene of the injury to admission?”
  • Univariate, Difference, Matched, Interval, Normal, 2 groups.

slide-72
SLIDE 72

Wilcoxon signed-rank test

  • Used to compare two skewed continuous variables that are paired or matched.
  • Nonparametric equivalent of the paired t-test.
  • For example, “Was the Glasgow Coma Scale score different between the scene and admission?”
  • Univariate, Difference, Matched, Interval, Nonnormal, 2 groups.

slide-73
SLIDE 73

ANOVA

  • One-way: used to compare 3 or more means from independent groups.
– “Is the age different between White, Black, and Hispanic patients?”
  • Two-way: used to compare 2 or more means by 2 or more factors.
– “Is the age different between Males and Females, With and Without Pneumonia?”

slide-74
SLIDE 74

Tests of Between-Subjects Effects
Dependent Variable: AGE

Source           Type III Sum of Squares     df    Mean Square           F    Sig.
Model                         5769944(a)      4        1442486    8664.775    .000
SEX                             1981.683      1       1981.683      11.904    .001
PNEUMON                         1299.320      1       1299.320       7.805    .005
SEX * PNEUMON                    519.282      1        519.282       3.119    .078
Error                           154657.2    929        166.477
Total                            5924601    933

a. R Squared = .974 (Adjusted R Squared = .974)
slide-75
SLIDE 75

Kruskal-Wallis One-Way ANOVA

  • Used to compare continuous variables that are not normally distributed between more than 2 groups.
  • Nonparametric equivalent to the one-way ANOVA.
  • Is the length of stay different by ethnicity?
  • Analyze, Nonparametric Tests, K Independent Samples.

slide-76
SLIDE 76

Repeated-Measures ANOVA

  • Used to assess the change in 2 or more continuous measurements made on the same person. Can also compare groups and adjust for covariates.
  • Do changes in the vital signs within the first 24 hours of a hip fracture predict which patients will develop pneumonia?
  • Analyze, General Linear Model, Repeated Measures.
slide-77
SLIDE 77

Pearson Correlation

  • Used to assess the linear association between two continuous variables.
– r=1.0 perfect correlation
– r=0.0 no correlation
– r=-1.0 perfect inverse correlation
  • Univariate, Association, Interval
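
The three reference values of r can be reproduced with a short Python sketch. The function name `pearson_r` and the data are hypothetical and chosen only to give a perfect, a perfectly inverse, and a weaker linear association.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical data: perfectly linear, perfectly inverse, and a weaker association.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))
print(pearson_r([1, 2, 3, 4], [2, 1, 4, 3]))
```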
slide-78
SLIDE 78

Correlations (Pearson)

Variables: AGE; DAYS = 49-DAYS IN HOSPITAL; COMORB = NUMBER OF COMORBIDITIES (0-9); COMPL = 43-TOTAL NUMBER OF COMPLICATIONS; SBP = 35-SYSTOLIC BLOOD PRESSURE FIRST ER; GCS = 35-GLASGOW COMA SCALE FIRST ER; PULSE = 35-PULSE FIRST ER.

Pearson Correlation
         AGE      DAYS     COMORB   COMPL    SBP      GCS      PULSE
AGE      1.000    .088**   .211**   .137**   .149**   -.030    -.008
DAYS     .088**   1.000    .167**   .453**   .039     .016     .022
COMORB   .211**   .167**   1.000    .222**   .034     -.079*   .055
COMPL    .137**   .453**   .222**   1.000    -.033    -.028    .046
SBP      .149**   .039     .034     -.033    1.000    .043     .069*
GCS      -.030    .016     -.079*   -.028    .043     1.000    -.100**
PULSE    -.008    .022     .055     .046     .069*    -.100**  1.000

Sig. (2-tailed)
         AGE      DAYS     COMORB   COMPL    SBP      GCS      PULSE
AGE      .        .007     .000     .000     .000     .356     .809
DAYS     .007     .        .000     .000     .237     .633     .499
COMORB   .000     .000     .        .000     .296     .017     .093
COMPL    .000     .000     .000     .        .310     .393     .161
SBP      .000     .237     .296     .310     .        .196     .035
GCS      .356     .633     .017     .393     .196     .        .002
PULSE    .809     .499     .093     .161     .035     .002     .

N
         AGE      DAYS     COMORB   COMPL    SBP      GCS      PULSE
AGE      933      933      933      933      925      926      923
DAYS     933      933      933      933      925      926      923
COMORB   933      933      933      933      925      926      923
COMPL    933      933      933      933      925      926      923
SBP      925      925      925      925      925      925      923
GCS      926      926      926      926      925      926      923
PULSE    923      923      923      923      923      923      923

**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).
slide-79
SLIDE 79

Spearman rank-order correlation

  • Used to assess the relationship between two ordinal variables or two skewed continuous variables.
  • Nonparametric equivalent of the Pearson correlation.
  • Univariate, Association, Ordinal (or skewed).
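
One way to see why this is the rank-based counterpart of Pearson's r: Spearman's coefficient is the Pearson correlation computed on the ranks. The Python sketch below takes that route; the function names and the data are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    sorted_values = sorted(values)
    rank_of = {}
    for value in set(values):
        first = sorted_values.index(value) + 1   # first 1-based position of this value
        count = sorted_values.count(value)       # how many times it occurs
        rank_of[value] = first + (count - 1) / 2.0
    return [rank_of[value] for value in values]


def spearman_rho(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    ss_x = sum((a - mean_x) ** 2 for a in rx)
    ss_y = sum((b - mean_y) ** 2 for b in ry)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical skewed data: complications rise with days in hospital, but not linearly.
days_in_hospital = [2, 3, 5, 9, 60]
complications = [0, 1, 1, 2, 7]
rho = spearman_rho(days_in_hospital, complications)
print(round(rho, 3))
```

Any relationship that is strictly increasing, linear or not, gives a coefficient of 1, which is why the method suits skewed data.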
slide-80
SLIDE 80

Correlations (Spearman's rho)

Variables: AGE; DAYS = 49-DAYS IN HOSPITAL; COMORB = NUMBER OF COMORBIDITIES (0-9); COMPL = 43-TOTAL NUMBER OF COMPLICATIONS; SBP = 35-SYSTOLIC BLOOD PRESSURE FIRST ER; GCS = 35-GLASGOW COMA SCALE FIRST ER; PULSE = 35-PULSE FIRST ER.

Correlation Coefficient
         AGE      DAYS     COMORB   COMPL    SBP      GCS      PULSE
AGE      1.000    .089**   .158**   .145**   .091**   -.146**  -.008
DAYS     .089**   1.000    .142**   .389**   .073*    .048     .037
COMORB   .158**   .142**   1.000    .229**   .037     -.091**  .042
COMPL    .145**   .389**   .229**   1.000    -.014    -.076*   .043
SBP      .091**   .073*    .037     -.014    1.000    .079*    .080*
GCS      -.146**  .048     -.091**  -.076*   .079*    1.000    -.038
PULSE    -.008    .037     .042     .043     .080*    -.038    1.000

Sig. (2-tailed)
         AGE      DAYS     COMORB   COMPL    SBP      GCS      PULSE
AGE      .        .007     .000     .000     .005     .000     .806
DAYS     .007     .        .000     .000     .027     .149     .268
COMORB   .000     .000     .        .000     .257     .006     .202
COMPL    .000     .000     .000     .        .676     .020     .196
SBP      .005     .027     .257     .676     .        .017     .015
GCS      .000     .149     .006     .020     .017     .        .252
PULSE    .806     .268     .202     .196     .015     .252     .

N
         AGE      DAYS     COMORB   COMPL    SBP      GCS      PULSE
AGE      933      933      933      933      925      926      923
DAYS     933      933      933      933      925      926      923
COMORB   933      933      933      933      925      926      923
COMPL    933      933      933      933      925      926      923
SBP      925      925      925      925      925      925      923
GCS      926      926      926      926      925      926      923
PULSE    923      923      923      923      923      923      923

**. Correlation is significant at the .01 level (2-tailed).
*. Correlation is significant at the .05 level (2-tailed).
slide-81
SLIDE 81

Summary of Inferential Tests

slide-82
SLIDE 82

Unpaired vs. Paired

Unpaired:
  • Student’s t-test
  • Chi-square
  • One-way ANOVA
  • Mann-Whitney U test
  • Kruskal-Wallis H test

Paired:
  • Paired t-test
  • McNemar’s test
  • Repeated-measures ANOVA
  • Wilcoxon signed-rank
  • Friedman ANOVA
slide-83
SLIDE 83

Parametric vs. Nonparametric

Parametric:
  • Student’s t-test
  • One-way ANOVA
  • Paired t-test
  • Pearson correlation
  • Correlated F ratio (repeated-measures ANOVA)

Nonparametric:
  • Mann-Whitney U test
  • Kruskal-Wallis test
  • Wilcoxon signed-rank
  • Spearman’s r
  • Friedman ANOVA
slide-84
SLIDE 84

A Good Rule to Follow

  • Always check your results with a nonparametric test.
  • If you test your null hypothesis with a Student’s t-test, also check it with a Mann-Whitney U test.
  • It will only take an extra 25 seconds.
slide-85
SLIDE 85

Linear Regression

  • Used to assess how one or more predictor variables can be used to predict a continuous outcome variable.
  • “Do age, number of comorbidities, or admission vital signs predict the length of stay in the hospital after a hip fracture?”
  • Multivariate, Association, Interval/Ordinal dependent variable.

slide-86
SLIDE 86

Coefficients (a)

Model 1                                       B   Std. Error    Beta        t    Sig.
(Constant)                               -4.451       18.889            -.236    .814
AGE                                   7.136E-02         .045    .053    1.571    .117
NUMBER OF COMORBIDITIES (0-9)             2.606         .548    .159    4.757    .000
35-SYSTOLIC BLOOD PRESSURE FIRST ER   1.562E-02         .022    .024     .726    .468
35-GLASGOW COMA SCALE FIRST ER            1.067        1.170    .030     .912    .362
35-PULSE FIRST ER                     2.581E-02         .047    .019     .554    .580
35-RESPIRATION RATE FIRST ER          -8.00E-02         .188   -.014    -.425    .671

B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.
a. Dependent Variable: 49-DAYS IN HOSPITAL
slide-87
SLIDE 87

Logistic Regression

  • Used to assess the predictive value of one or more variables on an outcome that is a yes/no question.
  • “Do age, gender, and comorbidities predict which hip fracture patients will develop pneumonia?”
  • Multivariate, Difference, Nominal dependent variable, not time-dependent, 2 groups.

slide-88
SLIDE 88

  • 1. Total number of comorbidities
  • 2. Cirrhosis
  • 3. COPD
  • 4. Gender
  • 5. Age

slide-89
SLIDE 89

Draw Conclusions

  • We reject the null hypothesis.
  • Patients who are at high risk of developing pneumonia during their hospitalization for a hip fracture can be identified by:
– total number of pre-existing conditions
– cirrhosis
– COPD
– male gender

slide-90
SLIDE 90

Survival Analysis

  • Kaplan-Meier method
– Used to plot cumulative survival
  • Log-rank test
– Used to compare survival curves
  • Cox proportional-hazards
– Used to adjust for covariates in survival analysis
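
To make "cumulative survival" concrete, here is a minimal Python sketch of the Kaplan-Meier (product-limit) estimate. The function name and the follow-up data are hypothetical, and subjects censored at a given time are assumed to be at risk for events at that same time, which is the usual convention.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times  : follow-up time for each subject
    events : 1 if the event (e.g. death) was observed at that time, 0 if censored
    Returns a list of (time, survival_probability) at each distinct event time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for time, event in zip(times, events) if time == t and event == 1)
        leaving = sum(1 for time in times if time == t)  # deaths plus censored at t
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up in days for five patients; 0 marks a censored observation.
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
for t, s in curve:
    print(t, round(s, 3))
```

Each step multiplies the running survival by the fraction of at-risk patients who survive that event time, so censored patients contribute until they leave the study.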

slide-91
SLIDE 91

Thanks for your attention

slide-92
SLIDE 92

Introduction to Statistics

Descriptive Analysis

slide-93
SLIDE 93

Review of Descriptive Stats.

  • Descriptive statistics are used to present quantitative descriptions in a manageable form.
  • This method works by reducing lots of data into a simpler summary.

slide-94
SLIDE 94

Univariate Analysis

  • This is the examination across cases of one variable at a time.
  • Frequency distributions are used to group data.
  • One may set up cut-off points (margins) that allow us to group cases into categories.
  • Examples include:
– age categories
– price categories
– temperature categories.

slide-95
SLIDE 95

Distributions

Two ways to describe a univariate distribution

  • a table
  • a graph (histogram, bar chart)
slide-96
SLIDE 96

Distributions (cont.)

Distribution of participants of the research methodology workshop by sex

Sex      No     %
Men      12    60
Women     8    40
Total    20   100

[Bar chart: percentage of participants by sex (Men, Women)]

slide-97
SLIDE 97

Distributions (cont.)

Workshop participants by specialty

[Charts: workshop participants by specialty (Microbiology, Environmental sciences, Fishery, Nursing, Other), shown as percentages]
slide-98
SLIDE 98

Distributions (cont.)

Category    Percent
Under 35          9
36-45            21
46-55            45
56-65            19
66+               6

A Frequency Distribution Table

slide-99
SLIDE 99

Distributions (cont.)

[Histogram: percent by category (Under 35, 36-45, 46-55, 56-65, 66+)]

A Histogram

slide-100
SLIDE 100

Central Tendency

  • An estimate of the “center” of a distribution
  • Three different types of estimates:

– Mean – Median – Mode

slide-101
SLIDE 101

Mean

  • The most commonly used method of describing central tendency.
  • One totals all the results and then divides by the number of units, or “n”, of the sample.
  • Example: The pretest mean was determined by the sum of all the scores divided by the number of students taking the exam.

slide-102
SLIDE 102

Working Example (mean)

  • Let's take the set of scores:

11,10,8,9,12,11,6,13

  • The mean would be 80/8 = 10.
slide-103
SLIDE 103

Median

  • The median is the score found at the exact middle of the set.
  • One must list all scores in numerical order, and then locate the score in the center of the sample.
  • Example: if there are 500 scores in the list, the median is the average of scores #250 and #251.
  • Unlike the mean, the median is little affected by outliers.
slide-104
SLIDE 104

Working Example (median)

  • Let's take the set of scores:

11,10,8,9,12,11,6,13

  • First line up the scores.

6, 8, 9, 10, 11, 11, 12, 13

  • The middle score falls at 10.5. There are 8 scores, so scores #4 and #5 (10 and 11) mark the halfway point, and the median is their average.

slide-105
SLIDE 105

Mode

  • The mode is the most repeated score in the set of results.
  • Let's take the set of scores:

11,10,8,9,12,11,6,13

  • Again we first line up the scores:

6, 8, 9, 10, 11, 11, 12, 13

  • 11 is the most repeated score and is therefore labeled the mode.
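
The three working examples can be checked with Python's standard `statistics` module. This is just a convenience sketch using the same eight scores; the variable names are arbitrary.

```python
import statistics

scores = [11, 10, 8, 9, 12, 11, 6, 13]

mean_score = statistics.mean(scores)      # sum of scores (80) divided by n (8)
median_score = statistics.median(scores)  # average of the 4th and 5th sorted scores
mode_score = statistics.mode(scores)      # most frequent score

print(mean_score, median_score, mode_score)
```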

slide-106
SLIDE 106

Dispersion

  • Three estimates:
– Range
– Mean Absolute Deviation
– Standard Deviation
  • The standard deviation is usually the more informative estimate, because it uses every score, whereas a single outlier can greatly extend the range.

slide-107
SLIDE 107

Range

  • The range is used to identify the highest and lowest scores.
  • Let's take the set of scores:

6, 8, 9, 10, 11, 11, 12, 13

  • The range would be 6-13. This identifies the fact that 7 points separate the highest and lowest scores.

slide-108
SLIDE 108

Standard Deviation

  • The Standard Deviation is a value that shows the relation that individual scores have to the mean of the sample.
  • If scores are said to be standardized to a normal curve, then there are several statistical manipulations that can be performed to analyze the data set.

slide-109
SLIDE 109

Standard Dev. (cont.)

  • Assumptions may be made about the percentage of scores as they deviate from the mean.
  • If scores are normally distributed, then one can assume that approximately 68% of the scores in the sample fall within one standard deviation of the mean. Approximately 95% of the scores would then fall within two standard deviations of the mean.
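
The 68% and 95% figures follow from the normal distribution itself: the fraction within k standard deviations of the mean equals erf(k / sqrt(2)). A short Python sketch, with a hypothetical helper name, confirms them.

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2):
    print(k, round(fraction_within(k), 4))
```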

slide-110
SLIDE 110

Working Example (stand. dev.)

  • Let's take the set of scores:

11,10,8,9,12,11,6,13

  • The mean of this sample was found to be 10.
  • Subtract the mean from each score:

11-10=1, 10-10=0, 8-10=-2, 9-10=-1, 12-10=2, 11-10=1, 6-10=-4, 13-10=3

slide-111
SLIDE 111

Working Ex. (stand. dev. cont.)

  • Square these values.

1, 0, 4, 1, 4, 1, 16, 9

  • Total these values: 36.
  • Divide 36 by 7 (the number of scores minus 1): 5.14
  • Take the square root of 5.14: 2.27
  • 2.27 is your Standard Deviation.
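
The same steps, written out in Python as a check on the arithmetic. The sketch divides by n - 1 (here 7), the sample form used in the worked example; the variable names are arbitrary.

```python
import math

scores = [11, 10, 8, 9, 12, 11, 6, 13]

mean_score = sum(scores) / len(scores)
deviations = [score - mean_score for score in scores]
sum_of_squares = sum(d ** 2 for d in deviations)
variance = sum_of_squares / (len(scores) - 1)   # sample variance divides by n - 1
standard_deviation = math.sqrt(variance)

print(sum_of_squares, round(variance, 2), round(standard_deviation, 2))
```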
slide-112
SLIDE 112

Interquartile range

  • The median is the same as the 50th percentile.
  • The 25th and 75th percentiles are called the lower and upper quartiles.

slide-113
SLIDE 113

Interquartile range

slide-114
SLIDE 114

Definition: A set of n measurements on the variable x has been arranged in order of magnitude.

  • The lower quartile (first quartile), Q1, is the value of x that exceeds one-fourth of the measurements and is less than the remaining three-fourths.
  • The second quartile is the median.
  • The upper quartile (third quartile), Q3, is the value of x that exceeds three-fourths of the measurements and is less than the remaining one-fourth.
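
As an illustration, the quartiles of the eight scores from the earlier working examples can be computed with Python's `statistics.quantiles`. Several interpolation conventions for quartiles are in use, so other software may report slightly different values; this sketch relies on the module's default ("exclusive") method.

```python
import statistics

scores = [11, 10, 8, 9, 12, 11, 6, 13]

# With n=4, quantiles returns the three cut points Q1, Q2 (the median), and Q3.
q1, q2, q3 = statistics.quantiles(scores, n=4)
interquartile_range = q3 - q1

print(q1, q2, q3, interquartile_range)
```

The interquartile range is the distance between Q1 and Q3, so it describes the spread of the middle half of the data.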

slide-115
SLIDE 115

Thanks