Chapter 2 Methods for Describing Sets of Data Objectives Describe - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Chapter 2

Methods for Describing Sets of Data

slide-2
SLIDE 2

Objectives

Describe Data using Graphs
Describe Data using Charts

slide-3
SLIDE 3

Describing Qualitative Data

  • Qualitative data are nonnumeric in nature
  • Best described by using Classes
  • 2 descriptive measures

class frequency – number of data points in a class
class relative frequency = class frequency / total number of data points in data set
class percentage – class relative frequency x 100
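The three measures above can be sketched in a few lines of Python. The list of observations is hypothetical (it does not come from the slides), and `summarize_classes` is an illustrative helper name.

```python
from collections import Counter

# Hypothetical qualitative data set (not from the slides).
observations = ["A", "B", "A", "C", "A", "B", "A", "C", "B", "A"]

def summarize_classes(data):
    """Return {class: (frequency, relative frequency, percentage)}."""
    total = len(data)
    counts = Counter(data)
    return {
        label: (freq, freq / total, 100 * freq / total)
        for label, freq in counts.items()
    }

summary = summarize_classes(observations)
for label, (freq, rel, pct) in sorted(summary.items()):
    print(f"{label}: frequency={freq}, relative={rel:.2f}, percentage={pct:.0f}%")
```

The relative frequencies sum to 1 and the percentages to 100, which is a quick sanity check on any summary table.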

slide-4
SLIDE 4

Describing Qualitative Data – Displaying Descriptive Measures

Summary Table

Class, Frequency

Class percentage – class relative frequency x 100

slide-5
SLIDE 5

Describing Qualitative Data – Qualitative Data Displays

Bar Graph

slide-6
SLIDE 6

Describing Qualitative Data – Qualitative Data Displays

Pie chart

slide-7
SLIDE 7

Describing Qualitative Data – Qualitative Data Displays

Pareto Diagram

slide-8
SLIDE 8

Graphical Methods for Describing Quantitative Data

The Data

Company  Percentage    Company  Percentage    Company  Percentage    Company  Percentage
   1       13.5          14        9.5          27        8.2          39        6.5
   2        8.4          15        8.1          28        6.9          40        7.5
   3       10.5          16       13.5          29        7.2          41        7.1
   4        9.0          17        9.9          30        8.2          42       13.2
   5        9.2          18        6.9          31        9.6          43        7.7
   6        9.7          19        7.5          32        7.2          44        5.9
   7        6.6          20       11.1          33        8.8          45        5.2
   8       10.6          21        8.2          34       11.3          46        5.6
   9       10.1          22        8.0          35        8.5          47       11.7
  10        7.1          23        7.7          36        9.4          48        6.0
  11        8.0          24        7.4          37       10.5          49        7.8
  12        7.9          25        6.5          38        6.9          50        6.5
  13        6.8          26        9.5

Percentage of Revenues Spent on Research and Development

slide-9
SLIDE 9

Graphical Methods for Describing Quantitative Data

Dot Plot

slide-10
SLIDE 10

Graphical Methods for Describing Quantitative Data

Stem-and-Leaf Display

slide-11
SLIDE 11

Graphical Methods for Describing Quantitative Data

Histogram

slide-12
SLIDE 12

Graphical Methods for Describing Quantitative Data

More on Histograms

Number of Observations in Data Set    Number of Classes
Less than 25                          5-6
25-50                                 7-14
More than 50                          15-20
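A minimal sketch of how the rule of thumb above might be applied to the R&D percentages from Slide 8. The helper names `suggested_classes` and `class_frequencies` are illustrative, and equal-width classes spanning the minimum to the maximum are an assumption; the slides do not say how the class boundaries were chosen.

```python
# R&D percentages for the 50 companies listed on Slide 8, in company order.
rd_percentages = [
    13.5, 8.4, 10.5, 9.0, 9.2, 9.7, 6.6, 10.6, 10.1, 7.1,
    8.0, 7.9, 6.8, 9.5, 8.1, 13.5, 9.9, 6.9, 7.5, 11.1,
    8.2, 8.0, 7.7, 7.4, 6.5, 9.5, 8.2, 6.9, 7.2, 8.2,
    9.6, 7.2, 8.8, 11.3, 8.5, 9.4, 10.5, 6.9, 6.5, 7.5,
    7.1, 13.2, 7.7, 5.9, 5.2, 5.6, 11.7, 6.0, 7.8, 6.5,
]

def suggested_classes(n):
    """Rule-of-thumb range for the number of classes, per the table."""
    if n < 25:
        return (5, 6)
    if n <= 50:
        return (7, 14)
    return (15, 20)

def class_frequencies(values, k):
    """Count observations in k equal-width classes from min to max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    counts = [0] * k
    for v in values:
        # The maximum value would index class k, so clamp it into the last class.
        idx = min(int((v - lo) / width), k - 1)
        counts[idx] += 1
    return lo, width, counts

low_k, high_k = suggested_classes(len(rd_percentages))
lo, width, counts = class_frequencies(rd_percentages, low_k)
for i, count in enumerate(counts):
    start = lo + i * width
    print(f"{start:5.2f} to {start + width:5.2f}: {'*' * count}")
```

Printing one row of asterisks per class gives a rough text histogram; changing `low_k` to another value in the suggested range shows how the shape depends on the number of classes.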

slide-13
SLIDE 13

Summation Notation

Used to simplify summation instructions

Each observation in a data set is identified by a subscript: $x_1, x_2, x_3, x_4, x_5, \ldots, x_n$

Notation used to sum the above numbers together is

$$\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + x_4 + \cdots + x_n$$

slide-14
SLIDE 14

Summation Notation

Data set of 1, 2, 3, 4

Are these the same?

$$\sum_{i=1}^{4} x_i^2 \quad \text{and} \quad \left(\sum_{i=1}^{4} x_i\right)^2$$

$$\sum_{i=1}^{4} x_i^2 = x_1^2 + x_2^2 + x_3^2 + x_4^2 = 1 + 4 + 9 + 16 = 30$$

$$\left(\sum_{i=1}^{4} x_i\right)^2 = (x_1 + x_2 + x_3 + x_4)^2 = (1 + 2 + 3 + 4)^2 = 10^2 = 100$$
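The two expressions on this slide can be checked directly. This is a small sketch using the slide's data set; the variable names are illustrative.

```python
data = [1, 2, 3, 4]

# Sum of the squares: square each observation, then add.
sum_of_squares = sum(x ** 2 for x in data)

# Square of the sum: add the observations, then square the total.
square_of_sum = sum(data) ** 2

print(sum_of_squares, square_of_sum)  # 30 100
```

The order of the two operations matters, which is why the position of the exponent relative to the summation sign has to be read carefully.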

slide-15
SLIDE 15

Numerical Measures of Central Tendency

  • Central Tendency – tendency of data to center about certain numerical values
  • 3 commonly used measures of Central Tendency: Mean, Median, Mode

slide-16
SLIDE 16

Numerical Measures of Central Tendency

The Mean

  • Arithmetic average of the elements of the data set
  • Sample mean denoted by $\bar{x}$
  • Population mean denoted by $\mu$
  • Calculated as

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \quad \text{and} \quad \mu = \frac{\sum_{i=1}^{n} x_i}{n}$$

slide-17
SLIDE 17

Numerical Measures of Central Tendency

The Median

  • Middle number when observations are arranged in order
  • Median denoted by m
  • Identified as the $\left(\frac{n}{2} + 0.5\right)$th observation if n is odd, and the mean of the $\frac{n}{2}$th and $\left(\frac{n}{2} + 1\right)$th observations if n is even
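The positional rule for the median can be sketched as a small function. This assumes a non-empty list of numbers; `median_by_position` is an illustrative name, not something from the slides.

```python
def median_by_position(values):
    """Median via the positional rule: middle value, or mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: the (n/2 + 0.5)th observation, i.e. position (n + 1) / 2.
        position = (n + 1) // 2
        return ordered[position - 1]  # positions are 1-based, list indices 0-based
    # Even n: mean of the (n/2)th and (n/2 + 1)th observations.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2

print(median_by_position([1, 3, 5, 6, 8, 8, 9, 11, 12]))  # 8
print(median_by_position([1, 2, 3, 4]))                   # 2.5
```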

slide-18
SLIDE 18

Numerical Measures of Central Tendency

The Mode

  • The most frequently occurring value in the data set
  • Data set can be multi-modal – have more than one mode
  • Data displayed in a histogram will have a modal class – the class with the largest frequency

slide-19
SLIDE 19

Numerical Measures of Central Tendency

The Data set: 1 3 5 6 8 8 9 11 12

Mean:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{1 + 3 + 5 + 6 + 8 + 8 + 9 + 11 + 12}{9} = \frac{63}{9} = 7$$

Median is the $\left(\frac{n}{2} + 0.5\right)$th, or 5th, observation: 8

Mode is 8
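The worked example can be cross-checked with Python's standard `statistics` module. This is only a verification sketch; `multimode` (Python 3.8 or later) is used because a data set can have more than one mode.

```python
import statistics

data = [1, 3, 5, 6, 8, 8, 9, 11, 12]

mean_value = statistics.mean(data)        # 63 / 9
median_value = statistics.median(data)    # 5th of the 9 ordered observations
mode_values = statistics.multimode(data)  # list of all most-frequent values

print(mean_value, median_value, mode_values)  # 7 8 [8]
```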

slide-20
SLIDE 20

Numerical Measures of Variability

  • Variability – the spread of the data across possible values
  • 3 commonly used measures of Variability: Range, Variance, Standard Deviation

slide-21
SLIDE 21

Numerical Measures of Variability

The Range

  • Largest measurement minus the smallest measurement
  • Loses sensitivity when data sets are large

These 2 distributions have the same range. How much does the range tell you about the data variability?

slide-22
SLIDE 22

Numerical Measures of Variability

The Sample Variance ($s^2$)

  • The sum of the squared deviations from the mean divided by (n-1). Expressed as units squared
  • Why square the deviations? The sum of the deviations from the mean is zero

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
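As a sketch of the definition, the sample variance can be computed step by step. The data set is borrowed from the central tendency example on Slide 19 (an illustrative choice), and the result is compared with `statistics.variance`, which also divides by n-1.

```python
import math
import statistics

data = [1, 3, 5, 6, 8, 8, 9, 11, 12]
n = len(data)
x_bar = sum(data) / n

# Deviations from the mean always sum to zero, so they cannot measure spread directly.
deviations = [x - x_bar for x in data]
sum_of_deviations = sum(deviations)

# Squaring removes the signs; dividing by n - 1 gives the sample variance.
sample_variance = sum(d ** 2 for d in deviations) / (n - 1)

print(sum_of_deviations)  # 0.0
print(sample_variance)    # 13.0
assert math.isclose(sample_variance, statistics.variance(data))
```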

slide-23
SLIDE 23

Numerical Measures of Variability

The Sample Standard Deviation (s)

  • The positive square root of the sample variance
  • Expressed in the original units of measurement

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} = \sqrt{s^2}$$

slide-24
SLIDE 24

Numerical Measures of Variability

Samples and Populations - Notation

                      Sample    Population
Variance              $s^2$     $\sigma^2$
Standard Deviation    $s$       $\sigma$
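The notation distinction matters in code as well. In the sketch below, the same nine values (borrowed from Slide 19) are treated first as a sample and then, purely for illustration, as a complete population. The slides do not give the population formula; the standard convention assumed here is that it divides by n rather than n-1, which is what `statistics.pvariance` and `statistics.pstdev` do.

```python
import math
import statistics

data = [1, 3, 5, 6, 8, 8, 9, 11, 12]

# Sample statistics: divide by n - 1.
s_squared = statistics.variance(data)
s = statistics.stdev(data)

# Population parameters: divide by n (assumes the data are the whole population).
sigma_squared = statistics.pvariance(data)
sigma = statistics.pstdev(data)

print(f"s^2 = {s_squared:.3f}, s = {s:.3f}")
print(f"sigma^2 = {sigma_squared:.3f}, sigma = {sigma:.3f}")
```

For the same values the sample figures are always somewhat larger than the population figures, and the gap shrinks as n grows.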

slide-25
SLIDE 25

Interpreting the Standard Deviation

How many observations fit within ± n s of the mean?

Interval                                   Chebyshev’s Rule    Empirical Rule
$\bar{x} \pm 1s$ or $\mu \pm 1\sigma$      No useful info      Approximately 68%
$\bar{x} \pm 2s$ or $\mu \pm 2\sigma$      At least 75%        Approximately 95%
$\bar{x} \pm 3s$ or $\mu \pm 3\sigma$      At least 8/9        Approximately 99.7%

slide-26
SLIDE 26

Interpreting the Standard Deviation

You have purchased compact fluorescent light bulbs for your home. Average life length is 500 hours, standard deviation is 24, and frequency distribution for the life length is mound shaped. One of your bulbs burns out at 450 hours. Would you send the bulb back for a refund?

Interval            Range        % of observations included    % of observations excluded
$\bar{x} \pm 1s$    476 - 524    Approximately 68%             Approximately 32%
$\bar{x} \pm 2s$    452 - 548    Approximately 95%             Approximately 5%
$\bar{x} \pm 3s$    428 - 572    Approximately 99.7%           Approximately 0.3%
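The intervals in this example can be reproduced with a short calculation. The sketch below uses only the numbers given on the slide; the variable names are illustrative.

```python
mean_life = 500  # average life length in hours
std_dev = 24     # standard deviation in hours
observed = 450   # life of the bulb that burned out

# Intervals of 1, 2 and 3 standard deviations around the mean.
intervals = {
    k: (mean_life - k * std_dev, mean_life + k * std_dev)
    for k in (1, 2, 3)
}
for k, (low, high) in intervals.items():
    inside = low <= observed <= high
    print(f"within {k} s: {low} - {high}, contains {observed}: {inside}")

# How many standard deviations the observed life lies from the mean.
deviations_in_sd = (observed - mean_life) / std_dev
print(f"{deviations_in_sd:.2f} standard deviations from the mean")
```

The 450-hour bulb falls just below the 2 s interval but inside the 3 s interval, so under the Empirical Rule it sits among the roughly 5% of observations expected outside 452 - 548.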

slide-27
SLIDE 27

Numerical Measures of Relative Standing

Descriptive measures of relationship of a measurement to the rest of the data

Common measures:

  • percentile ranking or percentile score
  • z-score
slide-28
SLIDE 28

Numerical Measures of Relative Standing

Percentile rankings make use of the pth percentile

The median is an example of a percentile. Median is the 50th percentile – 50% of observations lie above it, and 50% lie below it

For any p, the pth percentile has p% of the measures lying below it, and (100-p)% above it
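A minimal sketch of the idea, using a hypothetical data set of the scores 1 through 100 (not from the slides). `percent_below` is an illustrative helper that counts strictly smaller observations; percentile conventions for small or tied data sets vary, so this is one simple reading of the definition.

```python
import statistics

# Hypothetical scores: the integers 1 through 100.
scores = list(range(1, 101))

def percent_below(data, value):
    """Percentage of observations strictly below the given value."""
    return 100 * sum(1 for x in data if x < value) / len(data)

median_score = statistics.median(scores)
below = percent_below(scores, median_score)
print(median_score, below, 100 - below)  # 50.5 50.0 50.0
```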

slide-29
SLIDE 29

Numerical Measures of Relative Standing

z-score – the distance between a measurement x and the mean, expressed in standard units

Use of standard units allows comparison across data sets

$$z = \frac{x - \bar{x}}{s} \quad \text{(sample)} \qquad z = \frac{x - \mu}{\sigma} \quad \text{(population)}$$

slide-30
SLIDE 30

Numerical Measures of Relative Standing

More on z-scores

Z-scores follow the empirical rule for mounded distributions
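To illustrate comparison across data sets, the sketch below computes z-scores for two hypothetical exam results. The scores, means and standard deviations are invented for illustration and do not come from the slides.

```python
def z_score(x, mean, std_dev):
    """Distance of x from the mean, in units of the standard deviation."""
    return (x - mean) / std_dev

# Hypothetical exams with different means and spreads.
z_exam_a = z_score(85, mean=70, std_dev=10)
z_exam_b = z_score(80, mean=60, std_dev=8)

print(z_exam_a, z_exam_b)  # 1.5 2.5
```

Although 85 is the higher raw score, the 80 sits further above its own mean in standard units, so it is the stronger result relative to its data set.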

slide-31
SLIDE 31

Methods for Detecting Outliers

Outlier – an observation that is unusually large or small relative to the data values being described

Causes

  • Invalid measurement
  • Misclassified measurement
  • A rare (chance) event

2 detection methods

  • Box Plots
  • z-scores
slide-32
SLIDE 32

Methods for Detecting Outliers

Box Plots

  • based on quartiles, values that divide the dataset into 4 groups

  • Lower Quartile QL – 25th percentile
  • Middle Quartile - median
  • Upper Quartile QU – 75th percentile
  • Interquartile Range (IQR) = QU - QL
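As a sketch, the quartiles and IQR can be computed with `statistics.quantiles` (Python 3.8 or later). The data set is borrowed from Slide 19 for illustration. Quartile conventions differ between textbooks and software, so the `method="inclusive"` choice here is an assumption and hand-computed hinges may differ slightly.

```python
import statistics

data = [1, 3, 5, 6, 8, 8, 9, 11, 12]

# n=4 asks for the three cut points that split the data into 4 groups.
q_lower, q_middle, q_upper = statistics.quantiles(data, n=4, method="inclusive")
iqr = q_upper - q_lower

print(q_lower, q_middle, q_upper, iqr)  # 5.0 8.0 9.0 4.0
```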
slide-33
SLIDE 33

Methods for Detecting Outliers

Box Plots

Not on plot – inner and outer fences, which determine potential outliers

[Figure: box plot with labels QU (hinge), QL (hinge), Median, Whiskers, Potential Outlier]

slide-34
SLIDE 34

Methods for Detecting Outliers

Rules of thumb

  • Box Plots

–measurements between inner and outer fences are suspect
–measurements beyond outer fences are highly suspect

  • Z-scores

–Scores beyond 3 in absolute value in mounded distributions (2 in highly skewed distributions) are considered outliers
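Both rules of thumb can be tried on the R&D percentages from Slide 8. The slides do not say where the fences sit, so this sketch assumes the common convention of inner fences 1.5 IQR and outer fences 3 IQR beyond the quartiles. The quartile method (`"inclusive"`) is also an assumption.

```python
import statistics

# R&D percentages for the 50 companies listed on Slide 8, in company order.
rd_percentages = [
    13.5, 8.4, 10.5, 9.0, 9.2, 9.7, 6.6, 10.6, 10.1, 7.1,
    8.0, 7.9, 6.8, 9.5, 8.1, 13.5, 9.9, 6.9, 7.5, 11.1,
    8.2, 8.0, 7.7, 7.4, 6.5, 9.5, 8.2, 6.9, 7.2, 8.2,
    9.6, 7.2, 8.8, 11.3, 8.5, 9.4, 10.5, 6.9, 6.5, 7.5,
    7.1, 13.2, 7.7, 5.9, 5.2, 5.6, 11.7, 6.0, 7.8, 6.5,
]

q_lower, _, q_upper = statistics.quantiles(rd_percentages, n=4, method="inclusive")
iqr = q_upper - q_lower

# Assumed convention: inner fences at 1.5 IQR, outer fences at 3 IQR.
inner = (q_lower - 1.5 * iqr, q_upper + 1.5 * iqr)
outer = (q_lower - 3 * iqr, q_upper + 3 * iqr)

def beyond(x, fences):
    """True if x lies outside the (low, high) pair of fences."""
    return x < fences[0] or x > fences[1]

suspect = sorted(x for x in rd_percentages if beyond(x, inner) and not beyond(x, outer))
highly_suspect = sorted(x for x in rd_percentages if beyond(x, outer))

# z-score rule for a mounded distribution: |z| greater than 3.
mean = statistics.mean(rd_percentages)
s = statistics.stdev(rd_percentages)
z_outliers = [x for x in rd_percentages if abs((x - mean) / s) > 3]

print("suspect:", suspect)
print("highly suspect:", highly_suspect)
print("z-score outliers:", z_outliers)
```

Under these assumptions the two rules disagree mildly: the fences mark the two 13.5 values as suspect, while no observation has a z-score beyond 3.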
slide-35
SLIDE 35

Graphing Bivariate Relationships

Bivariate relationship – the relationship between two quantitative variables

Graphically represented with the scattergram

slide-36
SLIDE 36

The Time Series Plot

Time Series Data – data produced and monitored over time

Graphically represented with the time series plot

Time on x axis
Order on x axis

slide-37
SLIDE 37

Distorting the Truth with Descriptive Techniques

  • Graphical techniques

–Scale manipulation

Same data, different scales

slide-38
SLIDE 38

Distorting the Truth with Descriptive Techniques

  • Graphical techniques

–More Scale manipulation

slide-39
SLIDE 39

Distorting the Truth with Descriptive Techniques

  • Graphical techniques

–More Scale manipulation

slide-40
SLIDE 40

Distorting the Truth with Descriptive Techniques

  • Numerical techniques

–Mismatch of measure of central tendency and distribution shape

Use of mean overstates average
Use of mean understates average

slide-41
SLIDE 41

Distorting the Truth with Descriptive Techniques

  • Numerical techniques

–Discussion of central tendency with no information on variability

Which model would you purchase if you knew only the average MPG? Would knowing the standard deviation affect your choice? Why?

slide-42
SLIDE 42

Distorting the Truth with Descriptive Techniques

  • Graphical techniques

–Look past the pictures to the data they represent

  • Numerical techniques

–Is measure being used most appropriate for underlying distribution?
–Are you provided with information on central tendency and variability?

slide-43
SLIDE 43

Summary

  • Graphical methods for Qualitative Data

–Pie chart
–Bar graph
–Pareto diagram

  • Graphical methods for Quantitative Data

–Dot plot
–Stem-and-leaf display
–Histogram

slide-44
SLIDE 44

Summary

  • Numerical measures of central tendency

–Mean
–Median
–Mode

  • Numerical measures of variation

–Range
–Variance
–Standard Deviation

slide-45
SLIDE 45

Summary

  • Distribution Rules

–Chebyshev’s Rule
–Empirical Rule

  • Measures of relative standing

–Percentile scores
–z-scores

  • Methods for detecting Outliers

–Box plots
–z-scores

slide-46
SLIDE 46

Summary

Method for graphing the relationship between two quantitative variables

–Scatterplot