DEPARTMENT OF QUANTITATIVE METHODS & INFORMATION SYSTEMS
Introduction to Business Statistics
QM 120
Chapter 3
- Dr. Mohammad Zainal
Spring 2008
Measures of central tendency for ungrouped data
Graphs are very helpful to describe the basic shape of a data
One way to overcome graph problems is to use numerical
Numerical descriptive measures associated with a population
A measure of central tendency gives the center of a histogram
The measures are: mean, median, and mode.
QM-120, M. Zainal
Data that give information on each member of the population
The mean (average) is the most frequently used measure of central tendency.
The mean for ungrouped data is obtained by dividing the sum of all values by the number of values in the data set.
MLB team                2002 total payroll (millions of dollars)
Anaheim Angels          62
Atlanta Braves          93
New York Yankees        126
(not listed)            75
Tampa Bay Devil Rays    34

The mean is a balancing point of the values 34, 62, 75, 93, 126.
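As a quick check of the arithmetic, the mean of the five payroll figures can be computed directly. This is a minimal Python sketch; the names `payrolls`, `mean` and `deviations` are illustrative and not part of the course material.

```python
# Payroll figures (millions of dollars) listed above.
payrolls = [62, 93, 126, 75, 34]


def mean(values):
    """Sum of all values divided by the number of values."""
    return sum(values) / len(values)


payroll_mean = mean(payrolls)

# "Balancing point": the deviations from the mean add up to zero.
deviations = [x - payroll_mean for x in payrolls]

print(payroll_mean)     # 78.0
print(sum(deviations))  # 0.0
```

The second printed value shows why the mean is described as a balancing point: the values below the mean offset the values above it exactly.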
Sometimes a data set may contain a few very small or a few
We should be very cautious when using the mean. It may not
State                    Population (thousands)
Excluding California:
  Washington             5894
  Oregon                 3421
  Alaska                 627
  Hawaii                 1212
Including California:
  California             33,872

$$\text{Mean} = \frac{5894 + 3421 + 627 + 1212 + 33{,}872}{5} = 9005.2$$
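The effect of a single extreme value on the mean can be seen by computing it with and without California. The sketch below uses the state populations given above; the dictionary layout and variable names are assumptions made for illustration.

```python
# State populations in thousands, as given above.
populations = {
    "Washington": 5894,
    "Oregon": 3421,
    "Alaska": 627,
    "Hawaii": 1212,
    "California": 33872,
}

# Mean of the four states other than California.
without_ca = [p for state, p in populations.items() if state != "California"]
mean_without = sum(without_ca) / len(without_ca)

# Mean of all five states.
mean_with = sum(populations.values()) / len(populations)

print(mean_without)  # 2788.5
print(mean_with)     # 9005.2
```

Adding one state more than triples the mean, which is larger than four of the five actual values.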
Sometimes we may assign weight (importance) to each
A mean computed in this manner is referred to as a weighted mean:
$$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i}$$
where $w_i$ is the weight assigned to observation $x_i$.
Purchase    Price    Quantity
1           .300     5,000
2           .325     15,000
3           .350     10,000
4           .295     20,000
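A weighted mean for the purchase table can be sketched as follows, assuming that the quantity of each purchase serves as the weight for its price. The slide does not state this explicitly, so treat it as an assumption; the variable names are illustrative.

```python
# Prices and quantities from the four purchases above.
prices = [0.300, 0.325, 0.350, 0.295]
quantities = [5000, 15000, 10000, 20000]  # assumed to be the weights

# Weighted mean: sum of (weight * value) divided by the sum of the weights.
weighted_mean = sum(w * x for w, x in zip(quantities, prices)) / sum(quantities)

# Unweighted mean of the four prices, for comparison.
simple_mean = sum(prices) / len(prices)

print(round(weighted_mean, 4))  # 0.3155
print(round(simple_mean, 4))    # 0.3175
```

The weighted mean is lower than the simple mean because the largest purchase was made at the lowest price.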
The median is the value of the middle term in a data set that has been ranked in increasing order.
$$\text{Median} = \text{value of the } \left(\frac{n + 1}{2}\right)\text{th term in a ranked data set}$$
The value .5(n + 1) indicates the position in the ordered data set.
If n is even, we choose a value halfway between the two middle observations.
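The position rule can be sketched in Python as shown below. The function name `median` is illustrative; the odd-sized example reuses the five payroll values from the mean example, and the four-value list is a subset chosen only to show the even case.

```python
def median(values):
    """Middle term of the ranked data; average of the two middle terms if n is even."""
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        # Position .5(n + 1) in 1-based counting is index n // 2 in 0-based counting.
        return ranked[mid]
    # n is even: take the value halfway between the two middle observations.
    return (ranked[mid - 1] + ranked[mid]) / 2


print(median([62, 93, 126, 75, 34]))  # 75
print(median([62, 93, 126, 75]))      # 84.0
```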
The mode is the value that occurs with the highest frequency
A data set in which each value occurs only once has no mode.
A data set with only one value occurring with the highest frequency has one mode.
A data set with two values that occur with the same (highest) frequency has two modes.
If more than two values in a data set occur with the same (highest) frequency, the data set has more than two modes.
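The cases above can be sketched with a small helper that returns every value tied for the highest frequency, or an empty list when each value occurs only once. The function name `modes` and the sample lists are illustrative assumptions.

```python
from collections import Counter


def modes(values):
    """Return all values that occur with the highest frequency.

    Returns an empty list when every value occurs exactly once (no mode).
    Assumes `values` is not empty.
    """
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []
    return [value for value, count in counts.items() if count == highest]


print(modes([46, 54, 42, 46, 32]))  # [46]    one mode
print(modes([1, 1, 2, 2, 3]))       # [1, 2]  two modes
print(modes([62, 93, 126, 75, 34])) # []      no mode
```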
Symmetric histograms when mean = median = mode
Right skewed histograms when mode < median < mean
Left skewed histograms when mean < median < mode
Data sets may have same center but look different because of
Measure of variability can help us to create a mental picture of
The range for ungrouped data: Range = largest value - smallest value
The range, like the mean, is highly influenced by outliers. The range is based on two values only.
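A short sketch of the range, using the payroll and state population figures from the earlier examples. The second comparison illustrates how strongly a single outlier affects the range; the variable names are illustrative.

```python
def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


payrolls = [62, 93, 126, 75, 34]                 # millions of dollars
populations = [5894, 3421, 627, 1212]            # thousands, without California
populations_with_ca = populations + [33872]      # thousands, with California

print(data_range(payrolls))             # 92
print(data_range(populations))          # 5267
print(data_range(populations_with_ca))  # 33245
```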
The standard deviation is the most used measure of dispersion.
In general, larger values of standard deviation indicate that
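Population: $$\sigma^2 = \frac{\sum (x - \mu)^2}{N}, \qquad \sigma = \sqrt{\sigma^2}$$
Sample: $$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}, \qquad s = \sqrt{s^2}$$
Standard deviation is always non-negative.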
MLB team                2002 payroll (millions of dollars)
Anaheim Angels          62
Atlanta Braves          93
New York Yankees        126
(not listed)            75
Tampa Bay Devil Rays    34
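The variance and standard deviation of the five payroll values can be sketched as follows. The slide does not say whether the five teams are to be treated as a sample or a population, so both versions are computed here; that choice, and all names, are assumptions for illustration.

```python
import math

# Payroll figures (millions of dollars) from the table above.
payrolls = [62, 93, 126, 75, 34]

n = len(payrolls)
mean = sum(payrolls) / n                                  # 78.0
sum_sq_dev = sum((x - mean) ** 2 for x in payrolls)       # 4730.0

# Treating the data as a sample: divide by n - 1.
sample_variance = sum_sq_dev / (n - 1)                    # 1182.5
sample_sd = math.sqrt(sample_variance)                    # about 34.39

# Treating the data as a population: divide by n.
population_variance = sum_sq_dev / n                      # 946.0
population_sd = math.sqrt(population_variance)            # about 30.76

print(sample_variance, round(sample_sd, 2))
print(population_variance, round(population_sd, 2))
```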
In some situations we may be interested in a descriptive
It is very useful when comparing two different samples with
It is given by:
Once we group the data, we no longer know the values of
Thus, we find an approximation for the sum of these values.
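Mean for grouped data, where m is the class midpoint and f is the class frequency:
$$\mu = \frac{\sum m f}{N}, \qquad \bar{x} = \frac{\sum m f}{n}$$
Variance for grouped data:
$$\sigma^2 = \frac{\sum f (m - \mu)^2}{N}, \qquad s^2 = \frac{\sum f (m - \bar{x})^2}{n - 1}$$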
Example: The table below gives the frequency distribution of
Daily commuting time (min)    f
0 to less than 10             4
10 to less than 20            9
20 to less than 30            6
30 to less than 40            4
40 to less than 50            2
Total                         25
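For the commuting-time table, the grouped-data mean and variance can be approximated by replacing every observation in a class with the class midpoint. The sketch below treats the 25 observations as a sample (dividing by n - 1), which is an assumption; the tuple layout and names are illustrative.

```python
import math

# (lower limit, upper limit, frequency) for each class in the table above.
classes = [(0, 10, 4), (10, 20, 9), (20, 30, 6), (30, 40, 4), (40, 50, 2)]

n = sum(f for _, _, f in classes)                          # 25

# Class midpoint m stands in for every observation in the class.
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]       # 5, 15, 25, 35, 45
freqs = [f for _, _, f in classes]

sum_mf = sum(m * f for m, f in zip(midpoints, freqs))      # 535.0
grouped_mean = sum_mf / n                                  # 21.4

# Sum of f * (m - mean)^2, then divide by n - 1 (sample assumption).
sum_sq = sum(f * (m - grouped_mean) ** 2 for m, f in zip(midpoints, freqs))
grouped_variance = sum_sq / (n - 1)                        # about 140.67
grouped_sd = math.sqrt(grouped_variance)                   # about 11.86

print(grouped_mean, round(grouped_variance, 2), round(grouped_sd, 2))
```

These are approximations: the exact values cannot be recovered once the individual observations have been grouped.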
We often are interested in the relative location of a data point xi
The z-scores (standardized value) can be used to find the
It is given by
$$z_i = \frac{x_i - \bar{x}}{s}$$
The z-score can be interpreted as the number of standard deviations $x_i$ is from the mean.
Number of students in a class 46 54 42 46 32
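The z-scores for the five class sizes can be sketched as below, assuming the data are a sample so that the standard deviation divides by n - 1. That assumption and the variable names are illustrative.

```python
import math

# Number of students in each class, as listed above.
class_sizes = [46, 54, 42, 46, 32]

n = len(class_sizes)
mean = sum(class_sizes) / n                                              # 44.0
# Sample standard deviation (assumption: the five classes are a sample).
sd = math.sqrt(sum((x - mean) ** 2 for x in class_sizes) / (n - 1))      # 8.0

# z-score: how many standard deviations each value lies from the mean.
z_scores = [(x - mean) / sd for x in class_sizes]

print(z_scores)  # [0.25, 1.25, -0.25, 0.25, -1.5]
```

The class with 32 students lies 1.5 standard deviations below the mean, the furthest of the five from the center.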
Chebyshev's theorem allows you to understand how the value
Chebyshev's theorem: The fraction of any data set lying within k standard deviations of the mean is at least $1 - \frac{1}{k^2}$, where k > 1.
This theorem applies to all data sets, which include a sample or a population.
Chebyshev’s theorem is very conservative but it can be applied
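The bound 1 - 1/k^2 is easy to tabulate and to compare with an actual data set. In the sketch below the function names are illustrative, and the class-size data from the z-score example are used only as a convenient test case.

```python
import math


def chebyshev_bound(k):
    """Minimum fraction of any data set within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k**2


def fraction_within(values, k):
    """Observed fraction of values within k (population) standard deviations of the mean."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return sum(1 for x in values if abs(x - mean) <= k * sd) / n


class_sizes = [46, 54, 42, 46, 32]

bound_2 = chebyshev_bound(2)                  # 0.75
bound_3 = chebyshev_bound(3)                  # about 0.889
observed_2 = fraction_within(class_sizes, 2)  # 1.0

print(bound_2, round(bound_3, 3), observed_2)
```

The observed fraction is well above the guaranteed minimum, which is what "conservative" means here.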
The empirical rule gives more precise information about a data
Theorem:
1- 68% of the observations lie within one standard deviation of the mean.
2- 95% of the observations lie within two standard deviations of the mean.
3- 99.7% of the observations lie within three standard deviations of the mean.
Sometimes a data set may have one or more observation that is
This extreme value is called an outlier and can be detected
An experienced statistician may face the following situations
Outlier                                                                  Action
A data value that was incorrectly recorded                               Correct it before any further analysis
A data value that was incorrectly included                               Remove it before any further analysis
A data value that belongs to the data set and was correctly recorded     Keep it!
A measure of position which determines the rank of a single
Quartiles are three measures that divide a ranked data set into four equal parts.
Calculating the quartiles
The second quartile is the median of a data set.
The first quartile, Q1, is the value of the middle term among the observations that are less than the median.
The third quartile, Q3, is the value of the middle term among the observations that are greater than the median.
Interquartile range (IQR) is the difference between the third and the first quartiles: IQR = Q3 - Q1.
75.3 82.2 85.8 88.7 94.1 102.1 79.0 97.1 104.2 119.3 81.3 77.1
The values of the three quartiles.
The interquartile range.
Where does the 104.2 fall in relation to these quartiles.
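One way to work this example is the "median of each half" method, where Q1 and Q3 are the medians of the observations below and above the overall median. The sketch below assumes that method; statistical software often uses other interpolation rules and may give slightly different quartiles. Function names are illustrative.

```python
def median(ranked):
    """Median of an already ranked list."""
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2


def quartiles(values):
    """Q1, Q2, Q3 using the medians of the lower and upper halves of the ranked data."""
    ranked = sorted(values)
    n = len(ranked)
    half = n // 2
    lower = ranked[:half]
    # For odd n the middle observation belongs to neither half.
    upper = ranked[half:] if n % 2 == 0 else ranked[half + 1:]
    return median(lower), median(ranked), median(upper)


data = [75.3, 82.2, 85.8, 88.7, 94.1, 102.1,
        79.0, 97.1, 104.2, 119.3, 81.3, 77.1]

q1, q2, q3 = quartiles(data)
iqr = q3 - q1

print(round(q1, 2), round(q2, 2), round(q3, 2))  # 80.15 87.25 99.6
print(round(iqr, 2))                             # 19.45
print(104.2 > q3)                                # True
```

Under this method the value 104.2 lies above Q3, that is, in the top quarter of the ranked data.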
Percentiles are the summary measures that divide a ranked data set into 100 equal parts.
The kth percentile is denoted by $P_k$, where k is an integer in the range 1 to 99.
The approximate value of the kth percentile is
$$P_k = \text{value of the } \left(\frac{kn}{100}\right)\text{th term in a ranked data set}$$
75.3 82.2 85.8 88.7 94.1 102.1 79.0 97.1 104.2 119.3 81.3 77.1
Box and whisker plot gives a graphic presentation of data using Q1, Q2, Q3, smallest, and largest values*.
Can help to visualize the center, the spread, and the skewness
Can help in detecting outliers.
Very good tool for comparing more than one distribution.
Detecting an outlier:
Lower fence: Q1 - 1.5(IQR)
Upper fence: Q3 + 1.5(IQR)
If a data point is larger than the upper fence or smaller than the lower fence, it is considered an outlier.
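The fence rule can be sketched as a pair of small functions. The quartile values used below (80.15 and 99.6) are an assumption: they come from applying the median-of-halves method to the twelve values of the quartile example. The extra value 140.0 is hypothetical and included only to show a detected outlier.

```python
def fences(q1, q3):
    """Lower and upper fences: Q1 - 1.5(IQR) and Q3 + 1.5(IQR)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values, q1, q3):
    """Values smaller than the lower fence or larger than the upper fence."""
    lower, upper = fences(q1, q3)
    return [x for x in values if x < lower or x > upper]


data = [75.3, 82.2, 85.8, 88.7, 94.1, 102.1,
        79.0, 97.1, 104.2, 119.3, 81.3, 77.1]
q1, q3 = 80.15, 99.6  # assumed quartiles (median-of-halves method)

lower, upper = fences(q1, q3)
outliers = find_outliers(data, q1, q3)

print(round(lower, 3), round(upper, 3))  # 50.975 128.775
print(outliers)                          # []
print(find_outliers([140.0], q1, q3))    # [140.0]  (hypothetical value)
```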
Draw a horizontal line representing the scale of the measurements.
Calculate the median, the upper and lower quartiles, and the IQR for the data set.
Form a box just above the line with the left and right ends at Q1 and Q3.
Draw a vertical line through the box at the location of the median.
Mark any outliers with an asterisk (*) on the graph.
Extend horizontal lines called "whiskers" from the ends of the box to the smallest and largest observations that are not outliers.
[Figure: box plot showing Q1, Q2, Q3, the lower and upper fences, and outliers marked with *]
So far we have studied numerical methods to describe data
Often decision makers are interested in the relationship
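To do so, we will use a descriptive measure called covariance.
Covariance assigns a numerical value to the linear relationship between two variables.
Sample: $$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
Population: $$\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$$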
A big disadvantage of the covariance is that it depends on the units of measurement.
For the same data set, we will have two different covariance
Pearson’s correlation coefficient is a good remedy to that problem
It is given by
Sample: $$r_{xy} = \frac{s_{xy}}{s_x s_y}$$
Population: $$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$
Average Driving Distance (meters)    Average 18-Hole Score
277.6                                69
259.5                                71
269.1                                70
267.0                                70
255.6                                71
272.9                                69
x        y
277.6    69
259.5    71
269.1    70
267.0    70
255.6    71
272.9    69
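The golf data can be used to illustrate how covariance changes with the unit of measurement while the correlation coefficient does not. In the sketch below, re-expressing the distances in kilometers is an assumption chosen purely for illustration, as are the function names; sample formulas (dividing by n - 1) are used.

```python
import math


def sample_covariance(xs, ys):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def sample_sd(xs):
    """Sample standard deviation; equals the square root of cov(x, x)."""
    return math.sqrt(sample_covariance(xs, xs))


def correlation(xs, ys):
    """Pearson's sample correlation coefficient."""
    return sample_covariance(xs, ys) / (sample_sd(xs) * sample_sd(ys))


distance_m = [277.6, 259.5, 269.1, 267.0, 255.6, 272.9]
score = [69, 71, 70, 70, 71, 69]
distance_km = [d / 1000 for d in distance_m]  # same data, different unit

cov_m = sample_covariance(distance_m, score)    # about -7.08
cov_km = sample_covariance(distance_km, score)  # about -0.00708
r_m = correlation(distance_m, score)            # about -0.96
r_km = correlation(distance_km, score)          # same as r_m

print(round(cov_m, 2), round(cov_km, 5))
print(round(r_m, 3), round(r_km, 3))
```

The covariance shrinks by a factor of 1000 with the change of unit, while the correlation coefficient stays the same.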
Week    Number of commercials (x)    Sales in $ (y)
1       2                            50
2       5                            57
3       1                            41
4       3                            54
5       4                            54
6       1                            38
7       5                            63
8       3                            48
9       4                            59
10      2                            46
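A worked computation for the commercials and sales data can be laid out as a table of deviations and their products, which is how the covariance and correlation are usually found by hand. The sketch below assumes the ten weeks are a sample (dividing by n - 1); the column layout and names are illustrative.

```python
import math

commercials = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]     # x
sales = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]  # y

n = len(commercials)
x_bar = sum(commercials) / n   # 3.0
y_bar = sum(sales) / n         # 51.0

# One row per week: deviation of x, deviation of y, and their product.
print("week   x - x_bar   y - y_bar   product")
sum_xy = sum_xx = sum_yy = 0.0
for week, (x, y) in enumerate(zip(commercials, sales), start=1):
    dx, dy = x - x_bar, y - y_bar
    sum_xy += dx * dy
    sum_xx += dx * dx
    sum_yy += dy * dy
    print(f"{week:>4}   {dx:>9.1f}   {dy:>9.1f}   {dx * dy:>7.1f}")

covariance = sum_xy / (n - 1)                    # 11.0
correlation = sum_xy / math.sqrt(sum_xx * sum_yy)  # about 0.93

print(sum_xy, sum_xx, sum_yy)  # 99.0 20.0 566.0
print(covariance, round(correlation, 2))
```

The n - 1 terms cancel in the correlation, so it can be computed directly from the three sums.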