THE REVISION OF SOME CONCEPTS Summary Statistics Quantitative data - - PowerPoint PPT Presentation



slide-1
SLIDE 1

THE REVISION OF SOME CONCEPTS…

slide-2
SLIDE 2

Summary Statistics

Quantitative data

Summary statistics describe a numeric set of data by its center, variability, and shape.

But it is important to consider whether the data are:

  • Non-normal: median, range
  • Normal: mean, variance, standard deviation

slide-3
SLIDE 3

Data Summarization

To summarize quantitative data, we need to use one or two parameters that can describe the data:

  • 1. Measures of central tendency, which describe the center of the data
  • 2. Measures of dispersion, which show how the data are scattered around its center

slide-4
SLIDE 4

Measures of central tendency

A variable usually has a point (center) around which the observed values lie.

The values that describe this center are called averages, or measures of central tendency. The three most commonly used averages are:

  • 1. The arithmetic mean
  • 2. The median
  • 3. The mode
slide-5
SLIDE 5

1- The arithmetic mean:

the sum of the observations divided by the number of observations:

$$\bar{x} = \frac{\sum x}{n}$$

Where:
  • $\bar{x}$ = the mean
  • $\sum$ denotes "the sum of"
  • $x$ = the values of the observations
  • $n$ = the number of observations
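The definition above can be sketched in a few lines of Python. The data values and the function name `arithmetic_mean` are illustrative assumptions, not part of the slides.

```python
# Illustrative observations only (hypothetical data).
observations = [5, 6, 7, 5, 10]


def arithmetic_mean(values):
    """Return the sum of the observations divided by their number."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


print(arithmetic_mean(observations))  # 6.6
```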

slide-6
SLIDE 6

2- Median

It is the middle observation in a series of observations after arranging them in ascending or descending order.

  • The rank of the median is (n + 1)/2 if the number of observations is odd.
  • If the number is even, the median is the mean of the two middle observations, ranked n/2 and n/2 + 1.
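A minimal sketch of the rank rule, assuming 1-based ranks on the sorted data. For an even number of observations it uses the usual convention of averaging the observations ranked n/2 and n/2 + 1. The function name and data are hypothetical.

```python
def median(values):
    """Return the middle observation of the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    if n % 2 == 1:
        # Odd n: the observation with rank (n + 1) / 2 (1-based).
        return ordered[(n + 1) // 2 - 1]
    # Even n: average the observations ranked n/2 and n/2 + 1.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


print(median([5, 6, 7, 5, 10]))  # 6
print(median([1, 2, 3, 4]))      # 2.5
```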
slide-7
SLIDE 7
3- Mode

  • The most frequently occurring value in the data is the mode.

Example: 5, 6, 7, 5, 10. The mode in this data is 5, since the number 5 is repeated twice.

Sometimes there is more than one mode, and sometimes there is no mode, especially in small sets of observations. Unimodal - Bimodal - No mode
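One way to count frequencies and cover the unimodal, bimodal and no-mode cases is sketched here. Treating "every value occurs once" as no mode is an assumption of this sketch, and the function name `modes` is hypothetical.

```python
from collections import Counter


def modes(values):
    """Return every value that shares the highest frequency.

    An empty list means no value repeats, treated here as "no mode".
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(value for value, count in counts.items() if count == top)


print(modes([5, 6, 7, 5, 10]))  # [5]     (unimodal)
print(modes([1, 1, 2, 2, 3]))   # [1, 2]  (bimodal)
print(modes([1, 2, 3]))         # []      (no mode)
```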

slide-8
SLIDE 8

Advantages and disadvantages of Central Tendency Measures (CTM):

  • Mean: the preferred CTM, since it takes into account each individual observation, but its main disadvantage is that it is affected by extreme values.

  • Median: a useful descriptive measure if there are one or two extremely high or low values. The median is less sensitive to outliers (extreme scores) than the mean and is thus a better measure than the mean for highly skewed distributions.

  • Mode: rarely used.
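The sensitivity of the mean to extreme values can be illustrated with Python's standard `statistics` module. The data are made up for the illustration, with 60 playing the role of a single extreme value.

```python
import statistics

# Hypothetical data: a small sample, then the same sample plus one outlier.
values = [4, 5, 5, 6, 7]
with_outlier = values + [60]

print(statistics.mean(values), statistics.median(values))              # 5.4 5
print(statistics.mean(with_outlier), statistics.median(with_outlier))  # 14.5 5.5
```

The single outlier moves the mean from 5.4 to 14.5, while the median only moves from 5 to 5.5.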
slide-9
SLIDE 9

Measures of Dispersion

  • The measure of dispersion describes the degree of variation, scatter, or dispersion of the data around its central value:

  • 1. Range - R
  • 2. Variance - V
  • 3. Standard Deviation - SD

dispersion = variation = spread = scatter

slide-10
SLIDE 10

1- Range

  • is the difference between the largest and smallest values.
  • is the simplest measure of variation.
  • Disadvantages: it is based on only two of the observations and gives no idea of how the other observations are arranged between these two. Also, it tends to be larger as the size of the sample increases.
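A short sketch of the range and of its first disadvantage: two hypothetical data sets with very different arrangements between the extremes give the same range. The function name `value_range` is an assumption.

```python
def value_range(values):
    """Return the difference between the largest and smallest values."""
    return max(values) - min(values)


clustered = [1, 5, 5, 5, 9]  # most values sit at the center
spread = [1, 2, 5, 8, 9]     # values spread out between the extremes

print(value_range(clustered), value_range(spread))  # 8 8
```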

slide-11
SLIDE 11

2- Variance

To get the average of the differences between the mean and each observation in the data, we subtract each value from the mean, sum these differences, and divide the sum by the number of observations:

$$V = \frac{\sum (\bar{x} - x)}{n}$$

where $\bar{x}$ is the mean, $x$ each observation, and $n$ the number of observations.

slide-12
SLIDE 12
2- Variance

  • Variance: $V = \frac{\sum (\bar{x} - x)}{n}$, where $\bar{x}$ is the mean.
  • The value of this expression will always equal zero, because the differences between each value and the mean have negative and positive signs that cancel out on algebraic summation.
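The cancellation can be checked numerically on a small, hypothetical data set:

```python
# Hypothetical observations with mean 5.
values = [2, 4, 6, 8]
mean = sum(values) / len(values)

# Signed differences between the mean and each value.
deviations = [mean - x for x in values]

print(deviations)       # [3.0, 1.0, -1.0, -3.0]
print(sum(deviations))  # 0.0
```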

slide-13
SLIDE 13
2- Variance

  • To overcome this zero, we square the difference between the mean and each value, so that the sign is always positive.
  • Thus we get (dividing by n - 1 rather than n, as is usual for a sample):

$$V = \frac{\sum (\bar{x} - x)^2}{n - 1}$$
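A minimal sketch of the squared-difference formula with the n - 1 divisor, checked against `statistics.variance` from the Python standard library, which uses the same divisor. The function name `sample_variance` and the data are illustrative assumptions.

```python
import math
import statistics


def sample_variance(values):
    """Sum of squared differences from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    mean = sum(values) / n
    return sum((mean - x) ** 2 for x in values) / (n - 1)


values = [2, 4, 6, 8]  # hypothetical data; squared differences sum to 20
variance = sample_variance(values)

print(round(variance, 4))  # 6.6667
print(math.isclose(variance, statistics.variance(values)))  # True
```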

slide-14
SLIDE 14

3- Standard Deviation (SD)

The main disadvantage of the variance is that it is expressed in the square of the units used. So, it is more convenient to express the variation in the original units by taking the square root of the variance.

This is called the standard deviation (SD). Therefore:

$$SD = \sqrt{V} = \sqrt{\frac{\sum (\bar{x} - x)^2}{n - 1}}$$
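As a quick check, the square root of the sample variance can be compared with `statistics.stdev` from the Python standard library. The data are hypothetical.

```python
import math
import statistics

values = [2, 4, 6, 8]  # hypothetical data

variance = statistics.variance(values)  # sample variance (n - 1 divisor)
sd = math.sqrt(variance)                # back in the original units

print(round(sd, 3))                                # 2.582
print(math.isclose(sd, statistics.stdev(values)))  # True
```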
slide-15
SLIDE 15

Summary Statistics and Normal data

Summary statistics are useful for identifying whether data are normal or not.

Normal data: approximately 95% of observations lie within 2 standard deviations of the mean (mean ± 2 SD).
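The 95% rule can be checked empirically on simulated data. This is only a sketch: the population mean of 50, the SD of 10, the sample size and the seed are arbitrary assumptions.

```python
import random
import statistics

# Simulate normal data (arbitrary parameters and seed).
rng = random.Random(0)
sample = [rng.gauss(50, 10) for _ in range(20_000)]

mean = statistics.fmean(sample)
sd = statistics.stdev(sample)

# Fraction of observations lying within mean ± 2 SD.
inside = sum(1 for x in sample if mean - 2 * sd <= x <= mean + 2 * sd)
fraction = inside / len(sample)

print(f"{fraction:.1%} of observations lie within mean ± 2 SD")
```

For normally distributed data the printed fraction should be close to 95%.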
slide-16
SLIDE 16

Normal Distribution curve (NDC)

The NDC is a graphical presentation (frequency polygon) of a quantitative variable measured in a large number of observations. It plays a major role in the techniques of statistical analysis.

slide-17
SLIDE 17

Areas under the NDC

  • Mean ± 1 SD covers about 68% of the area under the curve (about 34% on each side of the mean).
  • Mean ± 2 SD covers about 95% of the area (about 47.5% on each side of the mean).
  • Mean ± 3 SD covers about 99.7% of the area (nearly 50% on each side of the mean).
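The areas can be computed from the standard normal cumulative distribution function using `statistics.NormalDist` (Python 3.8+). The theoretical values are slightly more precise than the rounded percentages usually quoted. The helper name `area_within` is an assumption.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)


def area_within(k):
    """Area under the normal curve between mean - k*SD and mean + k*SD."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```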

slide-18
SLIDE 18

Characteristics of NDC

1- It is a bell-shaped, continuous curve.
2- It is symmetrical (i.e., it can be divided vertically into two equal halves).
3- The tails never touch the base line but extend to infinity in either direction.
4- The mean, median and mode values coincide.
5- It is described by two parameters: the arithmetic mean determines the location of the center of the curve, and the standard deviation represents the scatter around the mean.

slide-19
SLIDE 19

NDC and Skewed data

  • If we represent collected data by a frequency polygon and the resulting curve does not resemble the normal distribution curve (with all its characteristics), then these data are not normally distributed.

slide-20
SLIDE 20

Skewness and Kurtosis

Skewness: measures the asymmetry of data

  – Positive or right skewed: longer right tail
  – Negative or left skewed: longer left tail

Kurtosis: measures the peakedness of the distribution of data. The excess kurtosis of the normal distribution is 0 (its raw kurtosis is 3).
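A minimal sketch of both measures, assuming the simple moment-based definitions without small-sample correction. Statistical packages may apply corrections and give slightly different values. Kurtosis is reported as excess kurtosis (raw kurtosis minus 3), so that the normal distribution sits at 0. The function names and data are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment: the average of (x - mean) ** k."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)


def skewness(values):
    """Third central moment divided by the second raised to the power 1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Fourth central moment over the squared second moment, minus 3."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]   # longer right tail
left_skewed = [-10, -3, -2, -2, -1, -1]  # mirror image: longer left tail

print(skewness(symmetric))         # 0.0
print(skewness(right_skewed) > 0)  # True
print(skewness(left_skewed) < 0)   # True
```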

slide-21
SLIDE 21

NDC and normal measurement

The NDC can be used to distinguish normal from abnormal measurements. Example: suppose we have the NDC for hemoglobin levels in a population of normal adult males, with mean ± SD = 11 ± 1.5.

We obtain a hemoglobin reading of 8.1 for an individual and want to know whether he/she is normal or anemic. If this reading lies within the area under the curve covering 95% of the normal population (i.e. mean ± 2 SD), he/she will be considered normal. If the reading is below this range, he/she is considered anemic.

slide-22
SLIDE 22

NDC and normal measurement

The normal range for hemoglobin in this example will be:

  • the upper hemoglobin level: 11 + 2 (1.5) = 14.
  • the lower hemoglobin level: 11 - 2 (1.5) = 8.

The normal range of hemoglobin for adult males is therefore from 8 to 14. The reading of 8.1 lies within the range covering 95% of this population, so this individual is considered normal.
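The arithmetic of this example can be written as a small helper. The numbers (mean 11, SD 1.5, reading 8.1) come from the example. The function name `reference_range` and the parameter `k` are assumptions of this sketch.

```python
def reference_range(mean, sd, k=2):
    """Return (lower, upper) limits of the mean ± k*SD range."""
    return (mean - k * sd, mean + k * sd)


low, high = reference_range(mean=11, sd=1.5)
reading = 8.1
is_within = low <= reading <= high

print((low, high))  # (8.0, 14.0)
print(is_within)    # True
```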

slide-23
SLIDE 23

How to test for Normality

  • Mean ≈ median
  • (mean - 2 SD, mean + 2 SD) gives a reasonable range for the data
  • -1 < skewness < 1
  • -1 < kurtosis < 1
  • Histogram shows a symmetric bell shape

If data are not normal:

  • A natural log transformation can transform very skewed data to 'normal' data; use the transformed data in the analysis
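The effect of a natural log transformation on skewness can be sketched with constructed data. The values are hypothetical and deliberately chosen so that their logarithms are symmetric. The `skewness` helper uses the simple moment-based definition (an assumption), and the transformation requires strictly positive values.

```python
import math


def skewness(values):
    """Moment-based skewness (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed data: e^-2, e^-1, 1, e, e^2.
skewed = [math.exp(z) for z in (-2, -1, 0, 1, 2)]

# Natural log transformation (values must be > 0).
logged = [math.log(x) for x in skewed]

print(skewness(skewed) > 1)             # True: outside the -1..1 guideline
print(abs(skewness(logged)) < 1e-9)     # True: symmetric after the transform
```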

slide-24
SLIDE 24
slide-25
SLIDE 25
slide-26
SLIDE 26
slide-27
SLIDE 27

Use the tool at http://onlinestatbook.com/stat_sim/sampling_dist/index.html to check the characteristics of the sampling distribution of the mean.
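For readers without access to the online tool, a rough simulation gives a similar view of the sampling distribution of the mean. This is only a sketch under stated assumptions: the population is uniform on [0, 1), and the sample size, number of samples and seed are arbitrary. The comparison value is the population SD divided by the square root of the sample size.

```python
import math
import random
import statistics

rng = random.Random(42)
SAMPLE_SIZE = 25     # observations per sample (arbitrary)
NUM_SAMPLES = 5000   # number of repeated samples (arbitrary)

# Draw many samples from a uniform(0, 1) population and keep each sample mean.
sample_means = [
    statistics.fmean([rng.random() for _ in range(SAMPLE_SIZE)])
    for _ in range(NUM_SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# Theoretical values: population mean 0.5, population SD sqrt(1/12).
expected_sd = math.sqrt(1 / 12) / math.sqrt(SAMPLE_SIZE)

print(f"mean of sample means: {mean_of_means:.3f} (population mean 0.5)")
print(f"SD of sample means:   {sd_of_means:.3f} (expected about {expected_sd:.3f})")
```

The sample means cluster around the population mean, and their spread shrinks as the sample size grows.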

slide-28
SLIDE 28
slide-29
SLIDE 29


slide-30
SLIDE 30
slide-31
SLIDE 31
slide-32
SLIDE 32
slide-33
SLIDE 33
