THE REVISION OF SOME CONCEPTS Summary Statistics Quantitative data - PowerPoint PPT Presentation

Summary Statistics (quantitative data) Summary statistics describe a numeric data set by its center, variability, and shape. It is important to consider whether the data are: • Non-normal: summarize with the median and range • Normal: summarize with the mean, variance, and standard deviation

Data Summarization To summarize quantitative data, we need one or two parameters that can describe the data: 1. Measures of central tendency, which describe the center of the data 2. Measures of dispersion, which show how the data are scattered around the center

Measures of central tendency A variable usually has a point (center) around which the observed values lie. The averages that locate this point are called measures of central tendency. The three most commonly used averages are: 1. The arithmetic mean 2. The median 3. The mode

1- The arithmetic mean: the sum of the observations divided by the number of observations: • $\bar{x} = \sum x / n$ Where: $\bar{x}$ = the mean, $\sum$ denotes "the sum of", $x$ = the values of the observations, $n$ = the number of observations
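
As a minimal sketch of the formula above, the mean can be computed directly as the sum divided by the count. The function name and the sample values are illustrative assumptions, not part of the original slides.

```python
# Hypothetical sample values, chosen only for illustration.
observations = [5, 6, 7, 5, 10]


def arithmetic_mean(values):
    """Return the sum of the observations divided by their number."""
    if not values:
        raise ValueError("at least one observation is required")
    return sum(values) / len(values)


mean_value = arithmetic_mean(observations)
print(mean_value)  # 33 / 5 = 6.6
```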

2- Median It is the middle observation in a series of observations after arranging them in ascending or descending order. • The rank of the median is (n + 1)/2 if the number of observations is odd • If the number is even, the median is the average of the two middle observations, ranked n/2 and n/2 + 1
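
The ranking rule can be sketched in a few lines of Python. For an even number of observations this sketch assumes the common convention of averaging the two middle values (ranks n/2 and n/2 + 1); the function name and sample values are hypothetical.

```python
def median_by_rank(values):
    """Return the middle observation after sorting in ascending order."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one observation is required")
    middle = n // 2
    if n % 2 == 1:
        # Odd n: rank (n + 1) / 2, which is index n // 2 when counting from 0.
        return ordered[middle]
    # Even n: average the observations ranked n/2 and n/2 + 1.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median_by_rank([5, 6, 7, 5, 10]))  # sorted: 5, 5, 6, 7, 10 -> 6
print(median_by_rank([5, 6, 7, 10]))     # (6 + 7) / 2 = 6.5
```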

3- Mode • The mode is the most frequently occurring value in the data. Example: 5, 6, 7, 5, 10. The mode in this data is 5, since 5 is the only value that is repeated (it occurs twice). Sometimes there is more than one mode, and sometimes there is no mode, especially in small sets of observations. Unimodal - Bimodal - No mode
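
A short sketch of how the mode could be found by counting. It assumes that a data set in which no value repeats is reported as having no mode (an empty list); the function name and the second and third samples are hypothetical.

```python
from collections import Counter


def modes(values):
    """Return all most-frequent values, or [] if no value repeats."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([5, 6, 7, 5, 10]))  # [5]     (unimodal)
print(modes([1, 1, 2, 2, 3]))   # [1, 2]  (bimodal)
print(modes([1, 2, 3]))         # []      (no mode)
```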

Advantages and disadvantages of Central Tendency Measures (CTM): • Mean: the preferred CTM, since it takes every individual observation into account. Its main disadvantage is that it is affected by extreme values. • Median: a useful descriptive measure when there are one or two extremely high or low values. The median is less sensitive to outliers (extreme scores) than the mean and is therefore a better measure than the mean for highly skewed distributions. • Mode: rarely used.
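
The sensitivity of the mean to extreme values can be illustrated with a small experiment. The readings below are invented for this sketch; the functions come from Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical readings, invented for this illustration.
typical = [4, 5, 5, 6, 7]
with_outlier = typical + [60]  # one extremely high value added

mean_before, median_before = mean(typical), median(typical)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print(mean_before, median_before)  # 5.4 5
print(mean_after, median_after)    # 14.5 5.5
```

A single extreme value moves the mean from 5.4 to 14.5, while the median moves only from 5 to 5.5.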

Measures of Dispersion • A measure of dispersion describes the degree of variation or scatter of the data around its central value: 1. Range - R 2. Variance - V 3. Standard Deviation - SD Note: dispersion = variation = spread = scatter

1- Range • is the difference between the largest and smallest values. • is the simplest measure of variation. • Disadvantages: it is based on only two of the observations and gives no idea of how the other observations are arranged between them. Also, it tends to grow as the sample size increases.
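
A minimal sketch of the range and of its main weakness: two samples with very different interiors can share the same range. Both samples and the function name are hypothetical.

```python
def value_range(values):
    """Return the largest value minus the smallest value."""
    return max(values) - min(values)


# Two hypothetical samples with the same extremes but different interiors.
clustered = [1, 5, 5, 5, 9]
spread_out = [1, 2, 5, 8, 9]

print(value_range(clustered))   # 8
print(value_range(spread_out))  # 8
```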

2- Variance If we want the average of the differences between the mean and each observation in the data, we subtract each value from the mean, sum these differences, and divide the sum by the number of observations: $V = \sum (\text{mean} - x_i) / n$

2- Variance • Variance: $V = \sum (\text{mean} - x) / n$ • The value of this expression is always zero, because the differences between each value and the mean have negative and positive signs that cancel out on algebraic summation.
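
The cancellation can be checked numerically. This sketch uses hypothetical values chosen so that the mean is a whole number and the arithmetic is exact.

```python
# Hypothetical values with a whole-number mean, so the arithmetic is exact.
data = [2, 4, 6, 8, 10]

mean_value = sum(data) / len(data)           # 6.0
deviations = [mean_value - x for x in data]  # [4.0, 2.0, 0.0, -2.0, -4.0]

print(sum(deviations))  # 0.0: positive and negative differences cancel
```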

2- Variance • To overcome this zero, we square the difference between the mean and each value so that every term is non-negative. • For sample data, the sum of squares is divided by n - 1 rather than n. Thus we get: $V = \sum (\text{mean} - x)^2 / (n - 1)$

3- Standard Deviation (SD) The main disadvantage of the variance is that it is expressed in the square of the units used. So it is more convenient to express the variation in the original units by taking the square root of the variance. This is called the standard deviation (SD). Therefore $SD = \sqrt{V}$, i.e. $SD = \sqrt{\sum (\text{mean} - x)^2 / (n - 1)}$
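
A minimal sketch of the variance and standard deviation with the n - 1 divisor, written out step by step. The function names and data are illustrative assumptions; Python's standard `statistics.variance` and `statistics.stdev` use the same n - 1 divisor and can serve as a cross-check.

```python
import math


def sample_variance(values):
    """Sum of squared differences from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    mean_value = sum(values) / n
    return sum((mean_value - x) ** 2 for x in values) / (n - 1)


def sample_sd(values):
    """Square root of the sample variance, in the original units."""
    return math.sqrt(sample_variance(values))


data = [2, 4, 6, 8, 10]  # hypothetical values; squared differences sum to 40
print(sample_variance(data))  # 40 / 4 = 10.0
print(sample_sd(data))        # square root of 10, about 3.16
```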

Summary Statistics and Normal data Summary statistics are useful for identifying whether data are normal or not. Normal data: approximately 95% of observations lie between the mean minus 2 standard deviations and the mean plus 2 standard deviations.

Normal Distribution curve (NDC) The NDC is a graphical presentation (frequency polygon) of a quantitative variable measured in a large number of observations. It plays a major role in the techniques of statistical analysis.

Areas under the NDC • Mean ± 1 SD covers about 68% of the area under the curve (about 34% on each side of the mean). • Mean ± 2 SD covers about 95% of the area. • Mean ± 3 SD covers about 99.7% of the area.
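
These areas can be checked with the `NormalDist` class from Python's standard `statistics` module (Python 3.8 or later). The sketch computes the fraction of the total area under the curve that lies within k standard deviations of the mean; the helper name `area_within` is hypothetical.

```python
from statistics import NormalDist

# The standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def area_within(k):
    """Fraction of the total area lying between mean - k SD and mean + k SD."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

Because the curve is symmetric, half of each area lies on either side of the mean.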

Characteristics of NDC 1- It is a bell-shaped, continuous curve. 2- It is symmetrical (i.e., it can be divided vertically into two equal halves). 3- The tails never touch the base line but extend to infinity in either direction. 4- The mean, median and mode values coincide. 5- It is described by two parameters: the arithmetic mean determines the location of the center of the curve, and the standard deviation represents the scatter around the mean.

NDC and Skewed data • If we represent collected data by a frequency polygon and the resulting curve does not resemble the normal distribution curve (with all its characteristics), then the data are not normally distributed.

Skewness and Kurtosis Skewness: measures the asymmetry of the data – Positive or right skewed: longer right tail – Negative or left skewed: longer left tail Kurtosis: measures the peakedness of the distribution of the data. The excess kurtosis (kurtosis minus 3) of a normal distribution is 0.
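
One way to put numbers on these ideas is the simple moment-based estimate sketched below. This is an assumption of the sketch, not the slides' stated method: statistical packages often apply bias corrections, so their output can differ slightly. Kurtosis is reported here as excess kurtosis (kurtosis minus 3), the convention under which a normal distribution scores 0. Function names and data are hypothetical.

```python
def central_moment(values, order):
    """Average of (x - mean) raised to the given power."""
    n = len(values)
    mean_value = sum(values) / n
    return sum((x - mean_value) ** order for x in values) / n


def skewness(values):
    """Third central moment scaled by the second moment to the power 1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Fourth central moment over the squared second moment, minus 3."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


symmetric = [2, 4, 6, 8, 10]     # hypothetical symmetric sample
right_skewed = [1, 2, 2, 3, 12]  # hypothetical sample with a long right tail

print(skewness(symmetric))         # 0.0 (no asymmetry)
print(skewness(right_skewed) > 0)  # True (positive, right skewed)
print(excess_kurtosis(symmetric))  # about -1.3 (flatter than a normal curve)
```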

NDC and normal measurement The NDC can be used to distinguish normal from abnormal measurements. Example: Suppose we have the NDC for hemoglobin levels in a population of normal adult males with mean ± SD = 11 ± 1.5. We obtain a hemoglobin reading of 8.1 for an individual and want to know whether he/she is normal or anemic. If this reading is within the central 95% of the area under the curve (i.e. mean ± 2 SD), he/she will be considered normal. If the reading is below this range, he/she is considered anemic.

NDC and normal measurement The normal range for hemoglobin in this example will be: • the upper hemoglobin level: 11 + 2(1.5) = 14 • the lower hemoglobin level: 11 - 2(1.5) = 8 The normal range of hemoglobin for adult males is therefore from 8 to 14. The reading of 8.1 falls within this range, which covers 95% of the population, so this individual is considered normal.
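
The arithmetic of this example can be reproduced in a few lines. The numbers (mean 11, SD 1.5, reading 8.1) come from the example; the function and variable names are hypothetical.

```python
def reference_range(mean_value, sd, k=2):
    """Return (lower, upper) limits at mean - k SD and mean + k SD."""
    return mean_value - k * sd, mean_value + k * sd


lower, upper = reference_range(11, 1.5)
reading = 8.1

print(lower, upper)               # 8.0 14.0
print(lower <= reading <= upper)  # True: the reading is within the range
```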

How to test for Normality • Mean ≈ Median • (mean - 2 SD, mean + 2 SD) is a reasonable range • -1 < skewness < 1 • -1 < kurtosis < 1 • Histogram shows a symmetric bell shape If data are not normal: • A natural log transformation can transform very skewed data to 'Normal' data → use the transformed data in the analysis
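
As a rough sketch, the skewness rule of thumb and the natural log transformation can be tried on a small, strongly right-skewed sample. The data are invented, the skewness estimate is the simple moment-based one (an assumption of this sketch), and the log transformation requires all values to be positive.

```python
import math


def skewness(values):
    """Moment-based skewness: third central moment over SD cubed."""
    n = len(values)
    mean_value = sum(values) / n
    m2 = sum((x - mean_value) ** 2 for x in values) / n
    m3 = sum((x - mean_value) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


def passes_skewness_rule(values):
    """Rule of thumb from the checklist: -1 < skewness < 1."""
    return -1 < skewness(values) < 1


# Hypothetical, strongly right-skewed positive values.
raw = [1, 2, 2, 3, 4, 5, 8, 15, 40, 120]
logged = [math.log(x) for x in raw]  # natural log transformation

print(passes_skewness_rule(raw))     # False (skewness is about 2.2)
print(passes_skewness_rule(logged))  # True  (skewness is about 0.75)
```

Passing one rule of thumb does not prove normality; the other checks in the list, especially the histogram, still apply to the transformed data.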

Use the tool at http://onlinestatbook.com/stat_sim/sampling_dist/index.html to check the characteristics of the sampling distribution of the mean.
