Beyond descriptive statistics 2 When we have a data set, we usually - PowerPoint PPT Presentation

Beyond descriptive statistics 2
• When we have a data set, we usually want to do more with the data than just describe them
• Keep in mind that data are information from a sample selected or generated from a population, and our goal is to make inferences about the population

Statistical inference 3
Statistical inference can be further subdivided into the two main areas of estimation and hypothesis testing
• Estimation is concerned with estimating the values of specific population parameters (today's lecture).
• Hypothesis testing is concerned with testing whether the value of a population parameter is equal to some specific value (next lecture).

Point estimation and interval estimation 4
• Sometimes we are interested in obtaining specific values as estimates of our parameters (along with a measure of their precision). These values are referred to as point estimates
• Sometimes we want to specify a range within which the parameter values are likely to fall. If the range is narrow, then we may feel our point estimate is good. These ranges are called interval estimates

From Sample to Population! 5
• Purpose of inference: make decisions about population characteristics when it is impractical to observe the whole population and we only have a sample of data drawn from the population

Towards statistical inference 6
• Parameter: a number describing the population
• Statistic: a number describing a sample
• Statistical inference: Statistic → Parameter

Estimation of population mean 7
• We have a sample $(x_1, x_2, \ldots, x_n)$ randomly sampled from a population
• The population mean µ and variance $\sigma^2$ are unknown
• Question: how to use the observed sample $(x_1, x_2, \ldots, x_n)$ to estimate µ and $\sigma^2$?

Point estimator of population mean and variance 8
• A natural estimator of the population mean µ is the sample mean $\bar{x} = \sum_{i=1}^{n} x_i / n$
• A natural estimator of the population standard deviation σ is the sample standard deviation $s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

Point estimator of population mean 9
• A natural estimator of the population mean µ is the sample mean $\bar{x} = \sum_{i=1}^{n} x_i / n$
• Question: How good is this estimate?

Point estimator of population mean 10
• A natural estimator of the population mean µ is the sample mean $\bar{x} = \sum_{i=1}^{n} x_i / n$
• Question: How good is this estimate?
• We would like $\bar{x}$ to be close to µ; however, we don't know the value of µ.
• We need to study the distribution of $\bar{X}$

Sampling distribution of sample mean 11
• To understand what properties of $\bar{X}$ make it a desirable estimator for µ, we need to forget about our particular sample for the moment and consider all possible samples of size n that could have been selected from the population
• The values of $\bar{X}$ in different samples will be different. These values will be denoted by $\bar{x}_1, \bar{x}_2, \bar{x}_3, \ldots$
• The sampling distribution of $\bar{X}$ is the distribution of the values $\bar{x}$ over all possible samples of size n that could have been selected from the study population

Research question: center of a population 12 Population Mean

Research question: center of a population 13
[Diagram: random sample 1, random sample 2, ..., random sample K, each drawn from a population with unknown mean]
• Each sample is representative of the population

Research question: center of a population 14
[Diagram: random sample k drawn from the population yields the sample mean $\bar{x}_k$, for k = 1, ..., K]
• The selection of a random sample is A RANDOM EXPERIMENT.
• $\bar{X}$ is a RANDOM VARIABLE
• $\bar{x}_1, \ldots, \bar{x}_K$ are observed values of $\bar{X}$

An example of sampling distribution 15

Sample mean is an unbiased estimator of population mean 16
• We can show that the average of the sample means $\bar{x}_1, \bar{x}_2, \bar{x}_3, \ldots$ (over all possible samples) is equal to the population mean µ
• Unbiasedness: Let $X_1, X_2, \ldots, X_n$ be a random sample drawn from some population with mean µ. Then $E(\bar{X}) = \mu$
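The unbiasedness claim can be checked by brute force on a tiny population. The sketch below is only an illustration: the five-value population and the sample size are made up, and samples are drawn without replacement so that every possible sample can be listed.

```python
from itertools import combinations
from statistics import mean

# Hypothetical toy population (not from the slides), small enough to enumerate.
population = [2, 4, 6, 8, 10]
n = 2  # sample size

# Sample mean of every possible sample of size n (drawn without replacement).
sample_means = [mean(sample) for sample in combinations(population, n)]

mu = mean(population)
average_of_sample_means = mean(sample_means)

print("population mean:", mu)
print("number of possible samples:", len(sample_means))
print("average of all sample means:", average_of_sample_means)
```

Individual sample means range from 3 to 9, yet their average over all ten possible samples matches the population mean.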

$\bar{X}$ is the minimum variance unbiased estimator of µ 17
• The unbiasedness of the sample mean is not sufficient reason to use it as an estimator of µ
• There are many other unbiased estimators, like the sample median and the average of the minimum and maximum (both are unbiased when the population is symmetric)
• We can show (but not here) that, for a normal population, the sample mean has the smallest variance among all unbiased estimators
• Now what is the variance of the sample mean $\bar{X}$?
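A small simulation can make the comparison between estimators concrete. This is a sketch under stated assumptions: the normal population (mean 50, standard deviation 10), the sample size, and the number of repetitions are all hypothetical choices, and a simulation illustrates the result rather than proving it.

```python
import random
from statistics import fmean, median, pvariance

random.seed(0)  # fixed seed so the run is repeatable

MU, SIGMA = 50.0, 10.0   # hypothetical normal population
N, REPS = 25, 5000       # sample size and number of simulated samples

means, medians = [], []
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    means.append(fmean(sample))
    medians.append(median(sample))

# Both estimators centre on MU, but their spreads differ.
var_mean = pvariance(means)
var_median = pvariance(medians)

print(f"average of sample means:   {fmean(means):.2f}")
print(f"average of sample medians: {fmean(medians):.2f}")
print(f"variance of sample mean:   {var_mean:.2f} (theory: {SIGMA**2 / N:.2f})")
print(f"variance of sample median: {var_median:.2f}")
```

Both estimators land near the population mean on average, but the sample median varies more from sample to sample than the sample mean does.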

Standard error (SE) of sample mean $\bar{X}$ 18
• The variance of the sample mean measures the precision of the estimate.
• $\mathrm{var}(\bar{X}) = \sigma^2 / n$, where $\sigma^2$ is the population variance
• $SE(\bar{X}) = \sigma / \sqrt{n}$

Use $s/\sqrt{n}$ to estimate $\sigma/\sqrt{n}$ 19
• In practice, the population variance $\sigma^2$ is rarely known, and the sample variance $s^2$ is a reasonable estimator for $\sigma^2$
• Therefore, the standard error of the mean $\sigma/\sqrt{n}$ can be estimated by $s/\sqrt{n}$ (recall that $s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$)
• NOTE: the larger the sample size, the smaller the standard error, and the more precise $\bar{X}$ is as an estimate of µ
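To see how the estimated standard error shrinks with sample size, here is a minimal sketch. The helper name `standard_error` and the value s = 20 are assumptions made for illustration only.

```python
import math


def standard_error(s: float, n: int) -> float:
    """Estimated standard error of the mean, s / sqrt(n)."""
    return s / math.sqrt(n)


s = 20.0  # hypothetical sample standard deviation
for n in (25, 100, 400):
    print(f"n = {n:3d}  SE = {standard_error(s, n):.2f}")
```

Because n enters through a square root, the sample size has to be quadrupled to halve the standard error.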

An example of standard error 20
• A sample of 10 birthweights: 97, 125, 62, 120, 132, 135, 118, 137, 126, 118 (sample mean $\bar{x} = 117.00$ and sample standard deviation s = 22.44)
• To estimate the population mean µ, a point estimate is the sample mean $\bar{x} = 117.00$, with standard error given by $SE = s/\sqrt{n} = 22.44/\sqrt{10} = 7.09$
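The birthweight numbers can be reproduced in a few lines. This sketch uses Python's standard `statistics` module, whose `stdev` function uses the n − 1 denominator; the variable names are illustrative.

```python
import math
from statistics import mean, stdev

birthweights = [97, 125, 62, 120, 132, 135, 118, 137, 126, 118]

n = len(birthweights)
x_bar = mean(birthweights)   # point estimate of the population mean
s = stdev(birthweights)      # sample standard deviation (n - 1 denominator)
se = s / math.sqrt(n)        # estimated standard error of the mean

print(f"n = {n}, x_bar = {x_bar:.2f}, s = {s:.2f}, SE = {se:.2f}")
# prints: n = 10, x_bar = 117.00, s = 22.44, SE = 7.09
```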

Summary of sampling distribution of $\bar{X}$ 21
• Let $X_1, \ldots, X_n$ be a random sample from a population with mean µ and variance $\sigma^2$. Then the mean and variance of $\bar{X}$ are µ and $\sigma^2/n$, respectively
• Furthermore, if $X_1, \ldots, X_n$ is a random sample from a normal population with mean µ and variance $\sigma^2$, then, by the properties of linear combinations of normal variables, $\bar{X}$ is also normally distributed, that is, $\bar{X} \sim N(\mu, \sigma^2/n)$
• Now the question is: if the population is NOT normal, what is the distribution of $\bar{X}$?

The Central Limit Theorem 22
• Let $X_1, X_2, \ldots, X_n$ denote n independent random variables sampled from some population with mean µ and variance $\sigma^2$
• When n is large, the sampling distribution of the sample mean is approximately normal even if the underlying population is not normal: $\bar{X} \approx N(\mu, \sigma^2/n)$
• By standardization: $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \approx Z \sim N(0, 1)$
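The theorem can be illustrated by simulation. The sketch below assumes a strongly skewed exponential population with mean 1 and standard deviation 1; that population, the sample size of 40, and the number of repetitions are arbitrary choices, not part of the lecture.

```python
import math
import random
from statistics import fmean, pstdev

random.seed(1)  # fixed seed so the run is repeatable

RATE = 1.0                 # hypothetical exponential population (right-skewed)
MU = SIGMA = 1.0 / RATE    # its mean and standard deviation are both 1/RATE
N, REPS = 40, 10000        # sample size and number of simulated samples

sample_means = []
for _ in range(REPS):
    sample = [random.expovariate(RATE) for _ in range(N)]
    sample_means.append(fmean(sample))

# Standardize each sample mean and count how many fall within +/- 1.96.
z_scores = [(m - MU) / (SIGMA / math.sqrt(N)) for m in sample_means]
share_within = sum(abs(z) < 1.96 for z in z_scores) / REPS

print(f"mean of sample means: {fmean(sample_means):.3f} (theory: {MU:.3f})")
print(f"sd of sample means:   {pstdev(sample_means):.3f} "
      f"(theory: {SIGMA / math.sqrt(N):.3f})")
print(f"share with |Z| < 1.96: {share_within:.3f} (normal theory: 0.950)")
```

Even though individual observations are far from normal, the standardized sample means behave much like a standard normal variable at this sample size.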

Illustration of Central limit Theorem (CLT) 23

Interval estimation 24
• Let $X_1, X_2, \ldots, X_n$ denote n independent random variables sampled from some population with mean µ and variance $\sigma^2$
• Our goal is to estimate µ. We know that $\bar{X}$ is a good point estimate
• Now we want a confidence interval $(\bar{X} - a, \bar{X} + a) = \bar{X} \pm a$ such that a 95% confidence interval (CI) for µ satisfies: $\Pr(\bar{X} - a < \mu < \bar{X} + a) = 95\%$

Interval estimation 25
• By the CLT we have, approximately, $\bar{X} \sim N(\mu, \sigma^2/n)$. Using the Z-transformation, we have $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$
• Thus, for a standard normal variable, how wide is the interval that covers 95% of the distribution? Pr(-1<Z<1)=0.6827, Pr(-1.96<Z<1.96)=0.95, Pr(-2.58<Z<2.58)=0.99
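These three probabilities can be checked with the standard library's `statistics.NormalDist`. The helper name `central_probability` is made up for this sketch.

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)  # standard normal distribution


def central_probability(z: float) -> float:
    """Pr(-z < Z < z) for a standard normal variable Z."""
    return Z.cdf(z) - Z.cdf(-z)


for z in (1.0, 1.96, 2.58):
    print(f"Pr(-{z} < Z < {z}) = {central_probability(z):.4f}")
```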

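Interval estimation 26
• Thus we have $0.95 = P(-1.96 \le Z \le 1.96) = P\left(-1.96 \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le 1.96\right)$
• However, we don't know σ, so we replace σ with s, the sample standard deviation: $0.95 \approx P\left(-1.96 \le \frac{\bar{X} - \mu}{s/\sqrt{n}} \le 1.96\right)$
• The term $s/\sqrt{n}$ is called the SAMPLE STANDARD ERROR, denoted $SE(\bar{X})$. Multiplying through by $SE(\bar{X})$ and rearranging to isolate µ, the 95% confidence interval for µ satisfies: $0.95 \approx P\left(\bar{X} - 1.96\,SE(\bar{X}) \le \mu \le \bar{X} + 1.96\,SE(\bar{X})\right)$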
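One way to read the 95% statement is as a long-run coverage rate: if the sampling were repeated many times, about 95% of the intervals would contain µ. The sketch below checks this by simulation; the normal population (mean 100, standard deviation 15), the sample size, and the repetition count are hypothetical.

```python
import math
import random
from statistics import fmean, stdev

random.seed(2)  # fixed seed so the run is repeatable

MU, SIGMA = 100.0, 15.0   # hypothetical population, known only to the simulation
N, REPS = 50, 2000        # sample size and number of simulated samples

covered = 0
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = fmean(sample)
    se = stdev(sample) / math.sqrt(N)   # sample standard error, s / sqrt(n)
    if x_bar - 1.96 * se <= MU <= x_bar + 1.96 * se:
        covered += 1

coverage = covered / REPS
print(f"share of intervals containing mu: {coverage:.3f}")
```

The observed share typically sits slightly below 0.95, because replacing σ with s adds some uncertainty that the 1.96 multiplier does not account for; the gap shrinks as n grows.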

Ex. 27
• Assume that the n=92 students were a simple random sample of all NYU students. The sample mean weight is $\bar{x}$ = 145.2 lbs and the sample standard deviation is s = 23.7 lbs. What is the 95% confidence interval for the mean weight of NYU students?

Ex. 28
• Assume that the n=92 students were a simple random sample of all NYU students. The sample mean weight is $\bar{x}$ = 145.2 lbs and the sample standard deviation is s = 23.7 lbs. What is the 95% confidence interval for the mean weight of NYU students?
• Answer: First, the standard error is $SE(\bar{X}) = 23.7/\sqrt{92} = 2.47$
• The 95% CI is $\bar{x} \pm 1.96 \cdot SE(\bar{X}) = 145.2 \pm (1.96)(2.47) = 145.2 \pm 4.8$ lbs = [140.4 lbs, 150.0 lbs]
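The arithmetic in this exercise can be reproduced as follows. The function name `z_confidence_interval` is an illustrative assumption; it simply applies the formula x̄ ± z · s/√n with z = 1.96 by default.

```python
import math


def z_confidence_interval(x_bar: float, s: float, n: int, z: float = 1.96):
    """Return (SE, lower, upper) for the interval x_bar +/- z * s / sqrt(n)."""
    se = s / math.sqrt(n)
    margin = z * se
    return se, x_bar - margin, x_bar + margin


se, lower, upper = z_confidence_interval(x_bar=145.2, s=23.7, n=92)
print(f"SE = {se:.2f}, 95% CI = [{lower:.1f}, {upper:.1f}] lbs")
# prints: SE = 2.47, 95% CI = [140.4, 150.0] lbs
```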

Interval estimation for an arbitrary level of confidence 1-α 29
• $z_{\alpha/2}$ is the Z critical value for level α/2: $P(Z \le z_{\alpha/2}) = \alpha/2$
• In particular, $P(Z \le z_{0.025}) = 0.025$
[Figure: standard normal density with central area 1-α between $z_{\alpha/2}$ and $z_{1-\alpha/2}$, and area α/2 in each tail]
• EXCEL: NORM.INV(0.025, 0, 1) = -1.96; in general, NORM.INV(probability, mean, sd) = critical value

Z critical value table 30
1-α            0.80    0.90    0.95    0.99
α              0.20    0.10    0.05    0.01
α/2            0.10    0.05    0.025   0.005
z_{1-α/2}      1.28    1.64    1.96    2.58
E.g., for a 0.99 level of confidence, go out 2.58 standard errors from the sample mean.
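The critical values in the table can be regenerated with the inverse normal CDF from Python's standard library, which here plays the same role as a spreadsheet inverse-normal function. The helper name `z_critical` is made up for this sketch.

```python
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Two-sided Z critical value z_{1 - alpha/2} for a given confidence level."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


print("1-alpha  z_{1-alpha/2}")
for confidence in (0.80, 0.90, 0.95, 0.99):
    print(f"{confidence:.2f}     {z_critical(confidence):.4f}")
```

Rounded to two decimals, the output agrees with the table; with more digits the 0.90 entry is closer to 1.645.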
