Biostatistics Preparatory Course: Methods and Computing Lecture 6

Biostatistics Preparatory Course: Methods and Computing
Lecture 6: Simulations
Methods and Computing, Harvard University Department of Biostatistics

Recap / Warm-up: Linear Regression

In group exercise 2, we were given the following model:

$$E[Y \mid X_1, X_2] = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2$$

where $Y$ was birthweight, $X_1$ was smoking status, and $X_2$ was mother's weight gain.

Why might $\beta_3$ be of interest?
- It is of interest if you believe that the effect of mother's weight gain differs by smoking status.

What are the interpretations of $\beta_1$ and $\beta_2$?
- $\beta_1$: the mean difference in birthweight comparing smokers to non-smokers, among mothers who did not gain weight ($X_2 = 0$).
- $\beta_2$: the mean change in birthweight corresponding to a one-unit increase in mother's weight gain, among non-smokers ($X_1 = 0$).

Recap / Warm-up: Linear Regression

Fitted means by smoking status:

$$E[Y \mid X_1 = 1, X_2] = \hat\beta_0 + \hat\beta_1 + \hat\beta_2 X_2 + \hat\beta_3 X_2$$
$$E[Y \mid X_1 = 0, X_2] = \hat\beta_0 + \hat\beta_2 X_2$$
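As a short worked step, using only the fitted coefficients from the two equations above, subtracting the non-smoker line from the smoker line shows what the interaction coefficient controls:

```latex
\begin{align*}
E[Y \mid X_1 = 1, X_2] - E[Y \mid X_1 = 0, X_2]
  &= (\hat\beta_0 + \hat\beta_1 + \hat\beta_2 X_2 + \hat\beta_3 X_2)
     - (\hat\beta_0 + \hat\beta_2 X_2) \\
  &= \hat\beta_1 + \hat\beta_3 X_2
\end{align*}
```

So the estimated smoker vs. non-smoker difference equals $\hat\beta_1$ at $X_2 = 0$ and changes by $\hat\beta_3$ per unit of weight gain. Equivalently, the slope in $X_2$ is $\hat\beta_2 + \hat\beta_3$ for smokers and $\hat\beta_2$ for non-smokers.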

Simulation studies

1. What is a simulation?
   - A numerical technique for conducting experiments on a computer.
   - In statistics, we typically care about 'Monte Carlo' (MC) simulations, which involve random sampling from probability distributions.
2. Why bother?
   - When developing a new method, it is important to establish its properties so that it can be used in practice.
   - Case I: Analytical derivations of properties are not always possible. It is often feasible to obtain large-sample approximations, but the approximation still has to be evaluated in finite samples.
   - Case II: If you can derive analytic results, they usually require assumptions. What are the properties of the method when various conditions are violated?

Important terms

- An unbiased estimator for some parameter is one whose expected value is equal to the parameter.
- A confidence interval has nominal coverage if it covers the true value of the parameter the stated proportion of the time (for example, in 95% of repeated samples for a 95% interval).
- The size of a hypothesis test is equal to the probability of rejecting the null hypothesis given that the null is true.
- The power of a hypothesis test is equal to the probability of rejecting the null hypothesis given that the null is false.

MC Simulations: The usual questions

Under what conditions is an estimator unbiased?

ex. Suppose the data are generated according to
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$$
but I fit the model $y = \alpha_0 + \alpha_1 x_1$. When is $\hat\alpha_1$ unbiased for $\beta_1$?

How does the estimator compare to other estimators? What is its sampling variability?

ex. Suppose the data are generated according to
$$y = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + \epsilon$$
with $E(\epsilon) = 0$ and $\mathrm{Var}(\epsilon) = \sigma^2 I$. How do the OLS estimators compare to

$$\hat\alpha_1 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i^* - \bar{x}^*)}{\sum_{i=1}^{n} (x_i - \bar{x})(x_i^* - \bar{x}^*)} \quad \text{and} \quad \hat\alpha_0 = \frac{\sum_{i=1}^{n} y_i / x_i}{\sum_{i=1}^{n} 1 / x_i} - \hat\alpha_1 \frac{n}{\sum_{i=1}^{n} 1 / x_i}$$

where $\bar{x}^*$ is the mean of $x_i^* = 1 / x_i$?
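The first example (fitting a model that omits $x_2$) can be explored with a small Monte Carlo sketch. Everything specific here is an assumption made for illustration and is not from the slides: the coefficient values, standard normal covariates and errors, the way correlation `rho` between $x_1$ and $x_2$ is induced, and the helper names `ols_slope` and `mean_slope`.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope from a simple linear regression of y on x (with intercept)."""
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx


def mean_slope(rho, n=100, B=2000, beta=(1.0, 2.0, 3.0), seed=1):
    """Monte Carlo mean of the slope on x1 when x2 is left out of the fit.

    The truth is y = b0 + b1*x1 + b2*x2 + eps (assumed values in `beta`),
    and x2 is built to have correlation `rho` with x1.
    """
    rng = random.Random(seed)
    b0, b1, b2 = beta
    slopes = []
    for _ in range(B):
        x1 = [rng.gauss(0, 1) for _ in range(n)]
        x2 = [rho * a + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for a in x1]
        y = [b0 + b1 * a + b2 * c + rng.gauss(0, 1) for a, c in zip(x1, x2)]
        slopes.append(ols_slope(x1, y))  # misspecified fit: x2 omitted
    return statistics.fmean(slopes)


# With rho = 0 the average slope should sit near b1 = 2; with rho = 0.5
# it should drift toward b1 + b2 * rho = 3.5 under this setup.
print(mean_slope(0.0))
print(mean_slope(0.5))
```

Under these assumptions, the comparison suggests the slope from the reduced model is unbiased for $\beta_1$ when $x_1$ and $x_2$ are uncorrelated (or when $\beta_2 = 0$), and biased otherwise.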

MC Simulations: The usual questions

Does a confidence interval procedure attain nominal coverage?

ex. The sum of $n$ independent Bernoulli trials with common success probability $\pi$ is distributed according to $\mathrm{Bin}(n, \pi)$.

The MLE for $\pi$ is $\hat\pi = X / n$, where $X$ is the observed number of successes.

The Wald 95% confidence interval for $\pi$ is given by:

$$\left( \hat\pi - z_{0.975} \sqrt{\frac{\hat\pi (1 - \hat\pi)}{n}},\ \ \hat\pi + z_{0.975} \sqrt{\frac{\hat\pi (1 - \hat\pi)}{n}} \right)$$

The Score 95% confidence interval for $\pi$ is given by:

$$\hat\pi \left( \frac{n}{n + z_{0.975}^2} \right) + \frac{1}{2} \left( \frac{z_{0.975}^2}{n + z_{0.975}^2} \right) \pm z_{0.975} \sqrt{ \frac{1}{n + z_{0.975}^2} \left[ \hat\pi (1 - \hat\pi) \left( \frac{n}{n + z_{0.975}^2} \right) + \frac{1}{2} \cdot \frac{1}{2} \left( \frac{z_{0.975}^2}{n + z_{0.975}^2} \right) \right] }$$

How does the coverage compare for both intervals as we increase $n$ and vary $\pi$?

How does Monte Carlo simulation help to answer these questions?
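One way to approach the coverage question is to simulate binomial counts at a known $\pi$ and record how often each interval contains it. The sketch below uses only the Python standard library. The grid of `n` and `pi` values, the number of replicates `B`, and the function names are illustrative assumptions, and `Z` is the standard normal 0.975 quantile typed in as a constant.

```python
import math
import random

Z = 1.959964  # z_{0.975}, hard-coded


def wald_ci(x, n):
    """Wald 95% interval for a binomial proportion."""
    p = x / n
    half = Z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half


def score_ci(x, n):
    """Score 95% interval, written in the weighted form used above."""
    p = x / n
    w = n / (n + Z ** 2)                # weight on the MLE
    center = p * w + 0.5 * (1 - w)      # (1 - w) = z^2 / (n + z^2)
    half = Z * math.sqrt((p * (1 - p) * w + 0.25 * (1 - w)) / (n + Z ** 2))
    return center - half, center + half


def coverage(ci, n, pi, B=2000, seed=1):
    """Proportion of B simulated data sets whose interval contains pi."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(B):
        x = sum(rng.random() < pi for _ in range(n))  # one Bin(n, pi) draw
        lo, hi = ci(x, n)
        hits += lo <= pi <= hi
    return hits / B


for n in (10, 50, 200):
    for pi in (0.05, 0.5):
        print(n, pi, coverage(wald_ci, n, pi), coverage(score_ci, n, pi))
```

Because both intervals are evaluated with the same seed, they see the same simulated counts, which makes the comparison less noisy. In runs like this the Wald interval tends to fall well short of 95% when $n$ is small and $\pi$ is near 0, since an observed count of zero yields a zero-width interval.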

MC Simulations: The usual questions

Does a hypothesis testing procedure achieve the specified size? If so, what is the power like? How does it compare to alternative procedures?

ex. Consider the one-sample $t$-test for $H_0: \mu = 0$ vs. $H_A: \mu \neq 0$. How does the power vary when the data are generated under some alternative hypothesis $\mu \neq \mu_0$?

How does Monte Carlo simulation help to answer these questions?
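For the $t$-test example, size and power can both be estimated as rejection rates: simulate under $\mu = 0$ for size and under a chosen alternative for power. The following is a minimal sketch under assumptions not stated in the slides: normal data with unit variance, $n = 20$, and an alternative of $\mu = 0.5$. The critical value `T_CRIT` is the 0.975 quantile of the $t$ distribution with 19 degrees of freedom, typed in as a constant because the standard library has no $t$ quantile function, so it is only valid for $n = 20$.

```python
import math
import random
import statistics

N = 20
T_CRIT = 2.093  # t_{0.975, df=19}; hard-coded, matches N = 20 only


def reject(sample, mu0=0.0):
    """Two-sided one-sample t-test at the 5% level."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    t = (statistics.fmean(sample) - mu0) / se
    return abs(t) > T_CRIT


def rejection_rate(mu, B=5000, seed=1):
    """Share of B simulated N(mu, 1) samples of size N that reject H0: mu = 0."""
    rng = random.Random(seed)
    rejections = sum(
        reject([rng.gauss(mu, 1) for _ in range(N)]) for _ in range(B)
    )
    return rejections / B


print("estimated size: ", rejection_rate(0.0))   # data generated under H0
print("estimated power:", rejection_rate(0.5))   # data generated under mu = 0.5
```

Repeating the second call over a grid of `mu` values traces out an estimated power curve. Swapping `rng.gauss` for a skewed or heavy-tailed generator is one way to probe how the size holds up when normality is violated.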

MC Simulations: Intuition

An estimator/test statistic has a true sampling distribution under some set of conditions.

We'd like to know the true sampling distribution so we can answer the questions on the previous slide, but...
1. The (finite sample) derivation is difficult, and/or
2. We'd like to see how well the method holds up when assumptions are violated.

So, we approximate the sampling distribution of an estimator/test statistic under a particular set of conditions through simulation.

How to Approximate the Sampling Distribution

- Generate $B$ independent data sets according to the data generating process.
- Compute the value of the estimator/test statistic $T(\text{data})$ for each data set, giving $\{T_1, \ldots, T_B\}$.
- If $B$ is large enough, summary statistics computed from $\{T_1, \ldots, T_B\}$ should be good approximations to the true sampling properties of the estimator/test statistic under the specified conditions.

ex. $T_b$ is the value of $T$ from the $b$th data set, $b = 1, \ldots, B$.
- The empirical mean of $T_1, \ldots, T_B$ is an estimate of the true mean of the sampling distribution of the estimator.
- The empirical standard deviation of $T_1, \ldots, T_B$ is an estimate of the true standard deviation of the sampling distribution of the estimator (its standard error).
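The recipe above maps onto a short generic loop. In this sketch the helper `simulate` and the example setup (the sample mean of $n = 30$ draws from an Exponential distribution with rate 1) are assumptions chosen for illustration. For that setup the true mean of $T$ is 1 and its true standard deviation is $1/\sqrt{30}$, so the summaries can be checked against known values.

```python
import random
import statistics


def simulate(estimator, generate, B=5000, seed=1):
    """Return [T_1, ..., T_B]: the estimator applied to B independent data sets."""
    rng = random.Random(seed)
    return [estimator(generate(rng)) for _ in range(B)]


n = 30


def generate(rng):
    """One data set: n independent Exponential(rate = 1) draws (assumed DGP)."""
    return [rng.expovariate(1.0) for _ in range(n)]


T = simulate(statistics.fmean, generate)

mc_mean = statistics.fmean(T)  # approximates the mean of the sampling distribution
mc_sd = statistics.stdev(T)    # approximates its standard deviation
print(mc_mean, "vs. true mean 1")
print(mc_sd, "vs. true SD", 30 ** -0.5)
```

Keeping the data generator and the estimator as separate arguments means either one can be swapped out, for example to compare two estimators on identical simulated data by reusing the same seed.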

How can you assess the following?

- An unbiased estimator for some parameter means that the expected value of the estimator is equal to the parameter.
  In numerous samples generated from the truth, take the mean of the estimated parameters. Is it close to the true value of the parameter?
- A confidence interval has nominal coverage if it covers the true value of the parameter the correct proportion of times.
  In numerous samples generated from the truth, calculate the confidence interval. How often does it cover the true value of the parameter?
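As a concrete instance of the unbiasedness check, the sketch below compares two variance estimators on normal data: the sample variance with divisor $n - 1$ and the version with divisor $n$. The choice of estimators, the values $n = 5$ and $\sigma^2 = 4$, and the function name `mc_means` are assumptions for illustration only. It is a known result that the divisor-$n$ estimator has expectation $\sigma^2 (n - 1)/n$, so the simulation has a reference value to compare against.

```python
import random
import statistics


def mc_means(n=5, sigma2=4.0, B=20000, seed=1):
    """Monte Carlo means of two variance estimators over B simulated samples."""
    rng = random.Random(seed)
    sd = sigma2 ** 0.5
    with_n_minus_1, with_n = [], []
    for _ in range(B):
        x = [rng.gauss(0, sd) for _ in range(n)]
        with_n_minus_1.append(statistics.variance(x))  # divisor n - 1
        with_n.append(statistics.pvariance(x))         # divisor n
    return statistics.fmean(with_n_minus_1), statistics.fmean(with_n)


unbiased_mean, biased_mean = mc_means()
print(unbiased_mean, "vs. true variance 4.0")
print(biased_mean, "vs. 4.0 * (5 - 1) / 5 = 3.2")
```

The gap between each Monte Carlo mean and the true value estimates the bias. How close is "close enough" depends on the Monte Carlo error, which shrinks as `B` grows.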
