Introduction Practicalities Review of basic ideas Peter Dalgaard - PowerPoint PPT Presentation

Introduction Practicalities Review of basic ideas Peter Dalgaard Department of Biostatistics University of Copenhagen April 2008

Overview
◮ Structure of the course
◮ The normal distribution
◮ t tests
◮ Determining the size of an investigation
Written by Lene Theil Skovgaard (2007), edited by Peter Dalgaard (2008)

Aim of the course
◮ to enable the participants to
  ◮ understand and interpret statistical analyses
  ◮ evaluate the assumptions behind the use of various methods of analysis
  ◮ perform their own analyses using SAS
  ◮ understand the output from a statistical program package (in general, not only from SAS)
  ◮ present the results from a statistical analysis, numerically and graphically
◮ to create a better platform for communication between statistics ‘users’ and statisticians, for the benefit of subsequent collaboration

Prerequisites
We expect students to
◮ be interested
◮ be motivated, ideally by their own research project, or by plans for carrying one out
◮ have basic knowledge of statistical concepts:
  ◮ mean, average
  ◮ variance, standard deviation, standard error of the mean
  ◮ estimation, confidence intervals
  ◮ regression (correlation)
  ◮ t test, χ² test

Literature
◮ D.G. Altman: Practical statistics for medical research. Chapman and Hall, 1991.
◮ P. Armitage, G. Berry and J.N.S. Matthews: Statistical methods in medical research. Blackwell, 2002.
◮ Aa.T. Andersen, T.V. Bedsted, M. Feilberg, R.B. Jakobsen and A. Milhøj: Elementær indføring i SAS. Akademisk Forlag, 2002 (in Danish).
◮ Aa.T. Andersen, M. Feilberg, R.B. Jakobsen and A. Milhøj: Statistik med SAS. Akademisk Forlag, 2002 (in Danish).
◮ D. Kronborg and L.T. Skovgaard: Regressionsanalyse med anvendelser i lægevidenskabelig forskning. FADL, 1990 (in Danish).
◮ R.P. Cody and J.K. Smith: Applied statistics and the SAS programming language. 4th ed., Prentice-Hall, 1997.

Topics
Quantitative data: birth weight, blood pressure, etc. (normal distribution)
◮ Analysis of variance → variance component models
◮ Regression analysis
◮ The general linear model
◮ Non-linear models
◮ Repeated measurements over time
Non-normal outcomes
◮ Binary data: logistic regression
◮ Counts: Poisson regression
◮ Ordinal data (maybe)
◮ (Censored data: survival analysis)

Lectures
◮ Tuesday and Thursday mornings (until 12.00)
◮ Lecturing in English
◮ Copies of slides must be downloaded
◮ Usually one large break starting around 10.15–10.30 and lasting about 25 minutes
◮ Coffee, tea, and cake will be served
◮ Smaller break later, if required

Computer labs
◮ 2 computer classes, A and B
◮ In the afternoon following each lecture
◮ Exercises will be handed out
◮ Two teachers in each exercise class
◮ We use SAS programming
◮ Solutions can be downloaded after the exercises

Course diploma
◮ 80% attendance is required
◮ It is your responsibility to sign the list at each lecture and each exercise class
◮ 8 × 2 = 16 lists, 80% equals 13 half days
◮ No compulsory homework... but you are expected to work with the material at home!

Example
Two methods, expected to give the same result:
◮ MF: Transmitral volumetric flow, determined by Doppler echocardiography
◮ SV: Left ventricular stroke volume, determined by cross-sectional echocardiography
subject      MF       SV
1            47       43
2            66       70
3            68       72
4            69       81
5            70       60
...         ...      ...
18          105       98
19          112      108
20          120      131
21          132      131
average   86.05    85.81
SD        20.32    21.19
SEM        4.43     4.62
How do we compare the two measurement methods?

The individuals are their own control
We can obtain the same power with fewer individuals.
A paired situation: look at differences, but on which scale?
◮ Are the sizes of the differences approximately the same over the entire range?
◮ Or do we rather see relative (percent) differences? In that case, we take differences on a logarithmic scale.
When we have determined the proper scale: investigate whether the differences have mean zero.
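The choice of scale can be explored with a few lines of R. This is only a sketch: it uses the nine MF/SV rows that are printed on the example slide (the remaining subjects are elided there and are not filled in), and the variable names are chosen here for illustration.

```r
# Rows of the MF/SV table shown on the slide (subjects 1-5 and 18-21 only)
mf <- c(47, 66, 68, 69, 70, 105, 112, 120, 132)
sv <- c(43, 70, 72, 81, 60, 98, 108, 131, 131)

d_abs <- mf - sv             # differences on the original scale
d_log <- log(mf) - log(sv)   # differences on the log scale, i.e. log(mf / sv)

# Back-transforming a log difference gives a ratio (a relative difference)
ratio <- exp(d_log)

print(data.frame(mf, sv, d_abs, d_log = round(d_log, 3), ratio = round(ratio, 3)))
```

If the absolute differences grow with the level of the measurements while the ratios stay roughly stable, that points towards the logarithmic scale.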

Example
Two methods for determining concentration of glucose.
◮ REFE: Colour test, may be ‘polluted’ by uric acid
◮ TEST: Enzymatic test, more specific for glucose
nr.      REFE     TEST
1         155      150
2         160      155
3         180      169
...       ...      ...
44         94       88
45        111      102
46        210      188
mean    144.1    134.2
SD       91.0     83.2
Ref: R.G. Miller et al. (eds): Biostatistics Casebook. Wiley, 1980.

[Figures: scatter plot; limits of agreement]
Since the differences seem to be relative, we consider a logarithmic transformation.

Summary statistics
Numerical description of quantitative variables
◮ Location, center
  ◮ average (mean value), $\bar{y} = \frac{1}{n}(y_1 + \cdots + y_n)$
  ◮ median (‘middle observation’)
◮ Variation
  ◮ variance, $s_y^2 = \frac{1}{n-1} \sum_i (y_i - \bar{y})^2$
  ◮ standard deviation, $s_y = \sqrt{\text{variance}}$
  ◮ special quantiles, e.g. quartiles
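As a sketch, the formulas above can be evaluated directly in R and compared with the built-in functions. The data are the nine MF values printed on the example slide (an illustrative subset, not the full data set).

```r
y <- c(47, 66, 68, 69, 70, 105, 112, 120, 132)  # MF values shown on the slide
n <- length(y)

y_bar <- sum(y) / n                     # average
s2    <- sum((y - y_bar)^2) / (n - 1)   # variance, divisor n - 1
s     <- sqrt(s2)                       # standard deviation
quart <- quantile(y, c(0.25, 0.50, 0.75))  # quartiles; the 50% quantile is the median

# The built-in functions use the same definitions
print(c(by_hand = y_bar, builtin = mean(y)))
print(c(by_hand = s2,    builtin = var(y)))
print(c(by_hand = s,     builtin = sd(y)))
print(quart)
```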

Summary statistics
◮ Average / Mean
◮ Median
◮ Variance (quadratic units, hard to interpret)
◮ Standard deviation (units as outcome, interpretable)
◮ Standard error (uncertainty of estimate, e.g. mean)
The MEANS Procedure
Variable    N          Mean        Median       Std Dev     Std Error
---------------------------------------------------------------------
mf         21    86.0476190    85.0000000    20.3211126     4.4344303
sv         21    85.8095238    82.0000000    21.1863613     4.6232431
dif        21     0.2380952     1.0000000     6.9635103     1.5195625
---------------------------------------------------------------------

Interpretation of the standard deviation, s
Most of the observations can be found in the interval $\bar{y} \pm \text{approx. } 2 \times s$, i.e. the probability that a randomly chosen subject from the population has a value in this interval is large.
For the differences mf - sv we find $0.24 \pm 2 \times 6.96 = (-13.68, 14.16)$.
If data are normally distributed, this interval contains approx. 95% of future observations. If not...
In order to use the above interval, we should at least have reasonable symmetry.
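The arithmetic, and the "approx. 95%" claim under normality, can be checked with a short R sketch. The mean and standard deviation are the rounded values for the differences quoted in the slides; the variable names are illustrative.

```r
d_mean <- 0.24   # mean of the differences mf - sv (rounded, from the slides)
d_sd   <- 6.96   # standard deviation of the differences (rounded)

# Rough interval: mean plus/minus 2 standard deviations
rough <- d_mean + c(-2, 2) * d_sd
print(rough)

# Probability mass of a normal distribution within 2 standard deviations of its mean
coverage <- pnorm(2) - pnorm(-2)
print(round(coverage, 4))
```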

Density of the normal distribution: $N(\mu, \sigma^2)$
mean, often denoted $\mu$, $\alpha$ etc.
standard deviation, often denoted $\sigma$
[Figure: densities of $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$ plotted against x]

Quantile plot (Probability plot) If data are normally distributed, the plot will look like a straight line: The observed quantiles should correspond to the theoretical ones (except for a scale factor)
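A quantile plot can be sketched in R with `qqnorm`. The sample below is simulated (an assumption made here purely for illustration, not the course data), with mean and standard deviation borrowed from the mf - sv differences.

```r
set.seed(1)                                # reproducible simulated sample
y <- rnorm(21, mean = 0.24, sd = 6.96)     # simulated, NOT the course data

# Theoretical normal quantiles (x) paired with the observed values (y)
qq <- qqnorm(y, plot.it = FALSE)

# For normally distributed data the points lie close to a straight line,
# so the correlation between the two sets of quantiles is close to 1
r <- cor(qq$x, qq$y)
print(r)

# To draw the plot itself with a reference line:
# qqnorm(y); qqline(y)
```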

Prediction intervals
Intervals containing 95% of the ‘typical’ (middle) observations (95% coverage):
◮ lower limit: 2.5%-quantile
◮ upper limit: 97.5%-quantile
If a distribution fits well to a normal distribution $N(\mu, \sigma^2)$, then these quantiles can be directly calculated as follows:
2.5%-quantile: $\mu - 1.96\sigma \approx \bar{y} - 1.96 s$
97.5%-quantile: $\mu + 1.96\sigma \approx \bar{y} + 1.96 s$
and the prediction interval is therefore calculated as
$\bar{y} \pm \text{approx. } 2 \times s = (\bar{y} - \text{approx. } 2 \times s,\ \bar{y} + \text{approx. } 2 \times s)$
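The factor 1.96 is the 97.5% quantile of the standard normal distribution. A minimal R sketch, plugging the rounded estimates for the mf - sv differences in place of µ and σ (variable names are illustrative):

```r
# Standard normal 2.5% and 97.5% quantiles
z <- qnorm(c(0.025, 0.975))
print(round(z, 3))

# Normal-theory limits for the differences, treating the estimates as if
# they were the true mean and standard deviation
d_mean <- 0.24
d_sd   <- 6.96
limits <- d_mean + z * d_sd
print(round(limits, 2))
```

This ignores the uncertainty in the estimated mean and standard deviation, which is why the multiplier is only "approx. 2".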

What is the ‘approx. 2’?
The prediction interval has to ‘catch’ future observations, $y_{new}$.
We know that
$y_{new} - \bar{y} \sim N\left(0, \sigma^2 \left(1 + \frac{1}{n}\right)\right)$
$\Rightarrow \frac{y_{new} - \bar{y}}{s \sqrt{1 + \frac{1}{n}}} \sim t(n-1)$
$t_{2.5\%}(n-1) < \frac{y_{new} - \bar{y}}{s \sqrt{1 + \frac{1}{n}}} < t_{97.5\%}(n-1)$
$\bar{y} + s \sqrt{1 + \frac{1}{n}} \times t_{2.5\%}(n-1) < y_{new} < \bar{y} + s \sqrt{1 + \frac{1}{n}} \times t_{97.5\%}(n-1)$

The meaning of ‘approx. 2’ is therefore
$\sqrt{1 + \frac{1}{n}} \times t_{97.5\%}(n-1) \approx t_{97.5\%}(n-1)$
The t quantiles ($t_{2.5\%} = -t_{97.5\%}$) may be looked up in tables, or calculated by, e.g., the program R: free software, which may be downloaded from http://cran.dk.r-project.org/
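How close the exact multiplier is to 2 depends on the sample size. The following R sketch tabulates it for a few values of n (the particular sample sizes are chosen here for illustration):

```r
n <- c(5, 10, 21, 50, 100, 1000)

t_q        <- qt(0.975, df = n - 1)   # 97.5% quantile of t(n - 1)
multiplier <- sqrt(1 + 1 / n) * t_q   # exact factor in the prediction interval

print(data.frame(n, t_q = round(t_q, 3), multiplier = round(multiplier, 3)))
```

The multiplier always exceeds the t quantile, decreases as n grows, and approaches the normal value 1.96 for large samples.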

> df <- 10:30
> qt <- qt(0.975, df)
> cbind(df, qt)
      df       qt
 [1,] 10 2.228139
 [2,] 11 2.200985
 [3,] 12 2.178813
 [4,] 13 2.160369
 [5,] 14 2.144787
 [6,] 15 2.131450
 [7,] 16 2.119905
 [8,] 17 2.109816
 [9,] 18 2.100922
[10,] 19 2.093024
[11,] 20 2.085963
[12,] 21 2.079614
[13,] 22 2.073873
[14,] 23 2.068658
[15,] 24 2.063899
[16,] 25 2.059539
[17,] 26 2.055529
[18,] 27 2.051831
[19,] 28 2.048407
[20,] 29 2.045230
[21,] 30 2.042272
For the differences mf - sv, n = 21, so the relevant t quantile (20 degrees of freedom) is 2.086, and the correct prediction interval is
$0.24 \pm 2.086 \times \sqrt{1 + \frac{1}{21}} \times 6.96 = 0.24 \pm 2.135 \times 6.96 = (-14.62, 15.10)$

To sum up: Statistical model for paired data
$X_i$: MF-method for the $i$th subject
$Y_i$: SV-method for the $i$th subject
Differences $D_i = X_i - Y_i$ ($i = 1, \ldots, 21$) are independent, normally distributed:
$D_i \sim N(\delta, \sigma_D^2)$
Note: No assumptions about the distribution of the basic flow measurements!

Estimation
Estimated mean (the estimate of $\delta$ is denoted $\hat{\delta}$, ‘delta-hat’): $\hat{\delta} = \bar{d} = 0.24$ cm³
Estimated standard deviation: $s_D = \tilde{\sigma}_D = 6.96$ cm³
◮ The estimate is our best guess, but uncertainty (biological variation) might as well have given us a somewhat different result
◮ The estimate has a distribution, with an uncertainty called the standard error of the estimate.

Central limit theorem (CLT)
The average, $\bar{y}$, is ‘much more normal’ than the original observations.
SEM, standard error of the mean:
$\text{SEM} = \frac{6.96}{\sqrt{21}} = 1.52$ cm³
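The standard error of the mean is the standard deviation divided by the square root of n. As a sketch, this reproduces the Std Error column of the MEANS output from the Std Dev column (values copied from that output; the vector names are illustrative):

```r
n <- 21

# Standard deviations as reported by PROC MEANS in the slides
sd_vals <- c(mf = 20.3211126, sv = 21.1863613, dif = 6.9635103)

# Standard error of the mean: s / sqrt(n)
sem <- sd_vals / sqrt(n)
print(round(sem, 4))
```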
