Statistics & Experimental Design with R

Statistics & Experimental Design with R
Barbara Kitchenham
Keele University

Basic Statistical Theory
Part 2

Probability Distributions
• Frequency function
  – Also called probability density function for continuous variables
  – Integral referred to as "cumulative distribution function"
• Three properties:
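As an illustration, the sketch below checks three standard properties of a density function numerically for the standard Normal: it is non-negative, it integrates to 1, and the area over an interval equals the difference of the cumulative distribution function at the endpoints. It assumes these are the properties intended; the `integrate` helper and the integration limits of ±8 are illustrative choices, not part of the original material.

```python
from statistics import NormalDist


def integrate(f, a, b, steps=20_000):
    """Trapezoidal-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


dist = NormalDist(mu=0.0, sigma=1.0)

# Property 1: the density is never negative (checked on a grid from -8 to 8).
grid = [-8 + 0.01 * i for i in range(1601)]
non_negative = all(dist.pdf(x) >= 0 for x in grid)

# Property 2: the total area under the density is 1
# (the mass outside +/-8 is negligible for a standard Normal).
total_area = integrate(dist.pdf, -8, 8)

# Property 3: P(a < x < b) is the area between a and b,
# which equals F(b) - F(a) for the cumulative distribution function F.
interval_area = integrate(dist.pdf, -1, 2)
cdf_difference = dist.cdf(2) - dist.cdf(-1)

print(non_negative, round(total_area, 6), round(interval_area, 6))
```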

Normal (Gaussian) Distribution
• Probability distribution x ~ N(μ, σ²)
• Any normal distribution can be standardized, z ~ N(0,1), letting z = (x − μ)/σ
• Always symmetric about mean (μ)
  – P{μ − σ < x < μ + σ} ≈ 0.68
  – P{μ − 2σ < x < μ + 2σ} ≈ 0.95
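A minimal sketch of standardization and the 68%/95% figures, using Python's `statistics.NormalDist`. The values μ = 50 and σ = 10 are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def standardize(x, mu, sigma):
    """Convert x from N(mu, sigma^2) to a standard Normal deviate z."""
    return (x - mu) / sigma


def prob_within(k, mu, sigma):
    """P(mu - k*sigma < x < mu + k*sigma) for x ~ N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


mu, sigma = 50.0, 10.0          # hypothetical parameters
z = standardize(65.0, mu, sigma)
p_one_sigma = prob_within(1, mu, sigma)
p_two_sigma = prob_within(2, mu, sigma)

print(z, round(p_one_sigma, 4), round(p_two_sigma, 4))
```

The probabilities do not depend on the particular μ and σ, which is why a single standard Normal table is sufficient.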

Normal distribution
[Figure: standard Normal density curve; x-axis: Normal Deviate (−3 to 3), y-axis: Density]

Moments
• Moments – a measure of the shape of a set of points
  – Moments about origin
  – Moments about mean
  – μ & σ² define the Normal distribution
  – Third (& odd > 3) moments about mean (skewness) = 0
  – Standardized fourth moment about mean (kurtosis) = 3
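The skewness and kurtosis values for the Normal distribution can be checked by simulation. The sketch below computes moments about the mean for a simulated Normal sample; the sample size, seed, and parameters (mean 10, standard deviation 2) are arbitrary assumptions.

```python
import random


def central_moment(data, k):
    """k-th moment about the sample mean."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)


def skewness(data):
    """Third central moment scaled by the cube of the standard deviation."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def kurtosis(data):
    """Fourth central moment scaled by the squared variance."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2


rng = random.Random(1)  # fixed seed so the run is repeatable
sample = [rng.gauss(10.0, 2.0) for _ in range(100_000)]

sample_skewness = skewness(sample)
sample_kurtosis = kurtosis(sample)
print(round(sample_skewness, 3), round(sample_kurtosis, 3))
```

With a sample this large the two values should be close to 0 and 3 respectively.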

Expectations – Functions of Variables
• Expected value of a function h(x) of random variable x is defined as: E[h(x)] = ∫ h(x) f(x) dx
  – Provide a precise definition of important quantities
  – Provide link between samples and populations
• If h(x) = x, E[x] = μx
• Arithmetic transformations of functions of random variables easy to handle
  – E[b + cx] = b + cμx
  – E[x1 + x2 + x3 + …] = Σμi
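The expectation rules can be verified exactly on a small discrete example, where the integral becomes a sum over outcomes. The sketch below uses a fair six-sided die (an illustrative choice) and exact fractions; the constants b = 3 and c = 2 are arbitrary.

```python
from fractions import Fraction
from itertools import product


def expectation(h, pmf):
    """E[h(x)] for a discrete variable: sum of h(x) * P(x) over all outcomes."""
    return sum(h(x) * p for x, p in pmf.items())


# Fair die: each face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

mu = expectation(lambda x: x, die)                 # E[x]
b, c = 3, 2
linear = expectation(lambda x: b + c * x, die)     # E[b + c x]

# E[x1 + x2 + x3] for three dice, enumerating all 216 equally likely outcomes.
sum_of_three = sum(
    (x1 + x2 + x3) * Fraction(1, 216)
    for x1, x2, x3 in product(range(1, 7), repeat=3)
)

print(mu, linear, sum_of_three)
```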

Expectations of Variance
• Expected value of var x =
• For the sum or difference of two variables
  – If x and y are independent
• Arithmetic transformations are allowed
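As a hedged illustration, the sketch below checks the standard variance identities exactly on two independent fair dice: Var(x ± y) = Var(x) + Var(y) for independent variables, and Var(b + cx) = c²Var(x). It assumes these are the relationships the slide refers to; the dice and the constants 3 and 2 are illustrative.

```python
from fractions import Fraction
from itertools import product


def variance(value_prob_pairs):
    """Var = E[(v - E[v])^2] for a discrete distribution given as (value, prob) pairs."""
    pairs = list(value_prob_pairs)
    mean = sum(v * p for v, p in pairs)
    return sum((v - mean) ** 2 * p for v, p in pairs)


faces = range(1, 7)
p_one = Fraction(1, 6)     # probability of each face of one die
p_two = Fraction(1, 36)    # probability of each pair for two independent dice

var_x = variance((x, p_one) for x in faces)
var_sum = variance((x + y, p_two) for x, y in product(faces, repeat=2))
var_diff = variance((x - y, p_two) for x, y in product(faces, repeat=2))
var_linear = variance((3 + 2 * x, p_one) for x in faces)

print(var_x, var_sum, var_diff, var_linear)
```

Note that the variance of a difference is the sum of the variances, not their difference.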

Properties of Normal Variables
• If {X1,…,Xn} are a set of independent, identically distributed Normal variables of size n
• Each with mean = μ and variance σ²
• E[mean = ΣXi/n] = μ
• E[var(ΣXi/n)] = (Σσ²)/n² = σ²/n
• ΣXi/n is ~N(μ, σ²/n)
• Variance of {X1,…,Xn} is chi-squared distribution with n degrees of freedom
• Σ(Xi − μ)²/n ~ σ²χ²n/n
• Expected value of χ²n = n, var(χ²n) = 2n
• Var(Σ(Xi − μ)²/n) = 2nσ⁴/n² = 2σ⁴/n
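These results can be checked by simulation. The sketch below draws many Normal samples of size n and compares the variance of the sample mean with σ²/n, and the mean and variance of Σ(Xi − μ)²/n with σ² and 2σ⁴/n. The parameter values, seed, and number of replications are assumptions made for illustration.

```python
import random
from statistics import fmean, pvariance

rng = random.Random(2)            # fixed seed for repeatability
mu, sigma, n = 5.0, 2.0, 10       # hypothetical population and sample size
replications = 20_000

sample_means = []
variance_about_mu = []            # sum((Xi - mu)^2) / n, using the known mean
for _ in range(replications):
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    sample_means.append(fmean(xs))
    variance_about_mu.append(sum((x - mu) ** 2 for x in xs) / n)

print("mean of sample means:", round(fmean(sample_means), 3), "expected", mu)
print("var of sample means: ", round(pvariance(sample_means), 3),
      "expected", sigma ** 2 / n)
print("mean of variances:   ", round(fmean(variance_about_mu), 3),
      "expected", sigma ** 2)
print("var of variances:    ", round(pvariance(variance_about_mu), 3),
      "expected", 2 * sigma ** 4 / n)
```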

Maximum Likelihood - 1
• Generic method of estimating parameters of a distribution
• Likelihood function (L)
  – Joint distribution of elements in a sample given the values of a parameter θ
  – Parameter estimated by
    • Differentiating L (usually Log(L)) with respect to θ
    • Equating the derivatives to zero
    • Solving the equations
    • Accepting the solution for which the second derivative is negative
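A minimal sketch of the procedure for a Normal sample, where the equations can be solved in closed form: the ML estimate of μ is the sample mean and the ML estimate of σ² is the sum of squared deviations divided by n. The data values are hypothetical, and the numerical slope check simply confirms that the derivative of Log(L) is zero at the estimate.

```python
import math


def log_likelihood(data, mu, sigma2):
    """Log(L) of an independent Normal sample with mean mu and variance sigma2."""
    n = len(data)
    squared = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - squared / (2 * sigma2)


def normal_mle(data):
    """Closed-form solution of d Log(L)/d mu = 0 and d Log(L)/d sigma2 = 0."""
    n = len(data)
    mu_hat = sum(data) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in data) / n
    return mu_hat, sigma2_hat


data = [4.1, 5.3, 6.0, 4.7, 5.9, 5.2]   # hypothetical observations
mu_hat, sigma2_hat = normal_mle(data)

# Central-difference estimate of the slope of Log(L) in mu at the estimate.
h = 1e-5
slope_mu = (log_likelihood(data, mu_hat + h, sigma2_hat)
            - log_likelihood(data, mu_hat - h, sigma2_hat)) / (2 * h)

best = log_likelihood(data, mu_hat, sigma2_hat)
print(round(mu_hat, 3), round(sigma2_hat, 3), round(best, 3))
```

For distributions without a closed-form solution, the same Log(L) function would be handed to a numerical optimizer instead.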

Maximum Likelihood - 2
• L is like Bayesian model with no Prior
• ML estimate of sigma is biased
• When f(x) is Normal, −2 Log(L) is, up to an additive constant, chi-squared with n degrees of freedom
• Log(L) is used in many statistical tests
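The bias of the ML variance estimate can be seen in a small simulation. For Normal samples of size n, the ML estimate (divisor n) averages (n − 1)σ²/n, while the usual sample variance (divisor n − 1) averages σ². The sample size, σ, seed, and number of replications below are illustrative assumptions.

```python
import random
from statistics import fmean, pvariance, variance

rng = random.Random(3)        # fixed seed for repeatability
sigma, n = 2.0, 5             # hypothetical population sd and a small sample size
replications = 40_000

ml_estimates = []             # divisor n (maximum likelihood)
unbiased_estimates = []       # divisor n - 1
for _ in range(replications):
    xs = [rng.gauss(0.0, sigma) for _ in range(n)]
    ml_estimates.append(pvariance(xs))
    unbiased_estimates.append(variance(xs))

mean_ml = fmean(ml_estimates)
mean_unbiased = fmean(unbiased_estimates)
print("ML estimate average:      ", round(mean_ml, 3),
      "expected", (n - 1) * sigma ** 2 / n)
print("Unbiased estimate average:", round(mean_unbiased, 3),
      "expected", sigma ** 2)
```

The bias shrinks as n grows, which is why it matters mainly for small samples.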

Importance of Normal Distribution
• Law of large numbers
  – The average of the results obtained from a number of "trials"
    • Should be close to expected value
    • Becomes closer as more trials are performed
• Central limit theorem
  – If {X1,…,Xn} are a set of independent, identically distributed variables of size n
  – Sn = ΣXi/n is approximately ~N(μ, σ²/n)
  – Irrespective of distribution of X's
    • Assuming the Xi have finite variances
• Normal distribution assumed to occur as the sum of many small independent effects
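The central limit theorem can be illustrated with a clearly non-Normal population. The sketch below averages samples from an exponential distribution (mean 1, variance 1) and checks that the sample means have mean μ, variance close to σ²/n, and roughly 95% of their values within 1.96 standard errors of μ. The choice of distribution, n = 40, and the seed are assumptions for illustration.

```python
import math
import random
from statistics import fmean, pvariance

rng = random.Random(4)          # fixed seed for repeatability
n = 40                          # size of each sample
replications = 10_000
mu, sigma2 = 1.0, 1.0           # mean and variance of an exponential with rate 1

means = [
    fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(replications)
]

standard_error = math.sqrt(sigma2 / n)
coverage = sum(abs(m - mu) < 1.96 * standard_error for m in means) / replications

print("mean of sample means:", round(fmean(means), 4))
print("variance of means:   ", round(pvariance(means), 4), "expected", sigma2 / n)
print("fraction within 1.96 standard errors:", round(coverage, 3))
```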

Implications
• Classical methods
  – With large enough sample size, can assume the mean of a sample is Normally distributed
    • Can use properties of Normal distribution
      – E.g. the standard (unit) Normal distribution can be used to construct confidence intervals
  – An immense body of statistical methods available if parameters/data are normal
  – Many guidelines for transforming the data to increase Normality

Normal approximations
• Binomial Distribution
• Probability of x successes in n trials
  – p is probability of success for a specific trial
  – Expected value of p is
  – Expected variance of p is
• Approximately Normal
  – If n large (>30)
  – p not too far from 0.5
  – Confidence intervals for x or p based on Normal distribution
  – With "corrections" for discrete distribution
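A minimal sketch comparing the exact binomial probability P(X ≤ k) with its Normal approximation, with and without a continuity correction of 0.5. It assumes the continuity correction is the kind of "correction" meant; n = 40, p = 0.5, and k = 22 are hypothetical values.

```python
import math
from statistics import NormalDist


def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def normal_approx_cdf(k, n, p, continuity=True):
    """Normal approximation to P(X <= k), using mean n*p and variance n*p*(1-p)."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    cutoff = k + 0.5 if continuity else k
    return NormalDist(mean, sd).cdf(cutoff)


n, p, k = 40, 0.5, 22           # hypothetical trial count, success rate, threshold
exact = binomial_cdf(k, n, p)
corrected = normal_approx_cdf(k, n, p, continuity=True)
uncorrected = normal_approx_cdf(k, n, p, continuity=False)

print(round(exact, 4), round(corrected, 4), round(uncorrected, 4))
```

The corrected approximation should sit much closer to the exact value than the uncorrected one.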

Confidence Limits of Mean
• Assume random sample
• Mean is approximately Normal
  – For 95% confidence intervals
  – For unit normal deviate
  – For random sample, confidence limit of mean
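A hedged sketch of the usual large-sample interval, mean ± z·s/√n, where z is the unit Normal deviate for the chosen confidence level (about 1.96 for 95%). The function name and the data are hypothetical; for a sample this small the t distribution would normally replace the Normal deviate.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_limits(data, level=0.95):
    """Normal-based confidence limits for the mean: mean +/- z * s / sqrt(n)."""
    n = len(data)
    z = NormalDist().inv_cdf(0.5 + level / 2)   # e.g. about 1.96 for 95%
    half_width = z * stdev(data) / math.sqrt(n)
    centre = mean(data)
    return centre - half_width, centre + half_width


# Hypothetical measurements, used only to exercise the function.
data = [12.1, 9.8, 11.4, 10.6, 13.0, 10.9, 11.7, 9.5, 12.4, 10.2]
lower, upper = mean_confidence_limits(data)
z_95 = NormalDist().inv_cdf(0.975)

print(round(lower, 3), round(upper, 3))
```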

Confidence Limits of Differences
• Independent random samples from two groups, want to investigate
• Assuming variance same in each group
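One common way to do this, sketched under the equal-variance assumption, is to pool the two sample variances and form (mean1 − mean2) ± z·sp·√(1/n1 + 1/n2). The function names and data are hypothetical, and the Normal deviate is used on the assumption that both samples are large.

```python
import math
from statistics import NormalDist, mean, variance


def pooled_variance(a, b):
    """Weighted average of the two sample variances (weights: degrees of freedom)."""
    na, nb = len(a), len(b)
    return ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)


def difference_confidence_limits(a, b, level=0.95):
    """Normal-based confidence limits for mean(a) - mean(b), equal variances assumed."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    standard_error = math.sqrt(pooled_variance(a, b) * (1 / len(a) + 1 / len(b)))
    difference = mean(a) - mean(b)
    return difference - z * standard_error, difference + z * standard_error


# Hypothetical measurements for two groups.
group_a = [23.1, 25.4, 22.8, 26.0, 24.3, 25.1, 23.7, 24.9]
group_b = [21.0, 22.5, 20.7, 23.1, 21.8, 22.0, 21.4, 22.9]
lower, upper = difference_confidence_limits(group_a, group_b)

print(round(lower, 3), round(upper, 3))
```

An interval that excludes zero suggests a difference between the group means at the chosen confidence level.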

Student's t Distribution
• Provides a means of correcting for small samples
  – When estimates are less reliable (e.g. <30 per group)
  – Degrees of freedom = n − 1
  – Confidence limits found as usual (assuming α level)
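To show how the t distribution widens the interval for small samples, the sketch below computes t critical values from the t density by numerical integration and bisection, using only the Python standard library. This is an illustrative construction, not a recommended implementation; the step count and search bracket are arbitrary assumptions, and a statistics package would normally supply these quantiles directly.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x): 0.5 plus the Simpson's-rule integral of the density from 0 to x."""
    if x == 0:
        return 0.5
    h = x / steps                       # negative when x < 0, so the area is signed
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Value q with P(T <= q) = p, found by bisection on the interval [-50, 50]."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


t_small = t_quantile(0.975, df=9)       # sample of 10 observations
t_larger = t_quantile(0.975, df=29)     # sample of 30 observations
print(round(t_small, 3), round(t_larger, 3))
```

Both values exceed the Normal deviate of about 1.96, and the gap narrows as the degrees of freedom increase.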

Approximations & Transformations
• Pearson correlation coefficient
• Association between two variables (x, y) (measured on same item)
• For large n > 100
• For small n, use Normal transformation
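A sketch of one widely used Normal transformation for the correlation coefficient, Fisher's z: z = atanh(r) is approximately Normal with standard error 1/√(n − 3), so limits are computed on the z scale and transformed back with tanh. That Fisher's z is the transformation intended here is an assumption, and the data are hypothetical.

```python
import math
from statistics import NormalDist


def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired observations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fisher_confidence_limits(r, n, level=0.95):
    """Confidence limits for a correlation via Fisher's z (requires n > 3, |r| < 1)."""
    z = math.atanh(r)
    standard_error = 1 / math.sqrt(n - 3)
    deviate = NormalDist().inv_cdf(0.5 + level / 2)
    return (math.tanh(z - deviate * standard_error),
            math.tanh(z + deviate * standard_error))


# Hypothetical paired measurements.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r = pearson_r(xs, ys)
lower, upper = fisher_confidence_limits(r, len(xs))

print(round(r, 3), round(lower, 3), round(upper, 3))
```

The limits are asymmetric about r and always stay inside (−1, 1), which a direct Normal interval on r would not guarantee.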

Problem
• How large a sample is needed for good Normal approximation?
  – 30+? Point where "t" distribution and Normal distribution converge
• Systematic studies of Non-normality
  – "Heavy" tails (i.e. many outliers) but symmetric
  – Skewed but "light-tailed"
  – Heavy-tailed and skewed
• Show classical methods more vulnerable than expected
  – For skewed distributions the mean may be far from "typical"
  – Heavy tails increase the variance
    • Making it possible to miss true effects
• Also tests for non-Normality have low power
  – They are vulnerable to Type 2 errors
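Both effects can be seen in a short simulation. The sketch below uses a lognormal sample to show a mean well above the median, and a contaminated Normal (90% from N(0,1), 10% from N(0,10²)) to show how a small share of outliers inflates the variance to about 10.9 instead of 1. The choice of distributions, mixing proportion, and seed are illustrative assumptions.

```python
import random
from statistics import fmean, median, pvariance

rng = random.Random(5)          # fixed seed for repeatability
size = 100_000

# Skewed case: lognormal with underlying Normal(0, 1).
# Its median is 1 but its mean is exp(0.5), about 1.65.
skewed = [rng.lognormvariate(0.0, 1.0) for _ in range(size)]
skewed_mean = fmean(skewed)
skewed_median = median(skewed)

# Heavy-tailed case: 90% of values from N(0, 1), 10% from N(0, 10^2).
# Theoretical variance: 0.9 * 1 + 0.1 * 100 = 10.9.
heavy = [
    rng.gauss(0.0, 10.0) if rng.random() < 0.1 else rng.gauss(0.0, 1.0)
    for _ in range(size)
]
heavy_variance = pvariance(heavy)

print("lognormal mean vs median:", round(skewed_mean, 3), round(skewed_median, 3))
print("contaminated Normal variance:", round(heavy_variance, 2))
```

A larger variance means wider confidence intervals for the same sample size, which is how heavy tails make true effects easier to miss.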

The Workshop Approach
• We have reviewed some important classic techniques
• But
  – Will continue to concentrate on conventional approaches
  – But will introduce some new approaches
    • Particularly ones that let you visualise your data
  – Review some recent approaches to robust analysis
• However from now approaches will be illustrated with SE data
