SLIDE 1

Practical data analysis References Variability Probability Distributions Large Number Theorems Width of a distribution Sampling Chi-squared distribution Errors

Practical data analysis

Doru Constantin and Guillaume Tresset

doru.constantin@u-psud.fr guillaume.tresset@u-psud.fr Laboratoire de Physique des Solides, Orsay.

SLIDE 2

References I

◮ Barlow, R. J. (1993).

Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences. Chichester, England; New York: Wiley.

◮ Bevington, P. R. (1969).

Data Reduction and Error Analysis for the Physical Sciences. New York: McGraw-Hill.

◮ Bevington, P. R. and K. Robinson (2003).

Data Reduction and Error Analysis for the Physical Sciences (3 ed.). New York: McGraw-Hill.

◮ Bohm, G. and G. Zech (2010).

Introduction to Statistics and Data Analysis for Physicists. Hamburg: Verlag Deutsches Elektronen-Synchrotron. Freely available online from http://www-library.desy.de/preparch/books/vstatmp_engl.pdf

SLIDE 3

References II

◮ Drosg, M. (2009).

Dealing with Uncertainties (2 ed.). Springer.

◮ Feller, W. (1968).

An Introduction to Probability Theory and Its Applications (3 ed.). New York: Wiley.

◮ Grinstead, C. M. and J. L. Snell (1997).

Introduction to Probability (2 ed.). American Mathematical Society. Freely available online from http://www.dartmouth.edu/~chance/

◮ Hughes, I. G. and T. P. A. Hase (2010).

Measurements and their Uncertainties. Oxford: Oxford University Press. Short and very legible introduction.

SLIDE 4

References III

◮ Jaynes, E. T. (2003).

Probability Theory – The Logic of Science. Cambridge: Cambridge University Press.

◮ Press, W. H., S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery

(1992). Numerical Recipes in C: The Art of Scientific Computing (2 ed.). Cambridge: Cambridge University Press.

◮ Taylor, J. R. (1997).

An Introduction to Error Analysis (2 ed.). Sausalito: University Science Books.

SLIDE 5

Variability

1. When measuring the height of all adult males in a certain town, one finds 177 ± 5 cm.

2. The charge of the electron is (1.602176565 ± 0.000000035) × 10⁻¹⁹ C.

SLIDE 6

The meaning of probability

Casting a die:

1. Out of a large number of trials, each face will come on top about 1 in 6 times.

2. Our state of knowledge gives us no reason to prefer one of the faces over the others.

Each face has a 1/6 probability of coming up.

SLIDE 7

Random variables

◮ A random variable "is simply an expression whose value is the outcome of a particular experiment" (Grinstead & Snell, 1997). It takes values in a certain domain Ω.

◮ This domain (or sample space) can be discrete, $\Omega = \{\omega_1, \omega_2, \ldots, \omega_k, \ldots\} \subset \mathbb{Z}^n$ (finite or countably infinite), or continuous, $\Omega \subset \mathbb{R}^n$.

◮ The elements of the sample space ($\omega_k$ or $x \in \mathbb{R}^n$) are called outcomes. Subsets of Ω are called events.

◮ We introduce a probability distribution, characterized by a distribution function m. In the discrete case, this function satisfies
$$m(\omega) \ge 0 \;\; \forall \omega \in \Omega, \qquad \sum_{\omega \in \Omega} m(\omega) = 1.$$
The probability of an event E is defined as $P(E) = \sum_{\omega \in E} m(\omega)$.
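As a sketch (in Python, with names of my own choosing), the die from the previous slide gives a concrete discrete distribution function m and event probability P(E):

```python
# A fair die as a discrete distribution: m(w) >= 0 and sum over Omega = 1.
omega = [1, 2, 3, 4, 5, 6]
m = {w: 1 / 6 for w in omega}

def prob(event):
    """P(E) = sum of m(w) over the outcomes w in the event E."""
    return sum(m[w] for w in event)

total = sum(m.values())        # normalization: must equal 1
p_even = prob({2, 4, 6})       # event E = "an even face": 3 * 1/6 = 1/2
```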

SLIDE 8

Continuous distributions

Let X be a continuous real-valued random variable. A density function for X is a function $f : \Omega \to \mathbb{R}$ such that
$$P(a \le X \le b) = \int_a^b f(x)\,dx \quad \forall\, a, b \in \mathbb{R}.$$
More generally, $\forall E \subset \mathbb{R}$, $P(X \in E) = \int_E f(x)\,dx$.

$P([x, x + dx]) = f(x)\,dx$: $f(x)\,dx$ is the probability of the outcome x.

The cumulative distribution function of X is
$$F(x) = P(X \le x) = \int_{-\infty}^x f(t)\,dt, \quad \text{with} \quad \frac{d}{dx}F(x) = f(x).$$
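A quick numerical check of the relation dF/dx = f(x), using an exponential density as an arbitrary concrete example (the density choice is mine, not from the slides):

```python
import math

# Exponential density f(x) = exp(-x) on x >= 0, with closed-form cdf.
def f(x):
    return math.exp(-x)

def F(x):
    return 1 - math.exp(-x)       # integral of f from 0 to x

x, h = 1.3, 1e-6
deriv = (F(x + h) - F(x - h)) / (2 * h)   # central finite difference of F
p_ab = F(2.0) - F(1.0)                     # P(1 <= X <= 2)
```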

SLIDE 9

Central tendency

Figure: Log-normal distribution with parameters µ = 0 and σ = 0.25 (solid line) and σ = 1 (dashed line). The mean (blue), median (green) and mode (red) are shown for both curves.

SLIDE 10

Spread

Figure: Boxplot details. The box spans the interquartile range IQR = Q3 − Q1, containing 50% of the data, with the median inside; the whiskers extend to Q1 − 1.5 × IQR and Q3 + 1.5 × IQR. For a normal distribution, Q1 = −0.6745σ and Q3 = 0.6745σ, so the whiskers fall at ±2.698σ (24.65% of the data between each quartile and whisker); ±1σ contains 68.27%, with 15.73% beyond on each side.

SLIDE 11

Higher-order moments

$$\gamma_1 = \left\langle \left(\frac{X - \mu}{\sigma}\right)^3 \right\rangle \;\;\text{(skewness)}; \qquad \gamma_2 = \left\langle \left(\frac{X - \mu}{\sigma}\right)^4 \right\rangle - 3 \;\;\text{(kurtosis)}$$

Graphics by MarkSweep. Licensed under Public domain via Wikimedia Commons

SLIDE 12

Uniform

◮ All outcomes have equal probability.

◮ $U(x; a, b) = \begin{cases} \frac{1}{b-a} & \text{for } x \in [a, b] \\ 0 & \text{otherwise} \end{cases}$

◮ $\mu = \frac{1}{2}(a + b)$, $m = \frac{1}{2}(a + b)$, M = any value in $[a, b]$.

◮ $\sigma^2 = \frac{1}{12}(b - a)^2$, $\gamma_1 = 0$, $\gamma_2 = -6/5$.

◮ One cannot have a uniform distribution over an infinite domain (discrete or continuous)!

Figure: pdf f(x) (height 1/(b − a) on [a, b]) and cdf F(x) of the uniform distribution.

Graphics by IkamusumeFan. Licensed under CCA-SA 3.0 via Wikimedia Commons
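The quoted moments can be checked by Monte Carlo; a sketch with the example values a = 2, b = 5 (my choice):

```python
import random

# Sample uniform(a, b) and compare sample moments with the quoted formulas.
random.seed(0)
a, b = 2.0, 5.0
xs = [random.uniform(a, b) for _ in range(200_000)]

mean = sum(xs) / len(xs)                            # expect (a + b)/2 = 3.5
var = sum((x - mean) ** 2 for x in xs) / len(xs)    # expect (b - a)^2/12 = 0.75
```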

SLIDE 13

Binomial

◮ Number k of successes in a sequence of n independent yes/no experiments (Bernoulli trials), each of which yields success with probability p.

◮ $B(k; n, p) = C_n^k\, p^k (1 - p)^{n-k}$; $k \in \{0, 1, \ldots, n\}$

◮ $\mu = np$, $m = \lfloor np \rfloor$ or $\lceil np \rceil$, $M = \lfloor (n + 1)p \rfloor$ or $\lceil (n + 1)p \rceil - 1$.

◮ $\sigma^2 = np(1 - p)$, $\gamma_1 = \frac{1 - 2p}{\sqrt{np(1 - p)}}$, $\gamma_2 = \frac{1 - 6p(1 - p)}{np(1 - p)}$

◮ k is the variable; n and p are parameters.

Figure: binomial pmf for p = 0.5, n = 20; p = 0.7, n = 20; p = 0.5, n = 40.

Graphics by Tayste. Licensed under Public domain via Wikimedia Commons
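A sketch of the pmf with a check of its normalization and mean, using the figure's parameters n = 20, p = 0.7:

```python
from math import comb

# Binomial pmf B(k; n, p) = C(n, k) p^k (1 - p)^(n - k).
def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 20, 0.7
total = sum(binom_pmf(k, n, p) for k in range(n + 1))     # normalization: 1
mean = sum(k * binom_pmf(k, n, p) for k in range(n + 1))  # mu = n*p = 14
```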

SLIDE 14

Normal

◮ Very widely encountered.

◮ $N(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$; $x \in \mathbb{R}$

◮ $\langle X \rangle = m = M = \mu$

◮ $\langle (X - \mu)^2 \rangle = \sigma^2$, $\gamma_1 = 0$, $\gamma_2 = 0$

Figure: normal pdf and cdf for (µ = 0, σ² = 0.2), (µ = 0, σ² = 1.0), (µ = 0, σ² = 5.0), and (µ = −2, σ² = 0.5).

Graphics by Inductiveload. Licensed under Public domain via Wikimedia Commons

SLIDE 15

Poisson

◮ Probability of a given number k of independent events occurring in a fixed interval with a known average rate.

◮ $P(k; \lambda) = \frac{\lambda^k}{k!}\, e^{-\lambda}$; $k \in \mathbb{N}$, $\lambda \in \mathbb{R}^+$

◮ $\mu = \lambda$, $m \simeq \lfloor \lambda + 1/3 - 0.02/\lambda \rfloor$, $M = \lceil \lambda \rceil - 1$ or $\lfloor \lambda \rfloor$.

◮ $\sigma^2 = \lambda$, $\gamma_1 = \lambda^{-1/2}$, $\gamma_2 = \lambda^{-1}$

◮ Can be seen as the limit of a binomial distribution for large n: $P(k; \lambda = np) \simeq B(k; n, p)$

◮ Approaches N for large λ: $P(k; \lambda) \simeq N(x = k; \mu = \lambda, \sigma^2 = \lambda)$

Graphics by Skbkekas. Licensed under CCA 3.0 via Wikimedia Commons
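The binomial limit can be illustrated numerically; λ = 2 and n = 1000 are example values of mine:

```python
from math import comb, exp, factorial

# Poisson pmf and its binomial limit P(k; lambda = n*p) ~ B(k; n, p).
def poisson_pmf(k, lam):
    return lam**k / factorial(k) * exp(-lam)

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

lam, n = 2.0, 1000
p = lam / n
# largest pointwise gap between the two pmfs over the first few k
max_gap = max(abs(poisson_pmf(k, lam) - binom_pmf(k, n, p)) for k in range(10))
norm = sum(poisson_pmf(k, lam) for k in range(40))   # ~ 1 (tail negligible)
```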

SLIDE 16

Lorentzian

◮ Shape of resonance peaks. Also named after Cauchy (in mathematics) and Breit and Wigner (in spectroscopy).

◮ $L(x; x_0, \gamma) = \frac{1}{\pi\gamma}\, \frac{1}{1 + \left(\frac{x - x_0}{\gamma}\right)^2}$; $x \in \mathbb{R}$, $x_0 \in \mathbb{R}$, $\gamma \in \mathbb{R}^+$

◮ $m = M = x_0$

◮ No µ or higher moments!

Graphics by Skbkekas. Licensed under CCA 3.0 via Wikimedia Commons
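The absence of a mean has practical consequences: the sample median estimates x₀ reliably, while the sample mean does not settle down. A sketch using inverse-CDF sampling (the parameter values are mine):

```python
import math
import random
import statistics

# Sample a Lorentzian (Cauchy) via inversion: X = x0 + gamma * tan(pi*(U - 1/2)).
random.seed(1)
x0, gamma = 0.0, 1.0
xs = [x0 + gamma * math.tan(math.pi * (random.random() - 0.5))
      for _ in range(100_000)]

med = statistics.median(xs)   # converges to x0; the sample mean would not
```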

SLIDE 17

The law of large numbers

Statement [Feller 1968, Vol. I, Chapter X, Eq. (1.2)]: Let $X_k$ be a sequence of mutually independent random variables with a common distribution. If the expectation $\mu = E(X_k)$ exists, then for every $\epsilon > 0$,
$$P\left( \left| \frac{X_1 + \ldots + X_n}{n} - \mu \right| > \epsilon \right) \to 0 \quad \text{as } n \to \infty.$$
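A minimal illustration of the theorem with die rolls, where µ = 3.5:

```python
import random

# Law of large numbers: the average of n fair die rolls approaches mu = 3.5.
random.seed(2)
n = 200_000
mean = sum(random.randint(1, 6) for _ in range(n)) / n
```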
SLIDE 18

The central limit theorem

Statement [Feller 1968, Vol. I, Chapter X, Eq. (1.3)]: Let $X_k$ be a sequence of mutually independent random variables with a common distribution. Suppose that $\mu = E(X_k)$ and $\sigma^2 = \mathrm{Var}(X_k)$ exist and let $S_n = X_1 + \ldots + X_n$. Then for every fixed β,
$$P\left( \frac{S_n - n\mu}{\sigma\sqrt{n}} < \beta \right) \to \mathcal{N}(\beta),$$
where the normal (cumulative) distribution function is
$$\mathcal{N}(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left(-\frac{y^2}{2}\right) dy = \frac{1}{2}\left[ 1 + \mathrm{erf}\left(\frac{x}{\sqrt{2}}\right) \right].$$

Weak convergence: $\forall x \in \mathbb{R}$, $\lim_{n \to \infty} F_n(x) = F(x)$.
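A sketch comparing the empirical distribution of a standardized sum of uniform variables with N(β); n = 12 and β = 1 are example choices of mine:

```python
import math
import random

# Central limit theorem: (S_n - n*mu) / (sigma * sqrt(n)) for uniform(0, 1)
# summands, compared against the normal cdf at beta.
random.seed(3)
n, trials = 12, 100_000
mu, sigma2 = 0.5, 1 / 12          # mean and variance of uniform(0, 1)

def standardized_sum():
    s = sum(random.random() for _ in range(n))
    return (s - n * mu) / math.sqrt(n * sigma2)

beta = 1.0
empirical = sum(standardized_sum() < beta for _ in range(trials)) / trials
normal_cdf = 0.5 * (1 + math.erf(beta / math.sqrt(2)))   # ~ 0.8413
```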

SLIDE 19

Sum of variables and convolution

The sum of two random variables, Z = X + Y, has the density
$$f_Z(z) = \int_{-\infty}^{\infty} dx\, f_X(x)\, f_Y(z - x) = (f_X * f_Y)(z).$$
Convolution theorem: $\tilde{f}_Z(q) = \tilde{f}_X(q)\, \tilde{f}_Y(q)$.
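The discrete analogue of this convolution, for the familiar case of the sum of two dice:

```python
# pmf of Z = X + Y for two fair dice, as a discrete convolution.
die = {k: 1 / 6 for k in range(1, 7)}

f_sum = {}
for x, px in die.items():
    for y, py in die.items():
        f_sum[x + y] = f_sum.get(x + y, 0) + px * py

p7 = f_sum[7]                       # 6/36 = 1/6, the most probable total
total = sum(f_sum.values())         # normalization preserved by convolution
```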

SLIDE 20

Standard deviation and width

We are usually interested in σ as a measure of the width, but what is often measured is the HWHM!

◮ Gaussian: $\sqrt{\langle X^2 \rangle} = \sigma$, HWHM $= \sigma \sqrt{2 \ln 2}$

◮ Lorentzian: $\sqrt{\langle X^2 \rangle}$ undefined, HWHM $= \gamma$

Under Fourier transform:
$$N(x; 0, \sigma) \;\xrightarrow{F}\; N(q; 0, 1/\sigma) \sim \exp[-(q\sigma)^2], \qquad L(x; 0, \gamma) \;\xrightarrow{F}\; \exp(-|q|\gamma).$$
For the sum: $\sigma_{\mathrm{sum}} = \sqrt{\sigma_1^2 + \sigma_2^2}$, while $\gamma_{\mathrm{sum}} = \gamma_1 + \gamma_2$.
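The Gaussian and Lorentzian HWHM relations quoted above can be verified directly from the densities; the σ and γ values below are arbitrary examples:

```python
import math

# The Gaussian pdf falls to half its peak at sigma*sqrt(2 ln 2); the
# Lorentzian at gamma.  sigma = 2 and gamma = 1.5 are example values.
sigma, gamma = 2.0, 1.5

def gauss(x):
    return math.exp(-x**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def lorentz(x):
    return 1 / (math.pi * gamma * (1 + (x / gamma) ** 2))

hwhm_gauss = sigma * math.sqrt(2 * math.log(2))
ratio_gauss = gauss(hwhm_gauss) / gauss(0)    # 1/2 analytically
ratio_lorentz = lorentz(gamma) / lorentz(0)   # 1/2 analytically
```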

SLIDE 21

Sampling

Estimate the population parameters µ and σ by taking a sample of n measurements $x_1, x_2, \ldots, x_n$ and computing the sample mean $\bar{x}$ and sample variance $s^2$:
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad s^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2.$$
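A sketch of these two formulas on a small made-up data set:

```python
# Sample mean and (1/n-normalized) sample variance; data values are invented.
xs = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(xs)

xbar = sum(xs) / n                           # sample mean: 40/8 = 5
s2 = sum((x - xbar) ** 2 for x in xs) / n    # sample variance: 32/8 = 4
```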

SLIDE 22

The relevant distribution

Figure: size distribution p(R) of Schulz spheres with ⟨R⟩ = 13.5 nm and σ/⟨R⟩ = 0.3, weighted by number, by volume, and by volume squared.

SLIDE 23

Sample statistics – the mean

$\langle \bar{x} \rangle = \mu$

$$\langle (\bar{x} - \mu)^2 \rangle = \left\langle \left( \frac{1}{n} \sum_{i=1}^{n} x_i - \mu \right)^2 \right\rangle = \frac{1}{n^2} \left\langle \left( \sum_{i=1}^{n} (x_i - \mu) \right)^2 \right\rangle$$
$$= \frac{1}{n^2} \sum_{i=1}^{n} \left\langle (x_i - \mu)^2 \right\rangle + \frac{2}{n^2} \sum_{i=1}^{n} \sum_{j=i+1}^{n} \left\langle (x_i - \mu)(x_j - \mu) \right\rangle = \frac{\sigma^2}{n},$$
since the cross terms average to zero for independent measurements.

Standard error of the mean: $\mathrm{SEM} = \sqrt{\langle (\bar{x} - \mu)^2 \rangle} = \frac{\sigma}{\sqrt{n}}$.
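The σ/√n scaling can be checked by drawing many samples and measuring the spread of their means; the population and sizes below are example choices of mine:

```python
import random
import statistics

# Population uniform(0, 1) with sigma = 1/sqrt(12); the spread of sample
# means over many repeats should be sigma / sqrt(n) ~ 0.0289 for n = 100.
random.seed(4)
n, repeats = 100, 2000

means = [sum(random.random() for _ in range(n)) / n for _ in range(repeats)]
spread = statistics.pstdev(means)
```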

SLIDE 24

Sample statistics – the variance

$$\langle s^2 \rangle = \left\langle \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 \right\rangle = \left\langle \frac{1}{n} \sum_{i=1}^{n} \left[ (x_i - \mu) - (\bar{x} - \mu) \right]^2 \right\rangle$$
$$= \frac{1}{n} \sum_{i=1}^{n} \left\langle (x_i - \mu)^2 \right\rangle - \frac{2}{n} \left\langle (\bar{x} - \mu) \underbrace{\sum_{i=1}^{n} (x_i - \mu)}_{n(\bar{x} - \mu)} \right\rangle + \frac{1}{n}\, n \underbrace{\left\langle (\bar{x} - \mu)^2 \right\rangle}_{\sigma^2/n}$$
$$= \sigma^2 - \frac{2\sigma^2}{n} + \frac{\sigma^2}{n} \;\Longrightarrow\; \langle s^2 \rangle = \frac{n - 1}{n}\, \sigma^2.$$
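A Monte Carlo check of the bias factor (n − 1)/n, with a standard normal population and n = 5 (example values of mine):

```python
import random

# Average of the 1/n-normalized sample variance over many samples;
# for sigma^2 = 1 and n = 5 it should approach (n - 1)/n = 0.8.
random.seed(5)
n, repeats = 5, 50_000

def s2_once():
    xs = [random.gauss(0, 1) for _ in range(n)]
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / n

avg_s2 = sum(s2_once() for _ in range(repeats)) / repeats
```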

SLIDE 25

Estimators

◮ Definition: Estimator $\hat{a}$ of the property a of the parent population.

◮ Consistency: $\lim_{n \to \infty} \hat{a} = a$

◮ Lack of bias: $\langle \hat{a} \rangle = a$

◮ Efficiency: $\langle (a - \hat{a})^2 \rangle$ is small.

SLIDE 26

Confidence intervals

Figure: areas under the normal curve — 34.1% in each of [µ − σ, µ] and [µ, µ + σ], 13.6% in each of [µ ± σ, µ ± 2σ], 2.1% in each of [µ ± 2σ, µ ± 3σ], and 0.1% beyond ±3σ on each side.

Two-sided confidence intervals for the normal distribution:

◮ [µ − σ, µ + σ]: C ≃ 68%

◮ [µ − 2σ, µ + 2σ]: C ≃ 95%

◮ [µ − 3σ, µ + 3σ]: C ≃ 99%

Graphics by Mwtoews. Licensed under CCA 2.5 via Wikimedia Commons
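These coverages follow from C = erf(k/√2) for the interval [µ − kσ, µ + kσ]; a sketch:

```python
import math

# Coverage of the two-sided interval [mu - k*sigma, mu + k*sigma]
# for a normal distribution: C = erf(k / sqrt(2)).
def coverage(k):
    return math.erf(k / math.sqrt(2))

c1, c2, c3 = coverage(1), coverage(2), coverage(3)   # ~0.683, ~0.954, ~0.997
```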

SLIDE 27

Chi-squared distribution

The chi-squared distribution with ν degrees of freedom is the distribution of a sum of the squares of ν independent standard normal random variables:
$$Q = \sum_{j=1}^{\nu} X_j^2 \quad \text{with } X_j \sim N(0, 1)$$
$$f_Q(x; \nu) = \begin{cases} \frac{1}{2^{\nu/2}\, \Gamma(\nu/2)}\, x^{\nu/2 - 1} \exp(-x/2) & x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

Figure: the vector X = (X₁, X₂, X₃), of length √Q, in sample space.
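A sketch checking that Q built this way has the chi-squared moments, mean ν and variance 2ν (ν = 3 here; checking moments rather than the full pdf is my choice):

```python
import random

# Q = sum of nu squared standard normals; its mean should be nu and its
# variance 2*nu, matching the chi-squared distribution with nu dof.
random.seed(6)
nu, repeats = 3, 50_000

qs = [sum(random.gauss(0, 1) ** 2 for _ in range(nu)) for _ in range(repeats)]
mean_q = sum(qs) / repeats                             # expect nu = 3
var_q = sum((q - mean_q) ** 2 for q in qs) / repeats   # expect 2*nu = 6
```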

SLIDE 28

Constraints

The constraint $\sum_j \delta_j = 0$ imposes a linear relation on the normalized variables, $\sum_j \sigma_j X_j = 0$, so that X is contained in the (ν − 1)-dimensional hyperplane perpendicular to the vector $\boldsymbol{\sigma} = (\sigma_1, \sigma_2, \ldots, \sigma_\nu)$.

Figure: Geometrical visualization of the linear constraint X · σ = 0.

SLIDE 29

Definitions

Random and systematic errors (this lecture)

◮ Random (or statistical) errors are those that can be reduced by increasing the sample size.

◮ Systematic errors are those that are not random.

Type A and B components of uncertainty (GUM 2008, § 0.7)

◮ Type A components of uncertainty are those evaluated by statistical methods (analysis of series of observations).

◮ Type B components are those evaluated by other means.

SLIDE 30

Length measurements

SLIDE 31

Invar and elinvar

◮ Discovered by Charles-Édouard Guillaume

◮ Nobel Prize in Physics in 1920

◮ Invar: low thermal expansion coefficient

◮ Elinvar: low temperature variation of the elastic coefficient

Portrait photo by A. B. Lagrelius & Westphal. Licensed under Public domain via Wikimedia Commons Graph by RicHard-59. Licensed under CC BY-SA 3.0 via Wikimedia Commons

SLIDE 32

Photomultiplier tube

Response as a function of light flux intensity

Fig.23 Typical current or charge linearity characteristics of a PMT operating from a supply with type B voltage division (photon flux Φ in arbitrary units; anode current Ia in mA, anode charge Qa in pC).

To avoid these effects, the measurement must be made quickly and with a constant mean anode current not exceeding a few microamperes. The measurement should result in determining the anode current at which space-charge limiting starts to become evident, avoiding all other causes of linearity limiting. Figure 23 shows a typical linearity curve, in which a slight overlinearity appears before saturation. Such overlinearity is often observed with voltage dividers designed for delaying the onset of saturation at high current levels. It can be corrected by adjusting the voltages of the stages immediately preceding the last, but at the cost of lowering the current threshold beyond which saturation occurs. Several methods to measure linearity are described in the Photonis application book.

Stability

The term stability is used to describe the relative constancy of anode sensitivity with time, temperature, mean current, etc. The most important departures from constancy are:

  • long-term drift, which is a time-dependent variation of gain under conditions of constant illumination;

  • short-term shift, which is a variation of gain following a change in mean current.

Long-term drift

Two modes of long-term drift can be distinguished, according to whether the mean anode current is high or low.

High-current drift; operating life.

Certain more or less irreversible effects are observable at mean anode currents larger than about 10 µA. After long storage (e.g. a few months), a photomultiplier exhibits a large drift of gain for the first one or two days of operation. For some thousands of hours after that the gain is relatively stable, then it slowly decreases as a function of the total charge handled, Fig.24. The rate of these variations varies roughly as the anode current of the tube.

Fig.24 Relative gain variation ΔG/G (%) of a PMT operating at high average current (Ia = 30 µA), showing the initial ageing and the effect of an interruption for a few days.

Operating life, defined as the time required for anode sensitivity to be halved, appears to be a function of the total charge delivered. Values of 300 to 1000 coulombs are typical. For an XP2012, this means e.g. 30 µA for 5000 h. If the incident flux is reduced (by, say, 90%) or cut off completely, or if the supply voltage is switched off for several days, the following sequence can be observed when the original operating conditions are restored: first, a certain recovery of sensitivity accompanied by a renewed initial drift; then, a tendency to catch up fairly quickly with the slow decline of sensitivity at the point at which it was interrupted. Figure 24 illustrates the relative gain variation of a photomultiplier operating at a mean anode current of 30 µA. The initial drift, which can be considered an ageing period, is between 20% and 40%. The duration of the ageing period depends on the anode current; at 10 µA it is about 24 hours. As long as the mean current does not fall below about 100 nA, ageing is still observable though very slow. In most cases, if the gain is high and the cathode current low, the variations of anode sensitivity reflect variations of gain due to changes in the surface state of the dynodes. When the mean anode current is only a few microamperes, total charge delivered is no longer the decisive factor for operating life. Other effects, such as helium migration through the glass or internal migration and diffusion balances, determine the end of useful life, which is then measured in years and is independent of the mode of operation. The experience of many users indicates that continuous, uninterrupted operation results in better long-term stability of performance characteristics than storage.

Low-current drift

When a photomultiplier is switched on and subjected to more or less constant illumination, its gain changes over the first few hours or days (Fig.25). The amount of change differs from type to type and even from one specimen to another of the same type. In most cases, though, the rate of change quickly decreases to a few per cent a month, and the higher the current the quicker the gain stabilizes. It is sometimes worthwhile to speed the process by operating the tube initially at a current up to ten times higher than that expected in the intended application. It is also advisable to leave the tube switched on even when it is idle.

Graphics from Photomultiplier tubes basics by Photonis.