Lecture 3. Fitting Distributions to Data - Choice of a Model

Igor Rychlik, Chalmers Department of Mathematical Sciences
Probability, Statistics and Risk, MVE300, Chalmers, April 2013. Click on red text for extra material.


SLIDE 1

Lecture 3. Fitting Distributions to data - choice of a model.

Igor Rychlik

Chalmers Department of Mathematical Sciences

Probability, Statistics and Risk, MVE300 • Chalmers • April 2013. Click on red text for extra material.

SLIDE 2

Random variables and cdf.

A random variable is a numerical outcome X, say, of an experiment. To describe its properties one needs to find its probability distribution FX(x). Three approaches will be discussed:

I. Use only the observed values of X (data) to model the variability of X, i.e. normalized histogram, empirical cdf; see Lecture 2.

II. Try to find the proper cdf by means of reasoning. For example, the number of heads in 10 flips of a fair coin is Bin(10, 1/2).

III. Assume that FX belongs to a class of distributions b + a Y, for example with Y standard normal. Then choose the values of the parameters a, b that best "fit" the data.

SLIDE 3

Case II - Example:

Let us roll a fair die. The sample space is S = {1, . . . , 6}; let the random variable K be the number shown. All results are equally probable, hence pk = P(K = k) = 1/6. In 1882, R. Wolf rolled a die n = 20 000 times and recorded the number of eyes shown:

Number of eyes k    1     2     3     4     5     6
Frequency nk        3407  3631  3176  2916  3448  3422

Was his die fair? The χ2 test, proposed by Karl Pearson (1857-1936), can be used to investigate this issue.

SLIDE 4

Pearson's χ2 test:

Hypothesis H0: We claim that P("Experiment results in outcome k") = pk, k = 1, . . . , r. In our example r = 6, pk = 1/6.

Significance level α: Select the probability (risk) of rejecting a true hypothesis. The constant α is often chosen to be 0.05 or 0.01. Rejecting H0 with a lower α indicates stronger evidence against H0.

Data: In n experiments one observed outcome k a total of nk times.

Test: Estimate pk by p∗k = nk/n. Large distances pk − p∗k make hypothesis H0 questionable. Pearson proposed the following statistic to measure the distance:

Q = Σ_{k=1}^{r} (nk − n pk)² / (n pk) = n Σ_{k=1}^{r} (p∗k − pk)² / pk.   (1)
SLIDE 5

Details of the χ2 test

How large should Q be to reject the hypothesis? Reject H0 if Q > χ²α(f), where f = r − 1. Further, in order to use the test, as a rule of thumb one should check that npk > 5 for all k.

Example 1. For Wolf's data,
Q = 1.6280 + 26.5816 + 7.4261 + 52.2501 + 3.9445 + 2.3585 = 94.2.
Since f = r − 1 = 5 and the quantile χ²0.05(5) = 11.1, we have Q > χ²0.05(5), which leads to rejection of the hypothesis of a fair die.¹

Example 2. Are children's birth months uniformly distributed? Data, Matlab code.

¹ Not rejecting the hypothesis does not mean that there is strong evidence that H0 is true. It is recommended to use the terminology "reject hypothesis H0" or "not reject hypothesis H0", but not to say "accept H0".
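The Q statistic in formula (1) can be computed in a few lines directly from Wolf's frequencies; a sketch in Python (the course material itself uses Matlab, so this is only an illustration):

```python
# Pearson chi-square statistic for Wolf's 20 000 die rolls (slide 3 data).
n_k = [3407, 3631, 3176, 2916, 3448, 3422]   # observed frequencies
n = sum(n_k)                                  # 20 000 rolls in total
r = len(n_k)
p_k = [1.0 / r] * r                           # under H0: fair die, p_k = 1/6

# Q = sum over k of (n_k - n p_k)^2 / (n p_k), formula (1)
Q = sum((nk - n * pk) ** 2 / (n * pk) for nk, pk in zip(n_k, p_k))
print(round(Q, 1))   # 94.2, far above chi2_0.05(5) = 11.1, so H0 is rejected
```

The six terms of the sum are exactly the numbers 1.6280, 26.5816, . . . shown in Example 1.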

SLIDE 6

Case III - parametric approach to find FX.

The parametric estimation procedure for FX contains three main steps: choice of a model; finding the parameters; analysis of error.

◮ Choose a model, i.e. select one of the standard distributions F(x) (normal, exponential, binomial, Poisson, . . . ). Next postulate that FX(x) = F((x − b)/a).

◮ Find estimates (a∗, b∗) such that Fn(x) ≈ F((x − b∗)/a∗) (so that FX(x) ≈ Fn(x)). Here the method of moments for estimating the parameters will be presented first; the more advanced and often more accurate maximum likelihood method will be presented in the next lecture.

SLIDE 7

Moments of a rv. - Law of Large Numbers (LLN)

◮ Let X1, . . . , Xk be a sequence of iid variables, all having the distribution FX(x). Let E[X] be a constant, called the expected value of X,

E[X] = ∫_{−∞}^{+∞} x fX(x) dx,   or   E[K] = Σk k pk.

◮ If the expected value of X exists and is finite then, as k increases (we are averaging more and more variables), the average (1/k)(X1 + X2 + · · · + Xk) ≈ E[X], with equality when k approaches infinity.

◮ Linearity property: E[a + b X + c Y ] = a + b E[X] + c E[Y ].

Example 3
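The LLN statement above is easy to check empirically. A minimal simulation sketch (not part of the lecture material) averaging more and more fair-die rolls:

```python
import random

# Empirical check of the LLN: the average of k fair-die rolls
# approaches E[K] = (1 + 2 + ... + 6)/6 = 3.5 as k grows.
random.seed(1)  # fixed seed so the run is reproducible
k = 100_000
rolls = [random.randint(1, 6) for _ in range(k)]
average = sum(rolls) / k
print(average)  # close to 3.5
```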

SLIDE 8

Other moments

◮ Let Xi be iid, all having the distribution FX(x). Let us also introduce constants called the moments of X, defined by

E[X^n] = ∫_{−∞}^{+∞} x^n fX(x) dx   or   E[K^n] = Σk k^n pk.

◮ If E[X^n] exists and is finite then, as k increases, the average (1/k)(X1^n + X2^n + · · · + Xk^n) ≈ E[X^n].

◮ The same is valid for other functions of a r.v.

SLIDE 9

Variance, Coefficient of variation

◮ The variance V[X] and the coefficient of variation R[X]:

V[X] = E[X²] − E[X]²,   R[X] = √V[X] / E[X].

◮ If X, Y are independent then V[a + b X + c Y ] = b²V[X] + c²V[Y ]. Example 4

◮ Note that V[X] ≥ 0. If V[X] = 0 then X is a constant.

SLIDE 10

Example: Expectations and variances.

Example 5

Expected yearly wind energy production, on blackboard.

Distribution                                                 Expectation         Variance
Beta, Beta(a, b):
  f(x) = Γ(a+b)/(Γ(a)Γ(b)) x^(a−1)(1 − x)^(b−1), 0 < x < 1   a/(a+b)             ab/((a+b)²(a+b+1))
Binomial, Bin(n, p):
  pk = C(n, k) p^k (1 − p)^(n−k), k = 0, 1, . . . , n        np                  np(1 − p)
First success:
  pk = p(1 − p)^(k−1), k = 1, 2, 3, . . .                    1/p                 (1 − p)/p²
Geometric:
  pk = p(1 − p)^k, k = 0, 1, 2, . . .                        (1 − p)/p           (1 − p)/p²
Poisson, Po(m):
  pk = e^(−m) m^k / k!, k = 0, 1, 2, . . .                   m                   m
Exponential, Exp(a):
  F(x) = 1 − e^(−x/a), x ≥ 0                                 a                   a²
Gamma, Gamma(a, b):
  f(x) = b^a/Γ(a) x^(a−1) e^(−bx), x ≥ 0                     a/b                 a/b²
Gumbel:
  F(x) = e^(−e^(−(x−b)/a)), x ∈ R                            b + γa              a²π²/6
Normal, N(m, σ²):
  f(x) = 1/(σ√(2π)) e^(−(x−m)²/(2σ²)),
  F(x) = Φ((x − m)/σ), x ∈ R                                 m                   σ²
Log-normal, ln X ∈ N(m, σ²):
  F(x) = Φ((ln x − m)/σ), x > 0                              e^(m+σ²/2)          e^(2m+2σ²) − e^(2m+σ²)
Uniform, U(a, b):
  f(x) = 1/(b − a), a ≤ x ≤ b                                (a+b)/2             (b−a)²/12
Weibull:
  F(x) = 1 − e^(−((x−b)/a)^c), x ≥ b                         b + aΓ(1 + 1/c)     a²(Γ(1 + 2/c) − Γ²(1 + 1/c))
SLIDE 11

Method of moments to fit a cdf to data:

◮ When a cdf FX(x) is specified, one can compute the expected value, variance, coefficient of variation and other moments E[X^k].

◮ If the cdf FX(x) = F((x − b)/a), i.e. depends on two parameters a, b, then the moments are also functions of the parameters: E[X^k] = mk(a, b).

◮ The LLN tells us that, having independent observations x1, . . . , xn of X, the average values m̄k = (1/n) Σ_{i=1}^{n} xi^k → E[X^k], as n → ∞.

◮ The method of moments recommends estimating the parameters a, b by the a∗, b∗ that solve the equation system mk(a∗, b∗) = m̄k, k = 1, 2.

SLIDE 12

Periods in days between serious earthquakes:

Example 6

By experience we choose the exponential family FX(x) = 1 − e^(−x/a). Since a = E[X], we choose a∗ = x̄ = 437.2 days.

[Left figure: histogram of the 62 observed times between earthquakes (days). Right figure: the fitted exponential cdf compared with the ecdf of the earthquake data; the two distributions are very close.]

Is a = a∗, i.e. is the error e = a − a∗ = a − 437.2 = 0?
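For the exponential family the method-of-moments fit is one line: a∗ = x̄. A sketch in Python; since the 62 real inter-quake periods are not reproduced on the slide, hypothetical Exp(437.2) samples stand in for them:

```python
import random

# Method of moments for the exponential family F(x) = 1 - exp(-x/a):
# since a = E[X], the estimate a* is simply the sample mean.
random.seed(2)
a_true = 437.2                                              # days, from the slide
data = [random.expovariate(1 / a_true) for _ in range(62)]  # stand-in sample
a_star = sum(data) / len(data)                              # a* = x-bar
print(a_star)   # an estimate of a; it differs from a by the estimation error
```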

SLIDE 13

Example 7

Poisson cdf. The following data set gives the number of killed drivers of motorcycles in Sweden 1990-1999: 39 30 28 38 27 29 38 33 33 36. Assume that the number of killed drivers per year is modelled as a random variable K ∈ Po(m) and that the numbers of killed drivers during consecutive years are independent and identically distributed. From the table we read that E[K] = m; hence the method of moments recommends estimating the parameter m by the average number m∗ = k̄, viz. m∗ = (39 + . . . + 36)/10 = 33.1.

Is m = m∗, i.e. is the error e = m − m∗ = m − 33.1 = 0?
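The estimate m∗, computed directly from the ten yearly counts given above (a Python sketch; the course itself uses Matlab):

```python
# Method-of-moments estimate for K in Po(m): since E[K] = m,
# m* is the average of the ten yearly counts (Sweden, 1990-1999).
killed = [39, 30, 28, 38, 27, 29, 38, 33, 33, 36]
m_star = sum(killed) / len(killed)
print(m_star)   # 33.1
```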

SLIDE 14

Gaussian model

Example 8

Since V[X] = E[X²] − E[X]², the LLN gives the following estimate of the variance:

s_n² = (1/n) Σ_{i=1}^{n} xi² − ((1/n) Σ_{i=1}^{n} xi)² = (1/n) Σ_{i=1}^{n} (xi − x̄)² → V[X],

as n tends to infinity. We proposed to model the weight of a newborn baby X by the normal (Gaussian) cdf N(m, σ²). Since E[X] = m and V[X] = σ², the method of moments recommends estimating m, σ² by m∗ = x̄, (σ²)∗ = s_n². For the data, m∗ = 3400 g, (σ²)∗ = 570² g².

Are m = m∗ and σ² = s_n², i.e. are the errors e1 = m − m∗ = m − 3400 = 0, e2 = σ² − (σ²)∗ = σ² − 570² = 0?
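The two expressions for s_n² above are algebraically identical. A quick numerical check on hypothetical weight values (the lecture's full birth-weight sample is not reproduced here):

```python
# Check: (1/n) sum x_i^2 - ((1/n) sum x_i)^2 = (1/n) sum (x_i - xbar)^2
x = [3100.0, 3350.0, 3600.0, 3400.0, 3250.0, 3700.0]   # hypothetical weights (g)
n = len(x)
xbar = sum(x) / n
s2_moments = sum(v * v for v in x) / n - xbar ** 2       # moment form
s2_centered = sum((v - xbar) ** 2 for v in x) / n        # centered form
print(abs(s2_moments - s2_centered) < 1e-6)              # True: same quantity
```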

SLIDE 15

Weibull model

For environmental variables, the Weibull cdf often fits data well. Suppose that FX(x) = 1 − exp(−(x/a)^c), where a is the scale parameter and c the shape parameter. Using the table we have that

E[X] = aΓ(1 + 1/c),   R[X] = √(Γ(1 + 2/c) − Γ(1 + 1/c)²) / Γ(1 + 1/c).

Method of moments: estimate the coefficient of variation by √(s_n²)/x̄, solve the second equation numerically for c∗ (see Table 4 on page 256), then a∗ = x̄/Γ(1 + 1/c∗).

Example 9. Fitting a Weibull cdf to bearing lifetimes.
Example 10. Fitting a Weibull cdf to wind speed measurements.
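Instead of Table 4, the shape equation R(c) = √(Γ(1 + 2/c) − Γ(1 + 1/c)²)/Γ(1 + 1/c) can be inverted numerically, since R(c) is decreasing in c. A bisection sketch; the target value 0.5227 (which corresponds to c = 2) is an illustrative stand-in for a real sample coefficient of variation √(s_n²)/x̄:

```python
import math

def weibull_cv(c):
    """Coefficient of variation R[X] of a Weibull rv with shape c (b = 0)."""
    g1 = math.gamma(1 + 1 / c)
    g2 = math.gamma(1 + 2 / c)
    return math.sqrt(g2 - g1 * g1) / g1

def solve_shape(r_target, lo=0.1, hi=20.0, tol=1e-10):
    """Bisection for c*: weibull_cv is decreasing in c on this interval."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if weibull_cv(mid) > r_target:
            lo = mid          # cv still too large -> need larger c
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

c_star = solve_shape(0.5227)   # illustrative sample cv
print(round(c_star, 2))        # 2.0, and then a* = xbar / gamma(1 + 1/c*)
```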

SLIDE 16

Estimation error:

For the exponential, Poisson and Gaussian models the unknown parameter θ was E[X] and has been estimated by θ∗ = x̄. The estimation error e = θ − θ∗ is unknown (θ is not known). We want to describe the possible values of e by finding the distribution of the estimation error E = θ − θ∗!

Let X1, X2, . . . , Xn be a sequence of n iid random variables, each having a finite expectation m = E[X1] and variance V[X1] = σ² > 0. The central limit theorem (CLT) states that, as the sample size n increases, the distribution of the sample average X̄ of these random variables approaches the normal distribution with mean m and variance σ²/n, irrespective of the shape of the original distribution.²

² "The first version of the CLT was postulated by the French-born mathematician Abraham de Moivre who, in a remarkable article published in 1733, used the normal distribution to approximate the distribution of the number of heads resulting from many tosses of a fair coin."

SLIDE 17

Computation of mE, σE²

Using the Central Limit Theorem we can approximate the cdf FE(e) by the normal distribution N(mE, σE²), where mE = E[E], σE² = V[E].

It is easy to demonstrate (see blackboard) that for the studied cases E[Θ∗] = θ and hence mE = E[E] = 0. Estimators having mE = 0 are called unbiased. Similarly, one can show that σE² = V[E] = V[X]/n (see blackboard).

Using the table we have that:

◮ σE² = m/n if X is Poisson, Po(m)
◮ σE² = a²/n if X is Exp(a)
◮ σE² = σ²/n if X is N(m, σ²)³

³ Problem: the variance σE² depends on unknown parameters! Since θ∗ → θ as n → ∞, one estimates σE² by replacing θ by θ∗ and denotes the approximation by (σE²)∗.
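Combining this slide with the earthquake example: for the exponential model, (σE²)∗ = (a∗)²/n. A sketch using the values a∗ = 437.2 days and n = 62 from Example 6:

```python
import math

# Approximate distribution of the estimation error E = a - a* for the
# exponential (earthquake) model: E ~ N(0, a^2/n), with a replaced by a*.
a_star = 437.2    # days, method-of-moments estimate from Example 6
n = 62            # number of observed inter-quake periods
se = math.sqrt(a_star ** 2 / n)   # (sigma_E)*, the estimated standard error
print(round(se, 1))   # 55.5 days
```

So a typical estimation error for this data set is on the order of 55 days.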

SLIDE 18

In this lecture we met the following concepts:

◮ χ²-test.
◮ Method of moments to fit a cdf to data.
◮ Examples of data described using the exponential, Poisson, Gaussian (normal) and Weibull cdfs.
◮ Central Limit Theorem, giving the normal distribution of estimation errors.

Examples in this lecture: "click"