Statistics and learning: Statistical estimation
Emmanuel Rachelson and Matthieu Vignes, ISAE SupAero
Wednesday 18th September 2013
E. Rachelson & M. Vignes (ISAE), SAD 2013
How to retrieve the lecture support & practical sessions: LMS @ ISAE or my website (clickable links).
Things you have to keep in mind

Crux of the estimation:
◮ Population, sample and statistics.
◮ The concept of an estimator of a parameter.
◮ Bias, comparison of estimators, the maximum likelihood estimator.
◮ Sufficient statistics, quantiles.
◮ Interval estimation.
Statistical estimation

Steps in the estimation procedure:
◮ consider a population (size N) described by a random variable X (known or unknown distribution) with parameter θ;
◮ extract a sample of n ≤ N independent observations (x_1, ..., x_n);
◮ estimate θ through a statistic (a function of the X_i's): θ̂ = T(X_1, ..., X_n).

Note: independence holds exactly only if the drawing is made with replacement. Without replacement, it remains a good approximation as long as n ≪ N.

Mean estimation: estimate the average life span of a bulb...
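The three steps above can be sketched in a short simulation. The exponential model, the true mean of 1000 hours and the sample size are illustrative assumptions, not part of the exercise:

```python
import random
import statistics

# Sketch of the estimation steps on the bulb example.
# Assumption (for illustration only): bulb life spans follow an
# exponential distribution with true mean 1000 hours.
random.seed(0)
true_mean = 1000.0
n = 500  # sample size, with n << N

# Step 2: draw a sample of n independent observations.
sample = [random.expovariate(1.0 / true_mean) for _ in range(n)]

# Step 3: estimate the parameter through a statistic of the sample,
# here the empirical mean T(x_1, ..., x_n) = (1/n) * sum(x_i).
theta_hat = statistics.mean(sample)
print(f"estimate of the mean life span: {theta_hat:.1f} h")
```

The estimate fluctuates around the true mean with standard deviation of order 1000/√n, which already motivates the questions of bias and variance below.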
Point estimation of a parameter

Recall that n realisations of iid random variables (X_1, ..., X_n) are available. Some parameters of their distribution are of interest; computing them directly over the whole population is not feasible, so estimation is needed. Objective here: the tools and mathematical grounds for estimation.

Definitions
◮ Statistical model: a probability distribution P_θ (a joint probability mass function for discrete rv's, a density for continuous rv's), where θ is a (p-vector of) unknown parameter(s).
◮ Statistic: T : R^n → R^p, (x_i)_{i=1..n} ↦ T(x_1, ..., x_n). Examples: the empirical mean, or the empirical variance (with known or unknown mean).
Estimator, bias, comparison

Exercise
A lift can bear 1,000 kg. User weight ∼ N(75, 16²).
◮ What is the maximum number of people allowed inside if P(lift won't take off) = 10⁻⁶?
◮ The lift manufacturer allows 11 people inside. What is P(overweight)?

Definitions
◮ Estimator of an unknown parameter θ: a statistic denoted θ̂ (its observed values are approximations of θ). The bias associated with θ̂ is E[θ̂] − θ; if it is 0, θ̂ is said to be unbiased. Examples (exercises): (i) the empirical mean is an unbiased estimator of the (theoretical) mean; (ii) S_n² := (1/n) Σ_{i=1}^n (X_i − X̄)² is a biased estimator of σ².
◮ θ̂ is asymptotically unbiased if lim_{n→∞} E[θ̂] = θ.
◮ If θ̂_1 and θ̂_2 are two unbiased estimators of θ, θ̂_1 is better than θ̂_2 if Var(θ̂_1) < Var(θ̂_2); in practice, θ̂_1 then converges faster than θ̂_2.
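As a worked sketch of the lift exercise (one possible solution, not the official correction): the total weight of n independent users is N(75n, 256n), so the overweight probability is a normal tail that can be evaluated with the complementary error function:

```python
import math

# Worked numbers for the lift exercise. Weights are iid N(75, 16^2),
# so the total weight of n users is N(75 n, 256 n) and
# P(total > 1000) = P(Z > (1000 - 75 n) / (16 sqrt(n))).
def p_overweight(n, capacity=1000.0, mu=75.0, sigma=16.0):
    z = (capacity - mu * n) / (sigma * math.sqrt(n))
    # Standard normal upper tail via the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2))

# Largest n with P(overweight) <= 1e-6:
n_max = max(n for n in range(1, 14) if p_overweight(n) <= 1e-6)
print(n_max)             # 10 people at most
print(p_overweight(11))  # roughly 5e-4 with 11 people
```

With 11 people the overweight probability is about 5·10⁻⁴, far above the 10⁻⁶ target.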
Application break

Estimating the duration of a traffic light
θ > 0 is the actual duration of a traffic light; it is unknown. We observe a sample t_1, ..., t_n, where t_i is the waiting time of driver i.
1. What is a good model for the T_i's? Density? Mean and variance?
2. If T̄ = (1/n) Σ_{i=1}^n T_i, what are E[T̄] and Var(T̄)? Can you use T̄ to build an unbiased estimator of θ? Establish its convergence in probability.
3. Let M_n = sup_i T_i. Compute the cumulative distribution function of M_n. Density? Mean and variance? Plot the cdf for n = 3 and n = 30 and interpret. Use M_n to build an unbiased, probability-convergent estimator of θ.
4. Compare the variances of both estimators. Which one would you use to estimate θ?
5. Numerical application for n = 3 and (t_1, t_2, t_3) = (2, 24, 13).
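A simulation sketch of this exercise, assuming the usual uniform waiting-time model T_i ∼ Uniform(0, θ) (the model the questions point towards, stated here as an assumption so the code is self-contained):

```python
import random
import statistics

# Compare the two unbiased estimators suggested by questions 2 and 3,
# under the assumption T_i ~ Uniform(0, theta):
#   theta1 = 2 * mean(T_i)            (moment-based)
#   theta2 = (n + 1) / n * max(T_i)   (maximum-based)
random.seed(1)
theta, n, reps = 60.0, 30, 20000

est1, est2 = [], []
for _ in range(reps):
    t = [random.uniform(0.0, theta) for _ in range(n)]
    est1.append(2.0 * statistics.mean(t))
    est2.append((n + 1) / n * max(t))

# Both are unbiased, but the max-based estimator has a much smaller
# variance: theta^2 / (3n) versus theta^2 / (n (n + 2)).
print(statistics.mean(est1), statistics.variance(est1))
print(statistics.mean(est2), statistics.variance(est2))

# Numerical application of question 5: n = 3, (t1, t2, t3) = (2, 24, 13).
print(2.0 * statistics.mean([2, 24, 13]))   # 26.0
print((3 + 1) * max(2, 24, 13) / 3)         # 32.0
```

On the numerical application the two estimators disagree noticeably (26 vs 32), which is expected at n = 3; the variance comparison explains which one to trust more.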
Convergence of estimators

Def: θ̂ converges in probability towards θ if ∀ε > 0, P(|θ̂ − θ| < ε) → 1 as n → ∞.

Theorem
An (asymptotically) unbiased estimator such that lim_n Var(θ̂) = 0 converges in probability towards θ.

Theorem
An unbiased estimator θ̂ satisfying the technical regularity hypotheses (H1)-(H5) verifies Var(θ̂) ≥ V_n(θ), where V_n(θ) := (−E[∂² log f(X_1, ..., X_n; θ) / ∂θ²])⁻¹ is the Cramér-Rao bound (the inverse of the Fisher information).
(H1) The support D := {x : f(x; θ) > 0} does not depend on θ.
(H2) θ belongs to an open interval I.
(H3) On I × D, ∂f/∂θ and ∂²f/∂θ² exist and are integrable over x.
(H4) θ ↦ ∫_A f(x; θ) dx has a second-order derivative (θ ∈ I, A ∈ B(R)).
(H5) (∂ log f(X; θ)/∂θ)² is integrable.
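The Cramér-Rao bound can be checked numerically on a standard case (an illustration chosen here, not taken from the slides): for the mean of a N(μ, σ²) sample the Fisher information is n/σ², so V_n(μ) = σ²/n, and the empirical mean attains it:

```python
import random
import statistics

# Cramér-Rao illustration for the mean of a N(mu, sigma^2) sample:
# the Fisher information is n / sigma^2, hence V_n(mu) = sigma^2 / n.
# The empirical mean attains this bound (it is efficient).
random.seed(2)
mu, sigma, n, reps = 5.0, 2.0, 25, 20000
cr_bound = sigma**2 / n  # = 0.16

means = [statistics.mean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(reps)]
print(statistics.variance(means))  # close to the bound 0.16
```

The simulated variance of the empirical mean matches σ²/n, the smallest variance any unbiased estimator of μ can achieve under (H1)-(H5).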
Application to the estimation of a |N|

Definition
An unbiased estimator θ̂ of θ is efficient if its variance equals the Cramér-Rao bound. It is the best possible among unbiased estimators.

Exercise
Let (X_i)_{i=1..n} be iid rv's ∼ N(m, σ²). Y_i := |X_i − m| is observed.
◮ What is the density of Y_i? Compute E[Y_i]. How does it compare to σ?
◮ Let σ̂ := Σ_i a_i Y_i. If we want σ̂ to be unbiased, give a constraint on the (a_i)'s. Under this constraint, show that Var(σ̂) is minimal iff all the a_i are equal. In this case, give the variance.
◮ Compare the Cramér-Rao bound to the above variance. Is the resulting estimator efficient?
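A simulation sketch of this exercise (one possible solution path, stated as an assumption rather than the official correction): since E[|X_i − m|] = σ√(2/π), the equal-weight choice a_i = √(π/2)/n gives an unbiased σ̂ whose variance (π/2 − 1)σ²/n exceeds the Cramér-Rao bound σ²/(2n):

```python
import math
import random
import statistics

# With Y_i = |X_i - m| and X_i ~ N(m, sigma^2), E[Y_i] = sigma * sqrt(2/pi),
# so sigma_hat = sqrt(pi/2) * mean(Y_i) is unbiased (all a_i equal).
# Its variance, (pi/2 - 1) * sigma^2 / n, exceeds the Cramér-Rao bound
# sigma^2 / (2n): the estimator is not efficient.
random.seed(3)
m, sigma, n, reps = 0.0, 3.0, 40, 20000

est = []
for _ in range(reps):
    y = [abs(random.gauss(m, sigma) - m) for _ in range(n)]
    est.append(math.sqrt(math.pi / 2) * statistics.mean(y))

print(statistics.mean(est))              # close to sigma = 3
print(statistics.variance(est))          # near (pi/2 - 1) * sigma^2 / n
print((math.pi / 2 - 1) * sigma**2 / n)  # theoretical variance
print(sigma**2 / (2 * n))                # Cramér-Rao bound (smaller)
```

The gap between the two printed variances is the point of the last question: unbiased does not imply efficient.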
Likelihood function

Definition
The likelihood of a rv X = (X_1, ..., X_n) is the function
L : R^n × Θ → R⁺, (x, θ) ↦ L(x; θ) := f(x; θ), the density of X, if X is continuous, or P_θ(X_1 = x_1, ..., X_n = x_n) if X is discrete.
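A small illustration of this definition (the exponential model and the data values are invented for the example): for iid exponential observations with rate λ, L(x; λ) = Π λ e^{−λx_i}, so log L(x; λ) = n log λ − λ Σ x_i, and the maximum likelihood estimator mentioned in the outline is λ̂ = 1/x̄:

```python
import math
import statistics

# Log-likelihood of iid exponential observations with rate lam:
# log L(x; lam) = n * log(lam) - lam * sum(x_i).
def log_likelihood(x, lam):
    return len(x) * math.log(lam) - lam * sum(x)

x = [0.8, 2.1, 0.4, 1.6, 1.1]          # illustrative data
lam_hat = 1.0 / statistics.mean(x)     # closed-form maximiser

# The log-likelihood at lam_hat dominates nearby values of lam.
assert log_likelihood(x, lam_hat) >= log_likelihood(x, 0.9 * lam_hat)
assert log_likelihood(x, lam_hat) >= log_likelihood(x, 1.1 * lam_hat)
print(lam_hat)
```

Maximising L (or, equivalently, log L) in θ for the observed x is exactly the maximum likelihood principle announced at the start of the lecture.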