
ST519/807: Mathematical Statistics
Bent Jørgensen
University of Southern Denmark
November 22, 2014

Abstract

These are additional notes on the course material for ST519/807: Mathematical Statistics. HMC refers to the textbook Hogg et al. (2013).

Key words: Asymptotic theory, consistency, Cramér-Rao inequality, efficiency, exponential family, estimation, Fisher's scoring method, Fisher information, identifiability, likelihood, maximum likelihood, observed information, orthogonality, parameter, score function, statistical model, statistical test, sufficiency.

Fisher (1922), under the heading "The Neglect of Theoretical Statistics", wrote:

    Several reasons have contributed to the prolonged neglect into which the study of statistics, in its theoretical aspects, has fallen. In spite of the immense amount of fruitful labour which has been expended in its practical application, the basic principles of this organ of science are still in a state of obscurity, and it cannot be denied that, during the recent rapid development of practical methods, fundamental problems have been ignored and fundamental paradoxes left unresolved.

Fisher then went on to introduce the main ingredients of likelihood theory, which shaped much of mathematical statistics of the 20th Century, including concepts such as statistical model, parameter, identifiability, estimation, consistency, likelihood, score function, maximum likelihood, Fisher information, efficiency, and sufficiency. Here we review the basic elements of likelihood theory in a contemporary setting.

Prerequisites: Sample space; probability distribution; discrete and continuous random variables; PMF and PDF; transformations; independent random variables; mean, variance, covariance and correlation.

Special distributions: Uniform; Bernoulli; binomial; Poisson; geometric; negative binomial; gamma; chi-square; beta; normal; t-distribution; F-distribution.
Contents

1 Stochastic convergence and the Central Limit Theorem
2 The log likelihood function and its derivatives
  2.1 Likelihood and log likelihood
  2.2 The score function and the Fisher information function
  2.3 Observed information
  2.4 The Cramér-Rao inequality
3 Asymptotic likelihood theory
  3.1 Asymptotic normality of the score function
  3.2 The maximum likelihood estimator
  3.3 Exponential families
  3.4 Consistency of the maximum likelihood estimator
  3.5 Efficiency and asymptotic normality
  3.6 The Weibull distribution
  3.7 Location models
4 Vector parameters
  4.1 The score vector and the Fisher information matrix
  4.2 Cramér-Rao inequality (generalized)
  4.3 Consistency and asymptotic normality of the maximum likelihood estimator
  4.4 Parameter orthogonality
  4.5 Exponential dispersion models
  4.6 Linear regression
  4.7 Exercises
5 Sufficiency
  5.1 Definition
  5.2 The Fisher-Neyman factorization criterion
  5.3 The Rao-Blackwell theorem
  5.4 The Lehmann-Scheffé theorem
6 The likelihood ratio test and other large-sample tests
  6.1 Standard errors
  6.2 The likelihood ratio test
  6.3 Wald and score tests
7 Maximum likelihood computation
  7.1 Assumptions
  7.2 Stabilized Newton methods
  7.3 The Newton-Raphson method
  7.4 Fisher's scoring method
  7.5 Step length calculation
  7.6 Convergence and starting values

1 Stochastic convergence and the Central Limit Theorem

• Setup: Let $X$ denote a random variable (r.v.) and let $\{X_n\}_{n=1}^{\infty}$ denote a sequence of r.v.s, all defined on a suitable probability space $(\mathcal{C}, \mathcal{B}, P)$ (sample space, $\sigma$-algebra, probability measure).

• Definition: Convergence in probability. We say that $X_n \xrightarrow{P} X$ as $n \to \infty$ ($X_n$ converges to $X$ in probability) if
$$\lim_{n \to \infty} P(|X_n - X| \ge \varepsilon) = 0 \quad \forall\, \varepsilon > 0.$$

• Definition: Convergence in distribution. If $F$ is a distribution function (CDF) we say that $X_n \xrightarrow{D} F$ as $n \to \infty$ ($X_n$ converges to $F$ in distribution) if
$$P(X_n \le x) \to F(x) \text{ as } n \to \infty \text{ for all } x \in C(F),$$
where $C(F)$ denotes the set of continuity points of $F$. If $X$ has distribution function $F$, we also write $X_n \xrightarrow{D} X$ as $n \to \infty$.

• Properties: As $n \to \infty$
1. $X_n \xrightarrow{P} X \Rightarrow aX_n \xrightarrow{P} aX$
2. $X_n \xrightarrow{P} X \Rightarrow g(X_n) \xrightarrow{P} g(X)$ if $g$ is continuous
3. $X_n \xrightarrow{P} X \Rightarrow X_n \xrightarrow{D} X$
4. If $X_n \xrightarrow{P} X$ and $Y_n \xrightarrow{P} Y$ then
$$X_n + Y_n \xrightarrow{P} X + Y \quad \text{and} \quad X_n Y_n \xrightarrow{P} XY \tag{1.1}$$

• Example: Let $X$ be symmetric, i.e. $-X \sim X$, and define
$$X_n = (-1)^n X.$$
Then $X_n \xrightarrow{D} X$ (meaning that $X_n$ converges to the distribution of $X$), since $F_{X_n} = F_X$ for all $n$, but unless $X_n$ is constant, $X_n \nrightarrow X$ in probability.
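The example above can be checked numerically. The following Python sketch assumes, purely for illustration, that $X$ is standard normal (the notes only require $X$ to be symmetric), and the function names `prob_far` and `simulated_prob_far` are hypothetical. It evaluates $P(|X_n - X| \ge \varepsilon)$ both exactly and by simulation, and suggests that this probability does not tend to zero along odd $n$, even though every $X_n$ has the same distribution as $X$.

```python
import random
from statistics import NormalDist


def prob_far(n, eps):
    """Exact P(|X_n - X| >= eps) for X_n = (-1)^n X, assuming X ~ N(0, 1)."""
    if n % 2 == 0:
        return 0.0  # X_n = X, so the difference is identically zero
    # X_n = -X, so |X_n - X| = 2|X| and P(2|X| >= eps) = 2 * (1 - Phi(eps / 2))
    return 2.0 * (1.0 - NormalDist().cdf(eps / 2.0))


def simulated_prob_far(n, eps, draws=100_000, seed=0):
    """Monte Carlo estimate of the same probability (seeded for reproducibility)."""
    rng = random.Random(seed)
    sign = -1.0 if n % 2 else 1.0
    hits = 0
    for _ in range(draws):
        x = rng.gauss(0.0, 1.0)
        if abs(sign * x - x) >= eps:
            hits += 1
    return hits / draws


eps = 0.5
for n in (1, 2, 101, 102):
    # Columns: n, exact probability, simulated probability
    print(n, round(prob_far(n, eps), 4), round(simulated_prob_far(n, eps), 4))
```

The probability alternates between zero (even $n$) and a fixed positive value (odd $n$), so the limit required for convergence in probability does not exist.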

• However, we have the following properties:
1. $X_n \xrightarrow{D} c \Rightarrow X_n \xrightarrow{P} c$
2. If $X_n \xrightarrow{D} X$ and $Y_n \xrightarrow{P} 0$ then $X_n + Y_n \xrightarrow{D} X$
3. $X_n \xrightarrow{D} X \Rightarrow g(X_n) \xrightarrow{D} g(X)$ if $g$ is continuous
4. Slutsky's Theorem: If $X_n \xrightarrow{D} X$ and $A_n \xrightarrow{P} a$, $B_n \xrightarrow{P} b$ then
$$A_n + B_n X_n \xrightarrow{D} a + bX$$

• Example: Let $X_n$ and $Y_n$ be two sequences such that $X_n \xrightarrow{D} X$ and $Y_n \xrightarrow{D} X$. The following examples show that we do not in general have a result similar to (1.1) for convergence in distribution.
1. Suppose that $X$ is symmetric (see above), and let $X_n = X$ and $Y_n = -X$ for all $n$. Then $X_n + Y_n = X - X = 0$, so clearly $X_n + Y_n$ converges in distribution to $0$ as $n \to \infty$.
2. Now suppose that for each $n$, $X_n$ and $Y_n$ are independent and identically distributed with CDF $F(x) = P(X \le x)$ for all $x$. Now
$$X_n + Y_n \xrightarrow{D} F_{X_1 + Y_1},$$
where $F_{X_1 + Y_1}(x) = P(X_1 + Y_1 \le x)$ for all $x$, corresponding to the convolution of $X_1$ and $Y_1$.
Hence, the assumption that $X_n \xrightarrow{D} X$ and $Y_n \xrightarrow{D} X$ is not enough to determine the limiting distribution of $X_n + Y_n$, which in fact depends on the sequence of joint distributions of $X_n$ and $Y_n$.

• Statistical setup: Let $X_1, X_2, \ldots$ be a sequence of i.i.d. variables. Assume
$$\mu = \mathrm{E}(X_i) \quad \text{and} \quad \sigma^2 = \mathrm{Var}(X_i).$$
Define for $n = 1, 2, \ldots$
$$T_n = \sum_{i=1}^{n} X_i \quad \text{and} \quad \bar{X}_n = \frac{1}{n} T_n.$$
Then
$$\mathrm{E}(\bar{X}_n) = \mu, \qquad \mathrm{Var}(\bar{X}_n) = \frac{\sigma^2}{n}.$$
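The two moment formulas for $\bar{X}_n$ in the statistical setup can be illustrated by simulation. The sketch below assumes, only as an example, that the $X_i$ are exponential with rate 2 (so $\mu = 1/2$ and $\sigma^2 = 1/4$); the helper name `sample_means` is hypothetical. It draws many independent copies of $\bar{X}_n$ and compares their empirical mean and variance with $\mu$ and $\sigma^2/n$.

```python
import random
from statistics import fmean, pvariance


def sample_means(n, reps, rate=2.0, seed=1):
    """Draw `reps` independent copies of the mean of n i.i.d. Exp(rate) variables."""
    rng = random.Random(seed)
    return [fmean(rng.expovariate(rate) for _ in range(n)) for _ in range(reps)]


rate = 2.0
mu = 1.0 / rate          # E(X_i) for the assumed exponential distribution
sigma2 = 1.0 / rate**2   # Var(X_i) for the assumed exponential distribution

for n in (5, 50, 500):
    means = sample_means(n, reps=2_000, rate=rate)
    # Columns: n, empirical mean, mu, empirical variance, sigma^2 / n
    print(n, round(fmean(means), 4), mu, round(pvariance(means), 6), round(sigma2 / n, 6))
```

The empirical variance should shrink roughly in proportion to $1/n$, which is the fact that drives the Law of Large Numbers via Chebyshev's inequality.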

• The (Weak) Law of Large Numbers (LLN) says
$$\bar{X}_n \xrightarrow{P} \mu.$$
Proof: Use Chebyshev's inequality,
$$P\left(|\bar{X}_n - \mu| \ge \varepsilon\right) \le \frac{\sigma^2 / n}{\varepsilon^2} \to 0 \text{ as } n \to \infty.$$

• Convergence to the standard normal distribution:
$$P(X_n \le x) \to \Phi(x) \text{ as } n \to \infty, \text{ for all } x \in \mathbb{R},$$
where
$$\Phi(x) = (2\pi)^{-1/2} \int_{-\infty}^{x} e^{-\frac{1}{2} t^2}\, dt.$$

• Now we define
$$Z_n = \sqrt{n}\,(\bar{X}_n - \mu),$$
for which
$$\mathrm{E}(Z_n) = 0, \qquad \mathrm{Var}(Z_n) = \sigma^2.$$

• The Central Limit Theorem (CLT) (see James, p. 265 or HMC p. 307) says
$$Z_n \xrightarrow{D} N(0, \sigma^2) \text{ as } n \to \infty.$$

Practical use:
$$\sqrt{n}\,(\bar{X}_n - \mu) \sim N(0, \sigma^2) \text{ approx.},$$
which implies
$$\bar{X}_n \sim N\left(\mu, \frac{\sigma^2}{n}\right) \text{ approx.}$$

Rule: The approximate normal distribution shares with $\bar{X}_n$ its mean and variance.

Example: Bernoulli trials. Assume that the $X_i$ are i.i.d. Bernoulli variables,
$$P(X_i = 1) = \mu = 1 - P(X_i = 0).$$
Hence we use $\mu$ as probability parameter, which is also the mean of $X_i$,
$$\mu = \mathrm{E}(X_i) \quad \text{and} \quad \sigma^2 = \mathrm{Var}(X_i) = \mu(1 - \mu).$$
Then
$$T_n = \sum_{i=1}^{n} X_i = \text{\# of 1s in a sample of } n.$$
In fact $T_n \sim \mathrm{Bi}(n, \mu)$ (binomial distribution). Then, by the LLN,
$$\bar{X}_n \xrightarrow{P} \mu.$$
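For the Bernoulli example, both the Chebyshev bound used in the LLN proof and the normal approximation from the CLT can be compared with exact binomial probabilities. The Python sketch below is illustrative only: the value $\mu = 0.3$, the tolerance $\varepsilon = 0.05$, and all function names are assumptions made for this example, and the binomial PMF is evaluated on the log scale simply to avoid numerical underflow.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist


def binom_pmf(j, n, mu):
    """Exact P(T_n = j) for T_n ~ Bi(n, mu), 0 < mu < 1, computed on the log scale."""
    log_comb = lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1)
    return exp(log_comb + j * log(mu) + (n - j) * log(1.0 - mu))


def binom_cdf(k, n, mu):
    """Exact P(T_n <= k)."""
    return sum(binom_pmf(j, n, mu) for j in range(k + 1))


def normal_approx_cdf(k, n, mu):
    """CLT approximation to P(T_n <= k), using Xbar_n ~ N(mu, mu(1 - mu)/n) approx."""
    return NormalDist(mu, sqrt(mu * (1.0 - mu) / n)).cdf(k / n)


def deviation_prob(n, mu, eps):
    """Exact P(|Xbar_n - mu| >= eps), obtained by summing the binomial PMF."""
    return sum(binom_pmf(j, n, mu) for j in range(n + 1) if abs(j / n - mu) >= eps)


def chebyshev_bound(n, mu, eps):
    """Chebyshev bound (sigma^2 / n) / eps^2 with sigma^2 = mu(1 - mu), capped at 1."""
    return min(1.0, mu * (1.0 - mu) / (n * eps**2))


mu, eps = 0.3, 0.05
for n in (10, 100, 1000):
    k = round(n * mu + sqrt(n * mu * (1.0 - mu)))  # about one standard deviation above n*mu
    # Columns: n, k, exact CDF, normal approximation, exact deviation prob., Chebyshev bound
    print(n, k,
          round(binom_cdf(k, n, mu), 4), round(normal_approx_cdf(k, n, mu), 4),
          round(deviation_prob(n, mu, eps), 4), round(chebyshev_bound(n, mu, eps), 4))
```

The Chebyshev bound is typically far from tight, but it tends to zero, which is all the LLN proof needs. No continuity correction is applied in `normal_approx_cdf`, so the agreement with the exact CDF for small $n$ is only rough.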
