Mixtures of equispaced Normal distributions and their use for testing symmetry in univariate data
Silvia Bacci (silvia.bacci@stat.unipg.it), Francesco Bartolucci
Dipartimento di Economia, Finanza e Statistica, Università di Perugia
University of Naples “Federico II”, Naples, 17-19 May 2012
Bacci, Bartolucci (unipg), MMLV2012

Outline
1. Introduction
2. The mixture-based test of symmetry
   - The NM model
   - Maximum likelihood estimation
   - Proposed test of symmetry
3. Monte Carlo study
   - Main results
4. Empirical example
5. Conclusions
6. References

Introduction: Starting point
- Let $X_1, X_2, \ldots, X_n$ be a random sample from a continuous distribution $F(x)$ with density $f(x)$.
- Let $\mu$ be the mean or the median of $f(\cdot)$.
- Problem of testing symmetry:
  \[ H_0: F(\mu - x) = 1 - F(\mu + x) \quad \forall x \]
  against the hypothesis of skewness
  \[ H_1: F(\mu - x) \neq 1 - F(\mu + x) \quad \text{for at least one } x \]
- Aim: to propose a test of symmetry based on Normal finite mixture (NM) models (Lindsay, 1996; McLachlan and Peel, 2000).

Introduction: Why test for symmetry?
- Many parametric statistical methods are robust to violations of the normality assumption on $f(x)$: symmetry is often sufficient for their validity.
- Knowing whether $f(x)$ is symmetric helps in choosing the most representative location parameter, since the mean, median, and mode do not coincide under skewness.
- In case-control studies, exchangeability is required of the joint distribution of the observations on treated and control individuals. Because exchangeability implies symmetry of the distribution, knowing that a distribution is skewed rules out its exchangeability.
- Nonparametric methods assume the symmetry of the distribution rather than its normality.

Introduction: How to test for symmetry?
- Traditional test based on the third standardised sample moment (Gupta, 1967):
  \[ b_1 = \frac{m_3}{m_2^{3/2}}, \qquad m_r = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^r, \quad r = 2, 3 \]
- $b_1$ is commonly used to estimate the third standardised population moment
  \[ \gamma_1 = \frac{\mu_3}{\mu_2^{3/2}}, \qquad \mu_r = E[(X - \mu)^r] \]
- For samples from a symmetric distribution with finite sixth-order central moment,
  \[ \sqrt{n}\, b_1 \to N(0, \sigma^2), \qquad \sigma^2 = \frac{\mu_6 - 6\mu_2\mu_4 + 9\mu_2^3}{\mu_2^3} \]
- $\sigma^2$ is consistently estimated by replacing $\mu_j$, $j = 2, 4, 6$, with the corresponding sample moments.
- Under $H_0$,
  \[ S_1 = \frac{n^{1/2}\, b_1}{\hat{\sigma}} \to N(0, 1) \]
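The statistic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `central_moment` and `gupta_test` are hypothetical, the p-value is taken as two-sided, and only the standard library is used (the slides state that the original analyses were run in R).

```python
import math
from statistics import NormalDist


def central_moment(x, r):
    """Sample central moment m_r = (1/n) * sum((x_i - xbar)^r)."""
    xbar = sum(x) / len(x)
    return sum((xi - xbar) ** r for xi in x) / len(x)


def gupta_test(x):
    """Return (b1, S1, two-sided p-value) for the moment-based symmetry test."""
    n = len(x)
    m2, m3, m4, m6 = (central_moment(x, r) for r in (2, 3, 4, 6))
    b1 = m3 / m2 ** 1.5
    # Plug-in estimate of sigma^2 = (mu6 - 6*mu2*mu4 + 9*mu2^3) / mu2^3
    sigma2_hat = (m6 - 6 * m2 * m4 + 9 * m2 ** 3) / m2 ** 3
    s1 = math.sqrt(n) * b1 / math.sqrt(sigma2_hat)
    # Two-sided p-value from the N(0, 1) limit (an assumption of this sketch)
    p_value = 2 * (1 - NormalDist().cdf(abs(s1)))
    return b1, s1, p_value
```

A perfectly symmetric sample gives $b_1 = 0$ and hence a p-value of 1, while a sample with a long right tail gives $b_1 > 0$.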

Introduction: Drawbacks of Gupta’s test
- $\gamma_1$ is sensitive to outliers.
- $\gamma_1$ can be undefined for heavy-tailed distributions (e.g., Cauchy).
- $\gamma_1 = 0$ does not necessarily imply that $f(x)$ is symmetric.

Other tests based on alternative measures of skewness:
- Randles et al. (1980) for a triples test
- McWilliams (1990), Modarres and Gastwirth (1996) for a runs test
- Cabilio and Masaro (1996), Miao et al. (2006) for a test based on Yule’s skewness index
- Mira (1999) for a test based on Bonferroni’s index

Nonparametric tests based on kernel estimation:
- Fan and Gencay (1995), Ngatchou-Wandji (2006), Racine and Maasoumi (2007)
- pros: a better goodness of fit than parametric methods
- cons: a high number of unknown parameters

Introduction: Our proposal
We know that:
- NM densities (with common variance) can approximate any continuous distribution, symmetric or skewed, arbitrarily well.
- NM densities provide a convenient semi-parametric framework for modelling unknown distributions, combining
  - a parsimony close to that of fully parametric methods, as represented by a single density, and
  - the flexibility of nonparametric methods, as represented by the kernel method.

Therefore, we propose the use of NM densities for testing symmetry about an unknown value.

The mixture-based test of symmetry: The NM model
- Density of a mixture of $k$ normal components (NM$_k$):
  \[ f(x) = \sum_{j=1}^{k} \pi_j\, \phi(x; \nu_j, \sigma^2) \]
- $\pi_j$ ($j = 1, \ldots, k$) denotes the weight of the $j$-th component.
- $\nu_j = \alpha + \beta\delta_j$ ($j = 1, \ldots, k$) denotes the support points of the mixture.
- $\alpha$ is the centre of symmetry.
- $\beta$ is a scale parameter.
- $\delta_1, \ldots, \delta_k$ is a grid of equispaced points between $-1$ and $1$.
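As a minimal sketch of the NM$_k$ density, the following Python code evaluates $f(x)$ on the equispaced grid. The helper names are hypothetical, and two details are assumptions not stated on the slide: the grid is taken to include the endpoints $-1$ and $1$, and for $k = 1$ the single grid point is placed at 0.

```python
import math


def equispaced_grid(k):
    """k equispaced points on [-1, 1], endpoints included (assumed); [0] if k == 1."""
    if k == 1:
        return [0.0]
    return [-1 + 2 * j / (k - 1) for j in range(k)]


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def nm_density(x, alpha, beta, sigma2, weights):
    """NM_k density with support points nu_j = alpha + beta * delta_j."""
    delta = equispaced_grid(len(weights))
    return sum(p * normal_pdf(x, alpha + beta * d, sigma2)
               for p, d in zip(weights, delta))
```

With weights that are mirror images of each other, for example `[0.2, 0.6, 0.2]`, the density takes the same value at `alpha - t` and `alpha + t` for any `t`; with unequal outer weights it does not.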

The mixture-based test of symmetry: Maximum likelihood estimation
- Log-likelihood of NM$_k$:
  \[ \ell(\theta) = \sum_{i=1}^{n} \log \sum_{j=1}^{k} \pi_j\, \phi(x_i; \nu_j, \sigma^2), \qquad \theta = (\alpha, \beta, \pi_1, \ldots, \pi_k) \]
- $\ell(\theta)$ is maximised through an EM algorithm (Dempster et al., 1977).
- Complete data log-likelihood:
  \[ \ell_c(\theta) = \sum_{i=1}^{n} \sum_{j=1}^{k} z_{ij} \log \phi(x_i; \nu_j, \sigma^2) + \sum_{j=1}^{k} z_{\cdot j} \log \pi_j \]
- $z_{ij}$ is a dummy variable equal to 1 if the $i$-th observation belongs to the $j$-th component and to 0 otherwise.
- $z_{\cdot j} = \sum_i z_{ij}$

The mixture-based test of symmetry: Maximum likelihood estimation (EM algorithm)
- Step E: compute the expected value of $z_{ij}$, $i = 1, \ldots, n$ and $j = 1, \ldots, k$, given the observed data $x = (x_1, \ldots, x_n)$ and the current value of the parameters $\theta$:
  \[ \hat{z}_{ij} = \frac{\phi(x_i; \nu_j, \sigma^2)\, \pi_j}{\sum_h \phi(x_i; \nu_h, \sigma^2)\, \pi_h} \]
- Step M: maximise $\ell_c(\theta)$ with every $z_{ij}$ replaced by $\hat{z}_{ij}$. The solution is:
  \[ \beta = \frac{\sum_i \sum_j \hat{z}_{ij} (x_i - \bar{x})\, \delta_j}{\sum_j \hat{z}_{\cdot j} (\delta_j - \bar{\delta})\, \delta_j}, \qquad \bar{x} = \sum_i x_i / n, \qquad \bar{\delta} = \sum_j \hat{z}_{\cdot j}\, \delta_j / n \]
  \[ \alpha = \bar{x} - \beta \bar{\delta} \]
  \[ \sigma^2 = \sum_i \sum_j \hat{z}_{ij} [x_i - (\alpha + \beta \delta_j)]^2 / n \]
  \[ \hat{\pi}_j = \frac{\hat{z}_{\cdot j}}{n}, \qquad j = 1, \ldots, k \]
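The two steps can be put together as follows. This is a sketch rather than the authors' implementation: the function name `em_nm`, the starting values, the fixed number of iterations, and the choice of a grid that includes the endpoints are all assumptions made here for illustration. The weighted mean of the grid points is divided by $n$ because the posterior counts $\hat{z}_{\cdot j}$ sum to $n$.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_nm(x, k, n_iter=200):
    """EM for NM_k with support points alpha + beta*delta_j and common variance.

    Returns (alpha, beta, sigma2, weights, log-likelihood).
    """
    n = len(x)
    # Equispaced grid on [-1, 1]; a single point at 0 when k == 1 (assumed)
    delta = [0.0] if k == 1 else [-1 + 2 * j / (k - 1) for j in range(k)]
    xbar = sum(x) / n
    # Starting values: an arbitrary choice for this sketch
    alpha, beta = xbar, (max(x) - min(x)) / 2
    sigma2 = sum((xi - xbar) ** 2 for xi in x) / n
    pi = [1.0 / k] * k
    for _ in range(n_iter):
        # E step: posterior probability that x_i belongs to component j
        z = []
        for xi in x:
            num = [p * normal_pdf(xi, alpha + beta * d, sigma2)
                   for p, d in zip(pi, delta)]
            tot = sum(num)
            z.append([v / tot for v in num])
        # M step: closed-form updates
        z_dot = [sum(z[i][j] for i in range(n)) for j in range(k)]
        dbar = sum(zj * d for zj, d in zip(z_dot, delta)) / n
        den = sum(zj * (d - dbar) * d for zj, d in zip(z_dot, delta))
        if den > 0:  # den == 0 when all mass sits on one grid point (e.g. k == 1)
            beta = sum(z[i][j] * (x[i] - xbar) * delta[j]
                       for i in range(n) for j in range(k)) / den
        alpha = xbar - beta * dbar
        sigma2 = sum(z[i][j] * (x[i] - alpha - beta * delta[j]) ** 2
                     for i in range(n) for j in range(k)) / n
        pi = [zj / n for zj in z_dot]
    loglik = sum(math.log(sum(p * normal_pdf(xi, alpha + beta * d, sigma2)
                              for p, d in zip(pi, delta))) for xi in x)
    return alpha, beta, sigma2, pi, loglik
```

For $k = 1$ the updates reduce to the sample mean and the (biased) sample variance, which is a convenient sanity check. In practice one would stop on a convergence criterion and try several starting values, since EM can reach a local maximum.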

The mixture-based test of symmetry: Maximum likelihood estimation (Selection of k)
- A crucial point with NM models is the choice of the number $k$ of mixture components.
- In line with the main literature, we suggest using the AIC and BIC indices.
  - Note that AIC tends to overestimate the true number of components.
- We select $k$ as an odd number.
  - In this way one mixture component, the $[(k + 1)/2]$-th, corresponds to the centre of the distribution, and its mean coincides with the parameter $\alpha$.
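The selection rule can be sketched with the standard definitions $\mathrm{AIC} = -2\ell + 2p$ and $\mathrm{BIC} = -2\ell + p \log n$, where $p$ is the number of free parameters. The function names below are hypothetical, and the parameter count is passed in explicitly because the slides do not state how it is computed for NM$_k$.

```python
import math


def aic(loglik, n_params):
    """Akaike information criterion: -2*loglik + 2*p."""
    return -2 * loglik + 2 * n_params


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: -2*loglik + p*log(n)."""
    return -2 * loglik + n_params * math.log(n_obs)


def select_k(fits, n_obs, criterion="bic"):
    """Pick the candidate k with the smallest criterion value.

    fits maps each candidate (odd) k to a pair
    (maximised log-likelihood, number of free parameters).
    """
    def score(k):
        loglik, p = fits[k]
        return aic(loglik, p) if criterion == "aic" else bic(loglik, p, n_obs)
    return min(fits, key=score)
```

Since $\log n > 2$ as soon as $n \geq 8$, BIC penalises each extra parameter more heavily than AIC at all sample sizes considered here, which is consistent with AIC favouring larger values of $k$.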

The mixture-based test of symmetry: Proposed test of symmetry
- In a symmetric density, components that mirror each other with respect to the centre of symmetry have equal proportions, whereas in a skewed density they are mixed in different proportions.
- Therefore, if the sample comes from a symmetric distribution, the weights of mixture components equidistant from the centre of symmetry are equal; otherwise they differ.
- The hypothesis of symmetry may be formulated as
  \[ H_0: \pi_j = \pi_{k-j+1}, \qquad j = 1, \ldots, [k/2], \]
  where $[z]$ is the largest integer less than or equal to $z$ and $k$ is fixed.
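The slides do not say how the model is fitted under $H_0$. One natural possibility, assumed here, is to keep the same EM algorithm and change only the weight update: maximising $\sum_j z_{\cdot j} \log \pi_j$ subject to $\pi_j = \pi_{k-j+1}$ and $\sum_j \pi_j = 1$ gives $\pi_j = (z_{\cdot j} + z_{\cdot, k-j+1}) / (2n)$. The function names below are hypothetical.

```python
def symmetric_weights(z_dot):
    """Weight update under H0: average each posterior count with its mirror image.

    z_dot[j] is the posterior count of component j; the counts sum to n.
    """
    n = sum(z_dot)
    k = len(z_dot)
    return [(z_dot[j] + z_dot[k - 1 - j]) / (2 * n) for j in range(k)]


def satisfies_h0(weights, tol=1e-12):
    """Check pi_j == pi_{k-j+1} for j = 1, ..., [k/2]."""
    k = len(weights)
    return all(abs(weights[j] - weights[k - 1 - j]) <= tol for j in range(k // 2))
```

For odd $k$ the central component is its own mirror image, so its weight is left unchanged by the averaging.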

The mixture-based test of symmetry: Proposed test of symmetry (continued)
- The NM$_k$ model with constrained $\pi_j$ (i.e., under $H_0$) is nested in the NM$_k$ model with unconstrained $\pi_j$.
- For testing symmetry we may use a likelihood ratio test, based on the deviance
  \[ LR = 2\,[\ell(\hat{\theta}) - \ell(\hat{\theta}_0)] \]
  - $\hat{\theta}$ is the unconstrained maximum likelihood estimator of $\theta$.
  - $\hat{\theta}_0$ is the maximum likelihood estimator under the constraint $H_0$.
- Under $H_0$, $LR$ is asymptotically distributed as a Chi-square with $[k/2]$ degrees of freedom (the number of constrained weights).
- When $k = 1$ the NM model reduces to a single normal distribution and, therefore, the null hypothesis of symmetry is automatically accepted.
- $k$ depends both on the number of groups characterising the population and on the level of skewness: therefore, there is no one-to-one correspondence between the mixture components and the groups.
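Given the two maximised log-likelihoods, the test reduces to a Chi-square tail probability. The sketch below uses only the standard library; `chi2_sf` and `lr_symmetry_test` are hypothetical names, and the tail probability is computed for integer degrees of freedom through the recurrence $Q(a+1, z) = Q(a, z) + z^a e^{-z} / \Gamma(a+1)$ for the regularised upper incomplete gamma function.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability P(X > x) for a Chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2
    if df % 2:                      # odd df: start from df = 1
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:                           # even df: start from df = 2
        a, q = 1.0, math.exp(-z)
    while a < df / 2 - 1e-9:        # climb in steps of 2 degrees of freedom
        q += z ** a * math.exp(-z) / math.gamma(a + 1)
        a += 1
    return q


def lr_symmetry_test(loglik_full, loglik_null, k, level=0.05):
    """Return (LR, p-value, reject H0?) with [k/2] degrees of freedom."""
    df = k // 2
    if df == 0:
        # k = 1: a single normal component, symmetry is never rejected
        return 0.0, 1.0, False
    lr = max(0.0, 2 * (loglik_full - loglik_null))
    p_value = chi2_sf(lr, df)
    return lr, p_value, p_value < level
```

The `max(0.0, ...)` guard only protects against a slightly negative deviance caused by numerical error or by the constrained fit stopping at a better local maximum than the unconstrained one.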

Monte Carlo study
We compare:
- the NM-based test with $k$ selected through AIC
- the NM-based test with $k$ selected through BIC
- the traditional test of Gupta (1967)

Design:
- 1000 samples of a given size $n$ drawn from a given density $f(x)$
- $n = 20, 50, 100$
- $f(x)$: $N(0, 1)$, $t_5$, Laplace (Lap), symmetric NM$_3$, $\chi^2_1$, $\chi^2_5$, $\chi^2_{10}$, standard log-normal (logN)
- nominal level $\alpha = 0.01, 0.05, 0.10$
- all analyses are implemented in R
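The logic of such a study can be illustrated for one cell of the design. The sketch below estimates the empirical level of the moment-based test only (not the mixture-based test) for $N(0, 1)$ samples; it is written in Python with the standard library rather than in R, and the function names, the seed, and the two-sided rejection rule are assumptions of this example.

```python
import math
import random
from statistics import NormalDist


def gupta_s1(x):
    """Standardised third-moment statistic S1 = sqrt(n) * b1 / sigma_hat."""
    n = len(x)
    xbar = sum(x) / n
    d = [xi - xbar for xi in x]
    m2, m3, m4, m6 = (sum(di ** r for di in d) / n for r in (2, 3, 4, 6))
    sigma2_hat = (m6 - 6 * m2 * m4 + 9 * m2 ** 3) / m2 ** 3
    return math.sqrt(n) * (m3 / m2 ** 1.5) / math.sqrt(sigma2_hat)


def empirical_level(sampler, n, level=0.05, n_rep=1000, seed=1):
    """Share of n_rep simulated samples for which |S1| exceeds the critical value."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - level / 2)
    rejections = 0
    for _ in range(n_rep):
        sample = [sampler(rng) for _ in range(n)]
        if abs(gupta_s1(sample)) > crit:
            rejections += 1
    return rejections / n_rep


# One cell of the design: N(0, 1) samples of size 50 at nominal level 0.05
level_normal = empirical_level(lambda rng: rng.gauss(0, 1), n=50)
```

Replacing the sampler with a skewed distribution, for example `lambda rng: rng.lognormvariate(0, 1)`, turns the same function into an estimate of the power.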

Monte Carlo study: Main results
Empirical significance levels for samples from symmetric distributions (nominal level $\alpha = 0.05$)

Test                  n     N(0,1)   t_5      Lap      NM_3
Mixture test (AIC)    20    0.059    0.061    0.069    0.093
                      50    0.069    0.076    0.075    0.079
                      100   0.078    0.083    0.096    0.060
Mixture test (BIC)    20    0.019    0.012    0.030    0.062
                      50    0.010    0.014    0.031    0.058
                      100   0.005    0.027    0.047    0.048
Gupta’s test          20    0.038    0.030    0.044    0.037
                      50    0.038    0.029    0.035    0.045
                      100   0.043    0.032    0.037    0.045

- The mixture-based test performs very similarly to Gupta’s test when the number $k$ of components is selected by means of BIC.
- When AIC is used for model selection, the empirical level is consistently higher than the nominal one (the type-I error is committed too often).
