A New Statistical Test for Analyzing Skew Normal Data

Hassan Elsalloukh, Ph.D.
Associate Professor of Statistics
Department of Mathematics and Statistics
University of Arkansas at Little Rock
COMPSTAT2010, August 24, 2010, Paris, France

Overview
  Motivation
  Azzalini’s Class of Skew Distributions
  A New Density Function within Azzalini’s Class of Skew Distributions
  A Score Test for Detecting Non-Normality within the New Density Function
  Applications
    Volcanoes Height Example
    Rainfall Example
  Summary

Motivation

The celebrated Gaussian distribution was known at least a century before Gauss (1809) popularized it. It is the most well-known and widely used probability density function and has the form

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\theta)^2}{2\sigma^2}\right), \quad -\infty < x < \infty.$$

It became more important because of the central limit effect discovered by De Moivre (1733).

This distribution appears in probability processes and in the theories and methods of univariate and multivariate, parametric and non-parametric, frequentist and Bayesian statistics. Yet there have always been doubts, reservations, and criticisms about the unqualified use of normality. This is reflected in the quote by Geary (1947): “Normality is a myth; there never was, and never will be, a normal distribution.”

The normal distribution is symmetric and therefore not practical for modeling skewed data. During the last decade, there has been growing interest in the construction of flexible parametric classes of asymmetric distributions. Various practical applications require models for data exhibiting a unimodal but skewed distribution.

Skewed and kurtotic distributions are useful for data modeling, including environmental and financial data that often do not follow the normal law. One can introduce skewness into a symmetric distribution in many ways. One generalization of the normal distribution was proposed by O’Hagan and Leonard (1976); this generalization was used for Bayesian analysis of normal means.

It was also investigated in detail by Azzalini (1985, 1986), who defined a skew-normal distribution as

$$f(x) = 2\,\phi(x)\,\Phi(\lambda x).$$

Runnenburg (1978) devised a different way of introducing skewness into a symmetric distribution, by splicing two half-normal distributions with different scale parameters. Mudholkar and Hutson (2000) found that this idea could be re-expressed in terms of an explicit skewness parameter ε.

Mudholkar and Hutson (2000) called their probability density function the Epsilon-Skew-Normal (ESN) family:

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \begin{cases} \exp\left(-\dfrac{(x-\theta)^2}{2\sigma^2(1-\varepsilon)^2}\right), & \text{for } x \ge \theta, \\[2ex] \exp\left(-\dfrac{(x-\theta)^2}{2\sigma^2(1+\varepsilon)^2}\right), & \text{for } x < \theta, \end{cases}$$

where the parameters are −∞ < θ < ∞, σ > 0, and −1 < ε < 1.

This density resembles the members of the normal family in many ways and includes the normal family when ε = 0. Note that the limiting cases of this density as ε goes to +1 or −1 are the well-known half-normal distributions. This family is convenient for Bayesian analysis of normal means.
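The properties of the ESN density can be checked numerically with a small Python sketch. The names `esn_pdf` and `midpoint_integral` are illustrative assumptions, not part of the presentation; the formula is the piecewise ESN density with location θ, scale σ, and skewness ε.

```python
import math

def esn_pdf(x, theta=0.0, sigma=1.0, eps=0.0):
    """Epsilon-skew-normal density (hypothetical helper name)."""
    if sigma <= 0 or not -1.0 < eps < 1.0:
        raise ValueError("need sigma > 0 and -1 < eps < 1")
    # Right piece (x >= theta) uses scale sigma*(1 - eps); left piece uses sigma*(1 + eps).
    scale = sigma * (1.0 - eps) if x >= theta else sigma * (1.0 + eps)
    z = (x - theta) / scale
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi * sigma * sigma)

def midpoint_integral(f, a, b, n=20000):
    """Plain midpoint rule, enough for a sanity check."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

# Total probability and the mass to the left of theta for eps = 0.4.
total = midpoint_integral(lambda x: esn_pdf(x, eps=0.4), -15.0, 15.0)
left_mass = midpoint_integral(lambda x: esn_pdf(x, eps=0.4), -15.0, 0.0)
print(total, left_mass)
```

Under these assumptions the total mass should be close to 1, and integrating the left piece by hand gives a mass of (1 + ε)/2 below θ, which is how ε shifts probability from one side to the other. With `eps=0.0` the function is the ordinary normal density.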

In this research, Azzalini’s skew-normal distribution is modified, leading to a new class of asymmetric distributions. A new score test is derived for detecting non-normality within this new class. The new score test is then applied to an example of a real data set to detect non-normality. Maximum likelihood estimators are used to fit the data with a skew distribution, and the results are compared with studies in which researchers used the normal distribution.

Azzalini’s Class of Skew Distributions

Azzalini introduced the skew-normal class of distributions as a family able to reflect varying degrees of skewness. A skew-normal random variable Z with skewness parameter λ has the density function

$$\phi(z; \lambda) = 2\,\phi(z)\,\Phi(\lambda z), \quad -\infty < z < \infty;$$

that is, Z is SN(λ) with −∞ < Z < ∞, where φ and Φ are the standard normal density and distribution functions, respectively.
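As a rough illustration of the SN(λ) density, the following Python sketch evaluates 2φ(z)Φ(λz) using only the standard library. The function names are assumptions made for this example. The mean formula used as a check, √(2/π)·λ/√(1+λ²), is a standard result for the skew-normal and is not stated in the slides.

```python
import math

def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z):
    # Phi expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def skew_normal_pdf(z, lam):
    """Azzalini's SN(lambda) density: 2 * phi(z) * Phi(lambda * z)."""
    return 2.0 * std_normal_pdf(z) * std_normal_cdf(lam * z)

def midpoint_integral(f, a, b, n=20000):
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

lam = 3.0
total = midpoint_integral(lambda z: skew_normal_pdf(z, lam), -12.0, 12.0)
mean = midpoint_integral(lambda z: z * skew_normal_pdf(z, lam), -12.0, 12.0)
expected_mean = math.sqrt(2.0 / math.pi) * lam / math.sqrt(1.0 + lam * lam)
print(total, mean, expected_mean)
```

Setting `lam = 0.0` makes Φ(λz) equal to 1/2, so the density collapses to the standard normal; positive λ moves mass to the right, negative λ to the left.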

One limitation of the SN(λ) family is that the parameter λ can produce only tails thinner than those of the normal distribution. However, we are often interested in analyzing data from heavy-tailed distributions. Azzalini suggested a class of densities that includes the normal family and allows thick tails, that is,

$$g(y; \omega) = C_\omega \exp\left(-\frac{|y|^{\omega}}{\omega}\right), \quad -\infty < y < \infty,$$

where ω is a positive tail weight parameter and

$$C_\omega = \left[\,2\,\omega^{1/\omega - 1}\,\Gamma(1/\omega)\right]^{-1}.$$

The density g(y; 2) is the N(0,1) density and g(y; 1) is the Laplace density. As ω → ∞, g(y; ω) converges to the uniform density on (−1, 1). Azzalini introduces skewness in g(y; ω) in the form

$$2\,G(\lambda y)\,g(y; \omega), \quad \text{where } \psi = \omega/2.$$

The choice of G is the distribution function of a signed power of U of the form $|U|^{1/\psi}\,\operatorname{sgn}(U)$, where U ~ N(0,1). Therefore, the density that was considered is

$$h(y) = 2\,C_{2\psi} \exp\left(-\frac{|y|^{2\psi}}{2\psi}\right) \Phi\left(\lambda\,\operatorname{sgn}(y)\,\frac{|y|^{\psi}}{\sqrt{\psi}}\right).$$
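The special cases of the symmetric base density g(y; ω) can be verified with a few lines of Python. This is only a sketch: the names `c_omega` and `g` are chosen for illustration, and the normalizing constant is taken to be C_ω = [2 ω^(1/ω − 1) Γ(1/ω)]⁻¹, which is what makes exp(−|y|^ω/ω) integrate to one.

```python
import math

def c_omega(omega):
    """Normalizing constant of g(y; omega) for a positive tail weight omega."""
    return 1.0 / (2.0 * omega ** (1.0 / omega - 1.0) * math.gamma(1.0 / omega))

def g(y, omega):
    """Symmetric density C_omega * exp(-|y|**omega / omega)."""
    return c_omega(omega) * math.exp(-abs(y) ** omega / omega)

# omega = 2 should match the N(0,1) density, omega = 1 the Laplace density.
normal_at_1 = math.exp(-0.5) / math.sqrt(2.0 * math.pi)
laplace_at_1 = 0.5 * math.exp(-1.0)
print(g(1.0, 2.0), normal_at_1)
print(g(1.0, 1.0), laplace_at_1)

# For a large omega the density is nearly flat inside (-1, 1) and negligible outside.
print(g(0.5, 50.0), g(1.5, 50.0))
```

The last line shows the drift toward the uniform density on (−1, 1): the value inside the interval moves toward 1/2 rather slowly as ω grows, while the value outside drops to essentially zero.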

Many choices of G and g(y; ω) are possible. The choices that are considered in this paper are modified to produce a new density function of the form

$$h(y_i) = 2\,G(\lambda u_i \mid \alpha)\,g(u_i \mid \alpha), \qquad u_i = \frac{y_i - \mu}{\sigma},$$

where

$$g(u_i \mid \alpha) = w(\alpha)\,\sigma^{-1} \exp\left(-c(\alpha)\,|u_i|^{2/(1+\alpha)}\right),$$

$$c(\alpha) = \left[\Gamma\left(\tfrac{3(1+\alpha)}{2}\right) \Gamma\left(\tfrac{1+\alpha}{2}\right)^{-1}\right]^{1/(1+\alpha)}, \qquad w(\alpha) = (1+\alpha)^{-1}\,\Gamma\left(\tfrac{3(1+\alpha)}{2}\right)^{1/2} \Gamma\left(\tfrac{1+\alpha}{2}\right)^{-3/2},$$

and, for λ ≥ 0,

$$G(\lambda u_i \mid \alpha) = \Phi\left(\lambda\,\sqrt{1+\alpha}\;\operatorname{sgn}(u_i)\,|u_i|^{1/(1+\alpha)}\right).$$

Note that when λ = 0 and α = 0, h(y) reduces to a standard normal.
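The symmetric factor g(u | α) of the new density can be sketched in Python to confirm the normal special case. The code below covers only g, c(α) and w(α) and leaves out the skewing factor G. The function names are illustrative, and the range −1 < α ≤ 1 is an assumption borrowed from the usual exponential power parameterization, since the slides do not state the admissible range.

```python
import math

def c(alpha):
    a = 1.0 + alpha
    return (math.gamma(1.5 * a) / math.gamma(0.5 * a)) ** (1.0 / a)

def w(alpha):
    a = 1.0 + alpha
    return math.sqrt(math.gamma(1.5 * a)) / (a * math.gamma(0.5 * a) ** 1.5)

def g(u, alpha, sigma=1.0):
    """Symmetric factor of the new density; u is the standardized value (y - mu) / sigma."""
    return w(alpha) / sigma * math.exp(-c(alpha) * abs(u) ** (2.0 / (1.0 + alpha)))

def midpoint_integral(f, a, b, n=60000):
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

# alpha = 0 recovers the standard normal: c(0) = 1/2 and w(0) = 1/sqrt(2*pi).
print(c(0.0), w(0.0), 1.0 / math.sqrt(2.0 * math.pi))

# For another alpha, g should still integrate to 1 and have unit second moment.
alpha = 0.5
total = midpoint_integral(lambda u: g(u, alpha), -30.0, 30.0)
second_moment = midpoint_integral(lambda u: u * u * g(u, alpha), -30.0, 30.0)
print(total, second_moment)
```

The unit second moment reflects the role of c(α): it rescales the exponent so that σ remains the standard deviation for every α, leaving α to control only the tail weight.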

A Score Test for Detecting Non-Normality within the New Density Function

The problem of testing hypotheses of univariate normality of a set of observations has been of interest to experimenters for many years. As a result, many test statistics have been suggested as possible solutions to the testing-normality problem. One such statistic is the score test, or Lagrange multiplier test. A score test of normality within the family of new skew distributions is now developed.

Since the score testing procedure requires estimation only under the null hypothesis, an asymptotically unbiased test of the normality assumption H_0: λ = 0 and α = 0 vs. H_A: λ ≠ 0 and α ≠ 0 can be easily constructed. Let y_1, …, y_n be random variables from a new skew distribution; then the test statistic is

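$$\Lambda = \begin{pmatrix} \dfrac{\partial L(\hat\varphi)}{\partial\alpha} & \dfrac{\partial L(\hat\varphi)}{\partial\lambda} \end{pmatrix} \begin{pmatrix} E\left[\left(\dfrac{\partial L(\hat\varphi)}{\partial\alpha}\right)^{2}\right] & E\left[\dfrac{\partial L(\hat\varphi)}{\partial\alpha}\,\dfrac{\partial L(\hat\varphi)}{\partial\lambda}\right] \\[3ex] E\left[\dfrac{\partial L(\hat\varphi)}{\partial\alpha}\,\dfrac{\partial L(\hat\varphi)}{\partial\lambda}\right] & E\left[\left(\dfrac{\partial L(\hat\varphi)}{\partial\lambda}\right)^{2}\right] \end{pmatrix}^{-1} \begin{pmatrix} \dfrac{\partial L(\hat\varphi)}{\partial\alpha} \\[3ex] \dfrac{\partial L(\hat\varphi)}{\partial\lambda} \end{pmatrix} = \hat\xi_1 + \hat\xi_2,$$

where L is the log-likelihood, $\hat\varphi$ is the parameter estimate under the null hypothesis, and $u_i \sim N(\mu, \sigma)$. The components $\hat\xi_1$ and $\hat\xi_2$ are closed-form functions of sums over i = 1, …, n of terms in $\hat u_i$, $\hat u_i^2$ and their logarithms, with the numerical constants 0.8648186 and 0.2011014 n. Note that as n → ∞, the asymptotic distribution of Λ is chi-square with two degrees of freedom.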
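The two numerical constants in the statistic, 0.8648186 and 0.2011014, can be reproduced from the definition of g(u | α). The Python sketch below rests on an interpretation that the slides do not spell out: that 0.8648186 is the derivative c′(0), and that 0.2011014 is the per-observation variance of the α-score ∂ log g/∂α at α = 0 when u is standard normal. Derivatives are taken by central finite differences and the expectation by a midpoint rule; all names are hypothetical.

```python
import math

def c(alpha):
    a = 1.0 + alpha
    return math.exp((math.lgamma(1.5 * a) - math.lgamma(0.5 * a)) / a)

def log_g(u, alpha):
    """log g(u | alpha) with sigma = 1, using the c(alpha) and w(alpha) of the new density."""
    a = 1.0 + alpha
    log_w = 0.5 * math.lgamma(1.5 * a) - math.log(a) - 1.5 * math.lgamma(0.5 * a)
    return log_w - c(alpha) * abs(u) ** (2.0 / a)

H = 1e-5  # finite-difference step in alpha

# Derivative of c(alpha) at alpha = 0.
c_prime_0 = (c(H) - c(-H)) / (2.0 * H)

def alpha_score(u):
    """Numerical d/d(alpha) of log g(u | alpha) at alpha = 0."""
    return (log_g(u, H) - log_g(u, -H)) / (2.0 * H)

def std_normal_pdf(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

# E[score^2] under N(0,1) by the midpoint rule on [-10, 10].
n, lo, hi = 40000, -10.0, 10.0
step = (hi - lo) / n
score_variance = 0.0
for k in range(n):
    u = lo + (k + 0.5) * step
    score_variance += alpha_score(u) ** 2 * std_normal_pdf(u) * step

print(c_prime_0, score_variance)
```

Both printed values should land close to the constants quoted in the statistic, which is consistent with reading the first component as the squared α-score divided by n times its variance. This remains a plausibility check under the stated assumptions rather than a reconstruction of the full closed form.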

Thus, the null hypothesis is rejected if

$$\Lambda < \chi^2_{(2,\,1-\alpha/2)} \quad \text{or} \quad \Lambda > \chi^2_{(2,\,\alpha/2)}.$$

The first part of the test statistic, ξ_1, measures kurtosis, and the second part, ξ_2, measures the skewness of the distribution of interest. We now present two examples.
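The rejection rule can be sketched in Python without any statistics library, because a chi-square variable with two degrees of freedom has survival function exp(−x/2). The code assumes that the second subscript of χ² denotes an upper-tail probability, and it calls the significance level `level` to keep it apart from the shape parameter α. Both the notation reading and the function names are assumptions.

```python
import math

def chi2_df2_upper_point(tail_prob):
    """Value x with P(X > x) = tail_prob for X ~ chi-square with 2 degrees of freedom."""
    # For 2 degrees of freedom, P(X > x) = exp(-x / 2), so x = -2 * ln(tail_prob).
    return -2.0 * math.log(tail_prob)

def reject_normality(stat, level=0.05):
    """Two-sided rule as stated: reject if stat is below the lower or above the upper point."""
    lower = chi2_df2_upper_point(1.0 - level / 2.0)
    upper = chi2_df2_upper_point(level / 2.0)
    return stat < lower or stat > upper

lower_point = chi2_df2_upper_point(0.975)
upper_point = chi2_df2_upper_point(0.025)
print(lower_point, upper_point)
print(reject_normality(10.0), reject_normality(3.0))
```

At the 5% level the two cut-offs are roughly 0.0506 and 7.378, so an observed statistic of 10 would lead to rejection while a value of 3 would not.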

Application Example 1

The score test computations are applied to the heights of 219 of the world’s volcanoes (Source: National Geographic Society and the World Almanac 1966, pp. 282-283). Figure 1 shows an exploratory data analysis in the form of a stem-and-leaf plot. The basic descriptive statistics for the volcano heights Y are: the sample mean Ȳ = 70.246, the standard deviation S = 43.018, the median = 65.000, and the coefficient of skewness b_1 = 0.840. This coefficient indicates that Y is asymmetric.
