SLIDE 1

Goodness-of-fit tests based on φ-entropy differences

J-F Bercher 1, V. Girardin 2, J. Lequesne 2, P. Regnault 3

1 Laboratoire d'informatique Gaspard-Monge, ESIEE, Marne-la-Vallée, FRANCE

2 Laboratoire de Mathématiques N. Oresme, Université de Caen – Basse Normandie, FRANCE

3 Laboratoire de Mathématiques de Reims, Université de Reims Champagne-Ardenne, FRANCE

MaxEnt 2014 - 25/09/14

Presented by Ph. Regnault (LMR-URCA) · GOF tests via φ-entropy differences · MaxEnt 2014 - 25/09/14

SLIDE 2

Introduction
  ◮ Vasicek entropy-based normality test
  ◮ Paradigm of GOF test via entropy differences
Maximizing (h, φ)-entropies under moment constraints
  ◮ Maximum φ-entropy distributions
  ◮ Scale-invariant entropies
  ◮ A Pythagorean equality for Bregman divergence
Parametric models of MaxEnt distributions...
  ◮ ... for Shannon entropy
  ◮ ... for Tsallis entropy
  ◮ ... for Burg entropy
Goodness-of-fit tests for...
  ◮ ... an exponential family
  ◮ ... a q-exponential family

SLIDE 3

Vasicek entropy-based normality test

Vasicek (1976) introduced a goodness-of-fit procedure for testing the normality of non-categorical data, based on the maximum entropy property

N(m, σ) = Argmax_{p ∈ P_{m,σ}} S(p),

with

◮ N(m, σ), the Gaussian distribution with mean m and variance σ² ;
◮ S(p) = −∫ p(x) log p(x) dx, the Shannon entropy of p ;
◮ P_{m,σ}, the set of p.d.f.s with mean m and variance σ².

SLIDE 4

Vasicek entropy-based normality test

Precisely, from an n-sample (X1, . . . , Xn) drawn according to a p.d.f. p with finite variance, for testing

H0 : p ∈ M = {N(m, σ), m ∈ R, σ > 0} against H1 : p ∉ M,

the Vasicek test statistic is

Tn = exp( Ŝ(p)n − S(N(m̂n, σ̂n)) ),

where

Ŝ(p)n = (1/n) Σ_{i=1}^n log( (n/(2m)) (X(i+m) − X(i−m)) )

is a consistent estimator of S(p), based on the order statistics X(1) ≤ · · · ≤ X(n).
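To make the statistic concrete, here is a minimal Python sketch (not part of the slides; the function names, the boundary clamping and the window choice m ≈ √n are illustrative assumptions): under H0 the statistic should be close to 1, and markedly smaller under departures from normality.

```python
import numpy as np

def vasicek_entropy(sample, m):
    """Vasicek spacing estimator of Shannon entropy, window size m."""
    x = np.sort(sample)
    n = len(x)
    i = np.arange(n)
    upper = x[np.minimum(i + m, n - 1)]   # X_(i+m), clamped at the largest order statistic
    lower = x[np.maximum(i - m, 0)]       # X_(i-m), clamped at the smallest
    return np.mean(np.log(n / (2 * m) * (upper - lower)))

def vasicek_statistic(sample, m=None):
    """T_n = exp( S_hat - S(N(m_hat, sigma_hat)) ); close to 1 under normality."""
    n = len(sample)
    if m is None:
        m = max(1, int(round(np.sqrt(n))))
    sigma2 = np.var(sample)
    s_gauss = 0.5 * np.log(2 * np.pi * np.e * sigma2)  # Shannon entropy of the fitted Gaussian
    return np.exp(vasicek_entropy(sample, m) - s_gauss)

rng = np.random.default_rng(0)
t_normal = vasicek_statistic(rng.normal(size=2000))      # close to 1
t_expo = vasicek_statistic(rng.exponential(size=2000))   # markedly below 1
```

The spacing estimator slightly underestimates S(p), so in practice the test rejects normality for small values of Tn against tabulated critical values.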

SLIDE 5

Paradigm of GOF test via entropy differences

The main theoretical ingredients of the Vasicek GOF test are :

◮ Maximum entropy property : the p.d.f. in the null-hypothesis model M maximizes Shannon entropy under moment constraints ;
◮ Pythagorean equality : for any p ∈ P_{m,σ}, we have K(p|N_{m,σ}) = S(N_{m,σ}) − S(p) ;
◮ Estimation of Shannon entropy.


Numerous authors have adapted Vasicek's procedure

◮ to various parametric models of maximum entropy distributions, where the entropy is Shannon's (in the overwhelming majority of cases) or Rényi's ;
◮ by introducing other estimators of the entropy (of both the null-hypothesis distribution and the actual distribution).

SLIDE 7

Extending the theoretical background

The parametric models for which a Vasicek-type procedure can be developed by means of Shannon entropy maximization are well identified : exponential families ; see Lequesne's PhD thesis. We investigate here the (information-geometric) shape and properties of parametric models for which entropy-based GOF tests may be developed, through the generalization of

◮ the maximum entropy property to φ-entropy functionals ;
◮ the Pythagorean property to the Bregman divergence associated with φ-entropy functionals ;
◮ the φ-entropy estimation procedure, adapted to the involved parametric models.

SLIDE 8

(h, φ)-entropies

The (h, φ)-entropy of a p.d.f. p with support S is S_{h,φ}(p) := h(S_φ(p)), where

S_φ(p) = −∫_S φ(p(x)) dx,

with

◮ φ : R+ → R a twice continuously differentiable convex function ;
◮ h : R → R a real function.

| (h, φ)-entropy | h(y) | φ(x) |
| --- | --- | --- |
| Shannon | y | x log x |
| Ferreri | y | (1 + rx) log(1 + rx)/r |
| Burg | y | −log x |
| Itakura-Saito | y | x − log x + 1 |
| Tsallis | ±(q − 1)^{−1}(y − 1) | ∓x^q, q > 0, q ≠ 1 |
| Rényi | ±(1 − q)^{−1} log y | ∓x^q, q > 0, q ≠ 1 |
| L2-norm | y | x² |
| Havrda and Charvát | y | (1 − 2^{1−r})^{−1}(x^r − x) |
| Basu-Harris-Hjort-Jones | y | 1 − (1 + 1/r)x + x^{1+r}/r |
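As a concrete reading of the definition (a numerical sketch, not from the slides; the function names are illustrative), S_{h,φ} can be approximated by quadrature; the Shannon row (h(y) = y, φ(x) = x log x) gives entropy 1 for Exp(1) and 0 for Uniform(0, 1).

```python
import numpy as np

def h_phi_entropy(pdf, a, b, phi, h=lambda y: y, n=200001):
    """S_{h,phi}(p) = h( -int_S phi(p(x)) dx ), by trapezoidal quadrature on [a, b]."""
    x = np.linspace(a, b, n)
    f = phi(pdf(x))
    integral = 0.5 * np.sum((f[1:] + f[:-1]) * np.diff(x))
    return h(-integral)

def phi_shannon(t):
    """phi(x) = x log x, with the convention 0 log 0 = 0."""
    t = np.asarray(t, dtype=float)
    return np.where(t > 0, t * np.log(np.where(t > 0, t, 1.0)), 0.0)

# Shannon entropy of Exp(1) is 1, of Uniform(0, 1) is 0.
S_exp = h_phi_entropy(lambda x: np.exp(-x), 0.0, 40.0, phi_shannon)
S_uni = h_phi_entropy(lambda x: np.ones_like(x), 0.0, 1.0, phi_shannon)
```

Other rows of the table are obtained by swapping in their h and φ; only the φ inside the integral and the outer transform h change.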

SLIDE 9

Maximum φ-entropy distributions

For increasing functions h, maximizing S_{h,φ}(p) amounts to maximizing S_φ(p), which is solved by :

Theorem – Girardin (1997)

Let M0 = 1, M1, . . . , MJ be linearly independent measurable functions defined on an interval S. Let m = (1, m1, . . . , mJ) ∈ R^{J+1} and p0 ∈ P(m, M), where

P(m, M) = {p : E_p(Mj) = mj, j ∈ {0, . . . , J}}.

If there exists a (unique) λ = (λ0, . . . , λJ) ∈ R^{J+1} such that φ′(p0) = Σ_{j=0}^J λj Mj, then

S_φ(p0) ≥ S_φ(p), for all p ∈ P(m, M).

The converse holds if S is compact.
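The theorem can be illustrated numerically (a sketch, not from the slides): for the Gaussian, φ′(p0) = log p0 + 1 is a degree-2 polynomial in x, i.e. a combination of M0 = 1, M1(x) = x, M2(x) = x²; accordingly N(0, 1) should beat any other mean-0, variance-1 density, e.g. the Laplace one, in Shannon entropy.

```python
import numpy as np

def shannon_entropy(pdf, a, b, n=400001):
    """Shannon entropy -int p log p, by trapezoidal quadrature on [a, b]."""
    x = np.linspace(a, b, n)
    p = pdf(x)
    f = np.where(p > 0, p * np.log(np.where(p > 0, p, 1.0)), 0.0)
    return -0.5 * np.sum((f[1:] + f[:-1]) * np.diff(x))

gauss = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # N(0, 1): log p is quadratic in x
b_lap = 1 / np.sqrt(2)                                     # Laplace scale giving variance 1
laplace = lambda x: np.exp(-np.abs(x) / b_lap) / (2 * b_lap)

S_gauss = shannon_entropy(gauss, -20, 20)    # closed form: 0.5 * log(2*pi*e) ~ 1.4189
S_lap = shannon_entropy(laplace, -20, 20)    # closed form: 1 + log(2*b_lap) ~ 1.3466
```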

SLIDE 10

Parametric models as families of MaxEnt distributions

Given a parametric family M = {p(.; θ), θ ∈ Θ ⊆ R^d} of distributions supported by S, we look for

◮ φ, a convex function from R+ to R,
◮ M1, . . . , MJ, measurable functions from S to R with M0 = 1, M1, . . . , MJ linearly independent,

such that for any θ, a unique λ ∈ R^{J+1} exists satisfying

p(.; θ) = φ′^{−1}( Σ_{j=0}^J λj Mj ).

Fortunately, requiring the entropy functionals to satisfy some natural properties allows the search to be drastically restricted.

SLIDE 11

Scale-invariant entropies

Definitions

◮ An entropy functional S is said to be scale-invariant if there exist functions a and b, with a > 0 non-increasing, such that S(µ p_µ) = a(µ) S(p) + b(µ) for all µ ∈ R, where p_µ(x) = p(µx).
◮ Two entropy functionals S and S̃ are said to be equivalent for maximization if S(p) > S(q) iff S̃(p) > S̃(q).


Theorem – Kosheleva (1998)

If an entropy functional is scale-invariant, then it is equivalent for maximization to one of the functionals

−∫_S p(x) log p(x) dx,    (1/(1 − q)) ∫_S p(x)^q dx,    ∫_S log p(x) dx.

SLIDE 13

A Pythagorean equality for Bregman divergence

The Bregman divergence (or distance) Dφ(p|q), associated to the φ-entropy, of a distribution p with respect to another q with the same support S, is

Dφ(p|q) = Sφ(q) − Sφ(p) − ∫_S φ′(q(x)) [p(x) − q(x)] dx.

◮ Shannon : K(p|q) = ∫_S p(x) log( p(x)/q(x) ) dx ;
◮ Tsallis : Tq(p|q) = (1 − q) ∫_S q(x)^q dx + ∫_S [ q q(x)^{q−1} − p(x)^{q−1} ] p(x) dx ;
◮ Burg : B(p|q) = ∫_S [ p(x)/q(x) − log( p(x)/q(x) ) − 1 ] dx.

Proposition

Let p0 ∈ P(m, M) satisfy φ′(p0) = Σ_{j=0}^J λj Mj for some λ ∈ R^{J+1}. Then

Dφ(p|p0) = Sφ(p0) − Sφ(p), for all p ∈ P(m, M).
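A numerical sanity check of the Proposition in the Shannon case (a sketch, not from the slides): with p0 = N(0, 1), the maximizer in P_{0,1}, and p the variance-1 Laplace density, K(p|p0) should equal S(p0) − S(p).

```python
import numpy as np

x = np.linspace(-20, 20, 400001)
dx = x[1] - x[0]

p0 = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # N(0, 1), MaxEnt density in P_{0,1}
b = 1 / np.sqrt(2)                             # Laplace scale with variance 1
p = np.exp(-np.abs(x) / b) / (2 * b)           # another density in P_{0,1}

def trapz(f):
    """Trapezoidal rule on the uniform grid x."""
    return 0.5 * np.sum(f[1:] + f[:-1]) * dx

def S(q):
    """Shannon entropy -int q log q, with 0 log 0 = 0."""
    return -trapz(np.where(q > 0, q * np.log(np.where(q > 0, q, 1.0)), 0.0))

kl = trapz(p * np.log(p / p0))    # Bregman divergence K(p|p0)
gap = S(p0) - S(p)                # entropy difference, ~ 0.0724 here
```

The equality holds because ∫ φ′(p0)(p − p0) vanishes: φ′(p0) is a combination of 1, x, x², whose moments agree for p and p0.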

SLIDE 14

Exponential families maximize Shannon entropy

Given an exponential family M = {p(.; λ), λ ∈ Λ ⊆ R^{J+1}}, the p.d.f.

p(x; λ) = exp( Σ_{j=1}^J λj Mj(x) + λ0 ), x ∈ S,

maximizes Shannon entropy for the moment constraints M1, . . . , MJ. Note that λ0 = −ψ(λ1, . . . , λJ), with ψ(λ1, . . . , λJ) = log ∫_S exp( Σ_{j=1}^J λj Mj(x) ) dx.

| Support | Parametric model | Density ∝ | Moment function(s) |
| --- | --- | --- | --- |
| [0, 1] | Beta B(a, b), a, b > 0 | x^a (1 − x)^b | M1(x) = log x, M2(x) = log(1 − x) |
| [0, 1] | Alpha A(a, b, c), a, b, c > 0 | x^a (1 − x)^b e^{−cx} | M1(x) = log x, M2(x) = log(1 − x), M3(x) = x |
| [0, ∞[ | Exponential E(λ), λ > 0 | e^{−λx} | M1(x) = x |
| [0, ∞[ | Gamma G(λ, N), λ, N > 0 | x^{N−1} e^{−λx} | M1(x) = x, M2(x) = log x |
| [0, ∞[ | Beta prime B′(a, b), a, b > 0 | x^{a−1} (1 + x)^{−a−b} | M1(x) = log x, M2(x) = log(1 + x) |
| [0, ∞[ | Pareto type I P_I(c), c > 0 | x^{−c−1} | M1(x) = log x |
| [0, ∞[ | Planck PL(a, b), a > 0, b > 1 | x^{−b} e^{−b/x} | M1(x) = log x, M2(x) = 1/x |
| R | Normal N(m, σ) | e^{−(x−m)²/2σ²} | M1(x) = x, M2(x) = x² |

SLIDE 15

Dual coordinate systems of exponential families

Any distribution p ∈ M, where M is an exponential family, can be indexed either by

◮ its moment (or expectation) parameters (m1, . . . , mJ), or
◮ its exponential (or Maximum Entropy – ME) parameters (λ1, . . . , λJ).

Both coordinate systems are linked through the relation

S(p) = −Σ_{j=1}^J λj mj + ψ(λ), p ∈ M.

Precisely, for j ∈ {1, . . . , J},

λj = ∂/∂mj ( −S(p(.; m)) ),    mj = ∂/∂λj ψ(λ).
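For the exponential model E(λ) of the earlier table (M1(x) = x, m1 = 1/λ, S = 1 − log λ, hence S(m1) = 1 + log m1 and ME parameter λ1 = −λ in p(x) = exp(λ1 x + λ0)), the duality λ1 = ∂(−S)/∂m1 can be checked by a finite difference (a sketch, not from the slides).

```python
import math

lam = 2.5            # rate of the exponential distribution E(lambda)
lam1 = -lam          # ME (exponential) parameter: p(x) = exp(lam1 * x + lam0)
m1 = 1.0 / lam       # moment parameter: E[X] = 1/lambda

def neg_S(m):
    """-S as a function of the moment parameter: S(m1) = 1 + log(m1)."""
    return -(1.0 + math.log(m))

eps = 1e-6
fd = (neg_S(m1 + eps) - neg_S(m1 - eps)) / (2 * eps)   # central finite difference
# fd approximates d(-S)/dm1 = -1/m1 = lam1
```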

SLIDE 16

q-exponential families maximize Tsallis entropy

Given the q̃-exponential family M = {p(.; λ), λ ∈ Λ}, with q̃ = 2 − q, the p.d.f. given by

p(x; λ) = ( Σ_{j=1}^J λj Mj(x) + λ0 )^{1/(q−1)} ∝ exp_q̃( Σ_{j=1}^J λj Mj(x) ), x ∈ S,   (1)

maximizes the q-Tsallis entropy for the moment constraints M1, . . . , MJ, where the q̃-exponential function exp_q̃ is given by

exp_q̃(x) = ( 1 + (1 − q̃)x )_+^{1/(1−q̃)}.

Proposition

In (1), the parameter λ0 can be expressed (locally) as a differentiable function ψ of (λ1, . . . , λJ) such that

∂/∂λj ψ(λ1, . . . , λJ) = ∫_S Mj(x) E_q(p)(x) dx, j ∈ {1, . . . , J},

where E_q(p) is the q-escort distribution associated to p, given by

E_q(p)(x) = p(x)^q / ∫_S p(x)^q dx, x ∈ S.
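A small sketch of the two objects just introduced (not from the slides; names are illustrative): the q̃-exponential function, which recovers the ordinary exponential as q̃ → 1, and the q-escort of a density, which is again a density.

```python
import numpy as np

def exp_qt(x, qt):
    """q-exponential: (1 + (1 - qt) x)_+^(1 / (1 - qt)); ordinary exp in the limit qt -> 1."""
    base = np.maximum(1.0 + (1.0 - qt) * np.asarray(x, dtype=float), 0.0)
    return base ** (1.0 / (1.0 - qt))

def escort(p_vals, x, q):
    """q-escort E_q(p) = p^q / int p^q, normalized by the trapezoidal rule on the grid x."""
    pq = p_vals ** q
    norm = 0.5 * np.sum((pq[1:] + pq[:-1]) * np.diff(x))
    return pq / norm

# exp_qt is close to exp for qt near 1
val = exp_qt(1.0, 0.999)

# the escort of Exp(1) with q = 0.5 integrates to 1
x = np.linspace(0.0, 50.0, 200001)
E = escort(np.exp(-x), x, 0.5)
total = 0.5 * np.sum((E[1:] + E[:-1]) * np.diff(x))
```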

SLIDE 17

Some examples

| Support | Parametric model | Density ∝ | q | Moment function(s) |
| --- | --- | --- | --- | --- |
| [0, 1] | Sub-Beta SB(u), u ∈ [0, 1] | (1 − ux)^{−2} | 1/2 | M1(x) = x |
| [0, ∞[ | Pareto type II P_II(a), a > 0 | (1 + ax)^{−c−1} | c/(c+1) | M1(x) = x |
| [0, ∞[ | Pareto type IV P_IV(a), a > 0 | (1 + ax^k)^{−c−1} | c/(c+1) | M1(x) = x^k |
| R | Student S(µ, σ), µ ∈ R, σ > 0 | ( 1 + (1/ν)((x − µ)/σ)² )^{−(ν+1)/2} | (ν−1)/(ν+1) | M1(x) = x, M2(x) = x² |

Particularly, the non-standard Student distributions with (fixed) degrees of freedom ν > 2, location parameter µ ∈ R and scale parameter σ > 0, given by

p(x; µ, σ) = Γ((ν + 1)/2) / ( √(νπ) σ Γ(ν/2) ) · ( 1 + (1/ν)((x − µ)/σ)² )^{−(ν+1)/2}, x ∈ R,

maximize Tsallis entropy, and hence Rényi entropy, with parameter q = (ν − 1)/(ν + 1) for the algebraic moment functions M1(x) = x and M2(x) = x².

SLIDE 18

Dual coordinate systems

Any distribution p ∈ M, where M is a (2 − q)-exponential family, can be indexed either by

◮ its expectation parameters (m1, . . . , mJ), or
◮ its q̃-exponential (or ME) parameters (λ1, . . . , λJ).

Both coordinate systems are linked through the relation

Tq(p) = (1/(1 − q)) ( Σ_{j=1}^J λj mj − ψ(λ) − 1 ).

Particularly, for j ∈ {1, . . . , J},

λj = ∂/∂mj ( (1 − q) Tq(p(.; m)) ).

SLIDE 19

Inverse polynomial families maximize Burg entropy

The densities of Pareto type III distributions with fixed parameter δ > 0,

p_δ(x; σ) ∝ ( 1 + (x/σ)^δ )^{−1}, x ∈ R+, σ > 0,

and Cauchy distributions

p(x; µ, σ) ∝ ( 1 + ((x − µ)/σ)² )^{−1}, x ∈ R, µ ∈ R, σ > 0,

are natural candidates for maximizing Burg entropy for the moment functions M1(x) = x^δ, and M1(x) = x, M2(x) = x², respectively. Unfortunately, these algebraic moments are infinite for these distributions. One may avoid the infiniteness of the moment constraints by truncating the tail of the distributions, thus restricting their support S to a bounded interval – work in progress.

SLIDE 20

GOF tests : the theoretical background

Let M = {p(.; λ), λ ∈ Λ} be a parametric family of MaxEnt distributions for the constraint functions M1, . . . , MJ. Let (X1, . . . , Xn) be an n-sample drawn according to a p.d.f. p satisfying E_p(Mj) < ∞, j ∈ {1, . . . , J}. For testing H0 : p ∈ M against H1 : p ∉ M, let the test statistic be

Tn := Sφ(p(.; λ̂n)) − Ŝφ(p)n,

where

◮ λ̂n is the ME estimator (MEE), i.e., the ME parameter corresponding to the empirical moment estimator m̂n given by m̂n^{(j)} = (1/n) Σ_{i=1}^n Mj(Xi), j ∈ {1, . . . , J} ;
◮ Ŝφ(p)n is some non-parametric estimator of the φ-entropy of the sample.

SLIDE 21

GOF tests for an exponential family

For an exponential family M = {p(.; λ) = exp( Σ_j λj Mj − ψ(λ) ), λ ∈ Λ},

◮ the MEE λ̂n equals the (Quasi-)Maximum Likelihood Estimator (QMLE) of λ∗, where λ∗ = Argmin_{λ ∈ Λ} K(p|p(.; λ)) ;
◮ the spacing-based estimator of the Shannon entropy S(p) (Tarasenko (1968), Vasicek (1976)) is

Ŝn,k = (1/n) Σ_{i=1}^n log( (n/(2k)) (X(i+k) − X(i−k)) ),

where X(1) ≤ · · · ≤ X(n) is the order statistics of the sample and k ∈ {1, . . . , n − 1} ; Ŝn,k is strongly consistent as n, k → ∞ with k/n → 0.

Hence, the GOF test with statistic Tn = S(p(.; λ̂n)) − Ŝn,k is consistent.

SLIDE 22

GOF tests for an exponential family (continued)

For the same exponential family, an alternative estimator of the Shannon entropy S(p) is the k-Nearest Neighbor (kNN) estimator

Ŝn,k = −(1/n) Σ_{i=1}^n log p̂n,k(Xi), with p̂n,k(Xi) = exp(ψ(k)) / ( 2(n − 1) ρk(Xi) ),

where ρk(Xi) is the distance from Xi to its k-th closest neighbor in the sample, and ψ here denotes the digamma function ; Ŝn,k is strongly consistent as n, k → ∞ with k/n → 0.

Hence, the GOF test with statistic Tn = S(p(.; λ̂n)) − Ŝn,k is consistent.
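A 1-D sketch of this kNN estimator (not from the slides; ψ(k) is computed for integer k as −γ + Σ_{j&lt;k} 1/j to avoid external dependencies), checked on a Uniform(0, 1) sample whose Shannon entropy is 0.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329

def digamma_int(k):
    """psi(k) for a positive integer k: -gamma + sum_{j=1}^{k-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, k))

def knn_shannon(sample, k):
    """S_hat = (1/n) sum_i log( 2 (n-1) rho_k(X_i) ) - psi(k)."""
    x = np.asarray(sample, dtype=float)
    n = len(x)
    d = np.abs(x[:, None] - x[None, :])   # all pairwise distances (fine for moderate n)
    np.fill_diagonal(d, np.inf)           # exclude the point itself
    rho = np.sort(d, axis=1)[:, k - 1]    # distance to the k-th nearest neighbor
    return np.mean(np.log(2 * (n - 1) * rho)) - digamma_int(k)

rng = np.random.default_rng(1)
S_hat = knn_shannon(rng.uniform(size=2000), k=4)   # true Shannon entropy is 0
```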

SLIDE 23

GOF tests for a q-exponential family

For a q-exponential family M = { p(.; λ) = ( Σ_{j=0}^J λj Mj )^{1/(q−1)}, λ ∈ Λ },

◮ the MEE λ̂n is no longer equal to the QMLE of λ∗, nor to the quasi-maximum log-q̃-likelihood estimator, but can be derived directly from the relation

λ̂n^{(j)} = ∂/∂mj ( (1 − q) Tq(p(.; m)) ) |_{m = m̂n}, where m̂n^{(j)} = (1/n) Σ_{i=1}^n Mj(Xi) ;

◮ the kNN estimator of the Tsallis entropy Tq(p) is

T̂n,k = (1/(1 − q)) ( În,k − 1 ), where În,k = (1/n) Σ_{i=1}^n ( Gk · 2(n − 1) ρk(Xi) )^{1−q}, with Gk = ( Γ(k)/Γ(k + 1 − q) )^{1/(1−q)}.

Leonenko et al. (2008) proved that T̂n,k is strongly consistent as n, k → ∞ with k/n → 0, under mild assumptions, for q < 1.

Hence, the GOF test with statistic Tn = Tq(p(.; λ̂n)) − T̂n,k is consistent.
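A matching sketch for the Tsallis kNN estimator (not from the slides; Gk is evaluated through log-Γ): for a Uniform(0, 1) sample, ∫ p^q = 1, so the estimate should be near 0.

```python
import numpy as np
from math import lgamma, exp

def knn_tsallis(sample, k, q):
    """kNN estimator of Tsallis entropy in dimension 1, for q < 1 (Leonenko et al. style)."""
    x = np.asarray(sample, dtype=float)
    n = len(x)
    d = np.abs(x[:, None] - x[None, :])
    np.fill_diagonal(d, np.inf)
    rho = np.sort(d, axis=1)[:, k - 1]                    # k-th nearest-neighbor distances
    Gk = exp((lgamma(k) - lgamma(k + 1 - q)) / (1 - q))   # G_k = (Gamma(k)/Gamma(k+1-q))^(1/(1-q))
    I_hat = np.mean((Gk * 2 * (n - 1) * rho) ** (1 - q))  # estimates int p^q
    return (I_hat - 1) / (1 - q)

rng = np.random.default_rng(2)
T_hat = knn_tsallis(rng.uniform(size=2000), k=4, q=0.5)   # true Tsallis entropy is 0
```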

SLIDE 24

Example : GOF test for non-standard Student p.d.f.

Poster Session 3 (this afternoon) :

J. Lequesne. A goodness-of-fit test for Student distributions based on Rényi entropy.

SLIDE 25

Some references

J-F. Bercher (2008). On some entropy functionals derived from Rényi information divergence. Information Sciences, 178(12), 2489–2506.

V. Girardin (1997). Methods of realization of moment problems with entropy maximization. In Distributions with Given Marginals and Moment Problems, edited by V. Benes and J. Stepan, Kluwer Academic Publishers.

O. M. Kosheleva (1998). Symmetry-group justification of maximum entropy method and generalized maximum entropy methods in image processing. In Maximum Entropy and Bayesian Methods, edited by G. J. Erickson, J. T. Rychert and C. R. Smith, Fundamental Theories of Physics.

N. Leonenko, L. Pronzato, V. Savani (2008). A class of Rényi information estimators for multidimensional densities. The Annals of Statistics, 36(5), 2153–2182.

J. Lequesne (2015). Tests statistiques basés sur la théorie de l'information. Applications en biologie et en démographie. PhD thesis, Université de Caen - Basse Normandie, France.

O. Vasicek (1976). A test for normality based on sample entropy. Journal of the Royal Statistical Society, Series B, 38, 54–59.
