SLIDE 1

2nd International Electronic Conference on Entropy and Its Applications

SHANNON’S ENTROPY USAGE

AS STATISTIC IN ASSESSMENT OF DISTRIBUTION

Lorentz Jäntschi, Sorana D. Bolboacă

Technical University of Cluj-Napoca, Romania; Iuliu Haţieganu University of Medicine and Pharmacy, Romania

SLIDE 2

Introduction

  • General null hypothesis H0: data follow a specific distribution

  • Kolmogorov-Smirnov (KS)
  • Anderson-Darling (AD)
  • Cramér-von Mises (CM)
  • Kuiper V (KV)
  • Watson U2 (WU)
  • Shannon’s H1 – introduced here as a statistic


SLIDE 3

Computing the statistics

AD = −n − (1/n) · Σ_{i=1..n} (2i−1) · ln( f_i · (1 − f_{n−i+1}) )

KS = √n · max_{1≤i≤n} max( f_i − (i−1)/n , i/n − f_i )

KV = √n · ( max_{1≤i≤n}( f_i − (i−1)/n ) + max_{1≤i≤n}( i/n − f_i ) )

CM = 1/(12n) + Σ_{i=1..n} ( (2i−1)/(2n) − f_i )²

WU = 1/(12n) + Σ_{i=1..n} ( (2i−1)/(2n) − f_i )² − n · ( (1/n)·Σ_{i=1..n} f_i − 1/2 )²

H1 = −Σ_{i=1..n} ( f_i·ln(f_i) + (1−f_i)·ln(1−f_i) )

where

  • n: sample size
  • f_i: cumulative distribution function (of the distribution being tested) evaluated at the i-th observation (i from 1 to n), with the observations sorted in ascending order
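A minimal NumPy sketch of the six statistics above, computed from the sorted CDF values f_i (assuming 0 < f_i < 1 strictly, since AD and H1 take logarithms):

```python
import numpy as np

def edf_statistics(f):
    """Compute the six statistics of the slide from the CDF values f_i
    of the tested distribution at the sorted observations (1-based i)."""
    f = np.sort(np.asarray(f, dtype=float))
    n = len(f)
    i = np.arange(1, n + 1)

    ks = np.sqrt(n) * np.maximum(f - (i - 1) / n, i / n - f).max()
    kv = np.sqrt(n) * ((f - (i - 1) / n).max() + (i / n - f).max())
    ad = -n - np.sum((2 * i - 1) * np.log(f * (1 - f[::-1]))) / n
    cm = 1 / (12 * n) + np.sum(((2 * i - 1) / (2 * n) - f) ** 2)
    wu = cm - n * (f.mean() - 0.5) ** 2
    h1 = -np.sum(f * np.log(f) + (1 - f) * np.log(1 - f))
    return {"AD": ad, "KS": ks, "KV": kv, "CM": cm, "WU": wu, "H1": h1}
```

Note that WU is computed from CM by subtracting the correction term n·(mean(f) − 1/2)², which is exactly the difference between the two formulas above.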


SLIDE 4

Monte-Carlo building of statistic-probability map

For 0 ≤ k ≤ 1000·K
    f_i ← Random Uniform[0, 1], for 1 ≤ i ≤ n
    (f_i)_{1≤i≤n} ← Sort_ASC( (f_i)_{1≤i≤n} )
    Observed_k ← Formula( (f_i)_{1≤i≤n} )
EndFor

(Observed_k) ← Sort_ASC( (Observed_k) )

For 1 ≤ j ≤ 999
    Statistic_{j/1000} ← Mean( Observed_{K·j}, Observed_{K·j+1} )
EndFor

The formula of each statistic enters here
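The procedure above can be sketched in Python as follows; `formula` stands for any of the statistic formulas from the previous slide, and the exact quantile indexing around K·j is an assumption about the slide's notation:

```python
import numpy as np

rng = np.random.default_rng(0)

def statistic_probability_map(formula, n, K=100):
    """Monte-Carlo sketch of the statistic-probability map.

    formula: any statistic computed from sorted Uniform[0,1] CDF values.
    Returns the estimated quantiles Statistic[j/1000] for j = 1..999.
    """
    observed = np.empty(1000 * K)
    for k in range(1000 * K):
        f = np.sort(rng.uniform(0.0, 1.0, size=n))  # f_i ~ Uniform[0,1], sorted
        observed[k] = formula(f)
    observed.sort()
    # the j/1000 quantile taken as the mean of the two order statistics
    # around index K*j (an assumption about the slide's indexing)
    return {j / 1000: 0.5 * (observed[K * j - 1] + observed[K * j])
            for j in range(1, 1000)}
```

The map turns an observed value of any statistic into an estimated probability by locating it among the sorted Monte-Carlo draws.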


SLIDE 5

Material: 50 samples of measured properties and activities of chemicals

[Figure: probability density function of the sample sizes n]

Experimental values were taken from the literature. The sample sizes are log-normally distributed, ranging from n = 13 to n = 1714.

For each sample, the agreement between each of the four distributions and the observations was assessed with each statistic.

SLIDE 6

Log-Normal:
LN(x; μ, σ) = 1/(x·σ·√(2π)) · exp( −(ln(x) − μ)² / (2σ²) )

Normal:
N(x; μ, σ) = 1/(σ·√(2π)) · exp( −(x − μ)² / (2σ²) )

Gauss-Laplace:
GL(x; μ, σ, q) = q·Γ(3/q)^{1/2} / (2·σ·Γ(1/q)^{3/2}) · exp( −(Γ(3/q)/Γ(1/q))^{q/2} · |x − μ|^q / σ^q )

Fisher-Tippett:
FT(x; μ, σ, ξ) = (1/σ) · (1 + ξ·(x − μ)/σ)^{−1−1/ξ} · exp( −(1 + ξ·(x − μ)/σ)^{−1/ξ} )

Common distributions were included in the analysis: two of them are two-parametric (LN and N) and two are three-parametric (GL and FT).
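As an illustration, the four probability density functions can be implemented directly; the parameter names (mu, sigma, q, xi) follow the formulas above, with xi an assumed notation for the Fisher-Tippett shape parameter:

```python
import numpy as np
from math import gamma, pi

def pdf_normal(x, mu, sigma):
    # N(x; mu, sigma)
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * pi))

def pdf_lognormal(x, mu, sigma):
    # LN(x; mu, sigma), defined for x > 0
    return np.exp(-((np.log(x) - mu) ** 2) / (2 * sigma ** 2)) / (x * sigma * np.sqrt(2 * pi))

def pdf_gauss_laplace(x, mu, sigma, q):
    # Generalized error (Gauss-Laplace) density with shape q;
    # reduces to the normal density for q = 2
    c = (gamma(3 / q) / gamma(1 / q)) ** 0.5
    return q * c / (2 * sigma * gamma(1 / q)) * np.exp(-(c * np.abs(x - mu) / sigma) ** q)

def pdf_fisher_tippett(x, mu, sigma, xi):
    # Generalized extreme value (Fisher-Tippett) density with shape xi,
    # valid where 1 + xi*(x - mu)/sigma > 0
    t = 1 + xi * (x - mu) / sigma
    return t ** (-1 - 1 / xi) * np.exp(-t ** (-1 / xi)) / sigma
```

A quick sanity check on the parametrization: at q = 2 the Gauss-Laplace density coincides with the normal density with the same mu and sigma.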

SLIDE 7

Combining multiple tests

  • Pearson-Fisher Chi-Square

Fisher RA, 1948. "Questions and answers #14". The American Statistician 2(5):30-31
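One standard reading of the Pearson-Fisher combination is Fisher's method: X² = −2·Σ ln(p_i) follows a chi-square distribution with 2k degrees of freedom under H0. A minimal sketch (the example p-values are illustrative, not taken from the slides):

```python
import math

def fisher_combined_p(pvalues):
    """Fisher's combination: X2 = -2*sum(ln p_i) follows a chi-square
    distribution with 2k degrees of freedom under H0.  For even degrees
    of freedom the chi-square tail has a closed form, so no external
    library is needed."""
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)   # X2 / 2
    # P(chi2_{2k} > X2) = exp(-X2/2) * sum_{m=0}^{k-1} (X2/2)^m / m!
    return math.exp(-half) * sum(half ** m / math.factorial(m) for m in range(k))

# Several individually unconvincing p-values combine into a smaller one:
combined = fisher_combined_p([0.09, 0.10, 0.12, 0.08, 0.11])
```

With a single p-value the combination returns that p-value unchanged, which is a convenient consistency check.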


SLIDE 8

RESULTS

Plots of statistic-probability maps


SLIDE 9

[Figure: the six statistics on a logarithmic scale, ln(S_p / S_{p=.500}), plotted against p for CM, AD, WU, KS, KV and H1]

Statistics in log scale for n = 25


ln( S_p / S_{p=.500} ), for p = .001, .002, …, .999 and S ∈ {CM, AD, WU, KS, KV, H1}

Representing the statistics on a logarithmic scale reveals that CM changes its value with the highest resolution, while H1 tends to vary the slowest.

SLIDE 10

Statistics in relative scale for n = 25


[Figure: the six statistics on a relative scale plotted against p for CM, AD, WU, KS, KV and H1]

Representing the statistics on a relative scale reveals, on the one hand, the movement of the inflexion point from near p = 0 for CM, AD and WU to the middle (p = 0.5) for KS, KV and H1 and, on the other hand, the convergence (as n increases) of H1 to a symmetrical shape.

2 · ( S_p − S_{p=.001} ) / ( S_{p=.999} − S_{p=.001} ) − 1, for p = .001, .002, …, .999 and S ∈ {CM, AD, WU, KS, KV, H1}

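The relative-scale view of a statistic-probability map can be sketched as a small helper; the exact rescaling formula is an assumption reconstructed from the slide:

```python
def relative_scale(stat_map):
    """Rescale a statistic-probability map S(p) so that S(.001) maps to -1
    and S(.999) maps to +1 (the rescaling formula is an assumption
    reconstructed from the slide)."""
    lo, hi = stat_map[0.001], stat_map[0.999]
    return {p: 2 * (s - lo) / (hi - lo) - 1 for p, s in stat_map.items()}
```

Rescaling removes the different magnitudes of the six statistics so that the positions of their inflexion points can be compared directly.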
SLIDE 11

RESULTS

Scenario 1: combining probabilities from AD, KS, CM, KV and WU
Scenario 2: combining probabilities from AD, KS, CM, KV, WU and H1


No single statistic should be used as an ‘absolute reference’ when the agreement between observations and a model is assessed; the combined probability from the whole pool of statistics should be used instead.

SLIDE 12

Individual rejections at 5% risk of being in error, for each statistic

Distribution     AD  KS  CM  KV  WU  H1
Gauss-Laplace     9  12  11  19  17   –
Fisher-Tippett    6   5   4  13  11   3
Lognormal         4   7   4  18  16   3
Normal            8  14  10  21  20   –

50 samples were analyzed. H0 (the data follow a certain distribution) was rejected at 5% risk of being in error differently by each statistic.


This is a simple demonstration of why a single statistic should not be used to measure the agreement between the observations and the model.

SLIDE 13

Combined rejections at 5% risk of being in error, for each scenario

Distribution     Scenario 1  Scenario 2
Gauss-Laplace            19          19
Fisher-Tippett           13          13
Lognormal                20          18
Normal                   21          21

50 samples were analyzed. H0 (the data follow a certain distribution) was rejected at 5% risk of being in error differently under each scenario of combining the statistics.


SLIDE 14

DISCUSSION

By taking the 5% risk of being in error as the threshold for rejecting H0 (the data follow a certain distribution):

  • Scenario 1, which does not include H1, tends to reject H0 more often than any single statistic
  • Scenario 2, which includes H1, tends to reject H0 at a rate close to the highest rejection rate of any single statistic (the same rejection rate as KV was obtained)


SLIDE 15

CONCLUSIONS

  • Shannon’s statistic tends to fail to reject H0 more often than all the other investigated statistics
  • However, its inclusion in a battery of statistics for testing H0 changes the outcome only slightly (2 fewer rejections of H0 out of 73), bringing the combined rejection rate closer to the maximum rejection rate of any single statistic


SLIDE 16

THANK YOU !

For any query, please do not hesitate to contact us: lorentz.jantschi@gmail.com & sbolboaca@umfcluj.ro

Cluj-Napoca (Romania) by night