 
2nd International Electronic Conference on Entropy and Its Applications
Shannon’s Entropy Usage as Statistic in Assessment of Distribution
Lorentz Jäntschi, Sorana D. Bolboacă
Technical University of Cluj-Napoca, Romania
Iuliu Haţieganu University of Medicine and Pharmacy, Romania
Introduction
• General null hypothesis H0: data follow a specific distribution
• Kolmogorov-Smirnov (KS)
• Anderson-Darling (AD)
• Cramér-von-Mises (CM)
• Kuiper V (KV)
• Watson U² (WU)
• Shannon’s H1, introduced here as a statistic
Computing of statistics

$$AD = -n - \frac{1}{n}\sum_{i=0}^{n-1} (2i+1)\,\ln\bigl(f_i\,(1 - f_{n-1-i})\bigr)$$

$$KS = \sqrt{n}\,\max_{0 \le i < n}\Bigl(\frac{i+1}{n} - f_i,\; f_i - \frac{i}{n}\Bigr)$$

$$KV = \sqrt{n}\,\Bigl(\max_{0 \le i < n}\bigl(\tfrac{i+1}{n} - f_i\bigr) + \max_{0 \le i < n}\bigl(f_i - \tfrac{i}{n}\bigr)\Bigr)$$

$$CM = \frac{1}{12n} + \sum_{i=0}^{n-1}\Bigl(\frac{2i+1}{2n} - f_i\Bigr)^2$$

$$WU = \frac{1}{12n} + \sum_{i=0}^{n-1}\Bigl(\frac{2i+1}{2n} - f_i\Bigr)^2 - n\Bigl(\frac{1}{2} - \frac{1}{n}\sum_{i=0}^{n-1} f_i\Bigr)^2$$

$$H1 = -\sum_{i=0}^{n-1}\bigl(f_i \ln(f_i) + (1 - f_i)\ln(1 - f_i)\bigr)$$

where
• n: sample size
• f_i: cumulative distribution function (of the distribution being tested) associated with the i-th (from 0 to n-1) observation sorted in ascending order
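As an illustration only, the six statistics can be sketched in Python as follows. The function name gof_statistics and the dictionary layout are assumptions made for this example, not the authors' code; the input is the list of f_i values, which must lie strictly between 0 and 1 because AD and H1 take logarithms of f_i and 1 - f_i.

```python
import math

def gof_statistics(f):
    """Compute AD, KS, KV, CM, WU and H1 from CDF values f_i (hypothetical helper).

    f: values of the tested CDF at the observations, each strictly in (0, 1).
    """
    f = sorted(f)  # f_i must be in ascending order
    n = len(f)
    # Anderson-Darling: pairs f_i with 1 - f_(n-1-i)
    ad = -n - sum((2 * i + 1) * math.log(f[i] * (1.0 - f[n - 1 - i]))
                  for i in range(n)) / n
    # One-sided deviations shared by KS and KV
    d_plus = max((i + 1) / n - f[i] for i in range(n))
    d_minus = max(f[i] - i / n for i in range(n))
    ks = math.sqrt(n) * max(d_plus, d_minus)
    kv = math.sqrt(n) * (d_plus + d_minus)
    # Cramer-von-Mises, and Watson U2 as CM minus a correction for the mean
    cm = 1.0 / (12 * n) + sum(((2 * i + 1) / (2 * n) - f[i]) ** 2
                              for i in range(n))
    wu = cm - n * (0.5 - sum(f) / n) ** 2
    # Shannon's H1: sum of binary entropies of the f_i
    h1 = -sum(x * math.log(x) + (1.0 - x) * math.log(1.0 - x) for x in f)
    return {"AD": ad, "KS": ks, "KV": kv, "CM": cm, "WU": wu, "H1": h1}

# Example: a perfectly "regular" sample, f_i = (2i+1)/(2n) with n = 4
stats = gof_statistics([0.125, 0.375, 0.625, 0.875])
```

For this regular sample the sum in CM vanishes, so CM reduces to 1/(12n), and WU equals CM because the mean of the f_i is exactly 1/2.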
Monte-Carlo building of statistic-probability map

For 0 ≤ k < 1000·K
    f_i ← Random_Uniform[0,1], for 0 ≤ i < n
    (f_i)_{0≤i<n} ← Sort_ASC((f_i)_{0≤i<n})
    Observed_k ← Formula((f_i)_{0≤i<n})        (the formula of each statistic enters here)
EndFor
(Observed_k)_{0≤k<1000·K} ← Sort_ASC((Observed_k)_{0≤k<1000·K})
For 1 ≤ j ≤ 999
    Statistic_{j/1000} ← Mean(Observed_{K·j-1}, Observed_{K·j})
EndFor
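The Monte-Carlo procedure on this slide can be sketched in Python roughly as follows. This is a minimal sketch under assumptions: the names probability_map and cm_statistic, the seed, and the small value K = 5 are chosen for the example only, and the j/1000 threshold is taken as the mean of the two sorted values adjacent to that quantile among 1000·K simulated values.

```python
import random

def cm_statistic(f):
    """Cramer-von-Mises statistic for sorted CDF values f (example statistic)."""
    n = len(f)
    return 1.0 / (12 * n) + sum(((2 * i + 1) / (2 * n) - f[i]) ** 2
                                for i in range(n))

def probability_map(statistic, n, K=5, seed=0):
    """Return 999 thresholds; entry j-1 estimates the statistic at p = j/1000."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = []
    for _ in range(1000 * K):
        # Under H0 the f_i behave like a sorted uniform sample on [0, 1]
        f = sorted(rng.random() for _ in range(n))
        observed.append(statistic(f))
    observed.sort()
    # Mean of the two order statistics that straddle the j/1000 quantile
    return [0.5 * (observed[K * j - 1] + observed[K * j])
            for j in range(1, 1000)]

cm_map = probability_map(cm_statistic, n=25, K=5)
cm_median = cm_map[499]  # threshold associated with p = 0.500
```

Any of the other statistics can be passed in place of cm_statistic, provided it accepts the sorted list of f_i; a larger K gives smoother thresholds at the cost of run time.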
Material: 50 samples of property and activity measurements of chemicals
• Experimental values taken from literature
• The sample sizes are log-normally distributed, ranging from n = 13 to n = 1714
• For each sample, the agreement between each of the four distributions and the observations was assessed with each statistic
[Figure: Probability Density Function of the sample sizes]
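Common distributions were included in the analysis. Two of them have two parameters (LN and N) and two have three parameters (GL and FT).

Log-Normal
$$LN(x; \mu, \sigma) = \frac{1}{x\,\sigma\sqrt{2\pi}}\,\exp\Bigl(-\frac{(\ln x - \mu)^2}{2\sigma^2}\Bigr)$$

Normal
$$N(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\,\exp\Bigl(-\frac{(x - \mu)^2}{2\sigma^2}\Bigr)$$

Gauss-Laplace
$$GL(x; \mu, \sigma, q) = \frac{q\,\Gamma(3/q)^{1/2}}{2\sigma\,\Gamma(1/q)^{3/2}}\,\exp\Bigl(-\Bigl|\frac{x - \mu}{\sigma}\Bigr|^{q}\Bigl(\frac{\Gamma(3/q)}{\Gamma(1/q)}\Bigr)^{q/2}\Bigr)$$

Fisher-Tippett
$$FT(x; \mu, \sigma, \xi) = \frac{1}{\sigma}\Bigl(1 + \xi\,\frac{x - \mu}{\sigma}\Bigr)^{-1 - 1/\xi}\exp\Bigl(-\Bigl(1 + \xi\,\frac{x - \mu}{\sigma}\Bigr)^{-1/\xi}\Bigr)$$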
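A quick numerical sanity check of these four densities can be sketched in Python. The function names are assumptions for this example; the Gauss-Laplace density is taken in its usual generalized-normal form scaled so that sigma is the standard deviation, and the Fisher-Tippett density in the generalized extreme value form with a nonzero shape parameter (called xi here). Neither parameterization is confirmed beyond what the slide shows.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def lognormal_pdf(x, mu, sigma):
    # Defined for x > 0 only
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))

def gauss_laplace_pdf(x, mu, sigma, q):
    # Assumed form: generalized normal with standard deviation sigma
    g1, g3 = math.gamma(1.0 / q), math.gamma(3.0 / q)
    coef = q * math.sqrt(g3) / (2.0 * sigma * g1 ** 1.5)
    return coef * math.exp(-(abs(x - mu) / sigma) ** q * (g3 / g1) ** (q / 2.0))

def fisher_tippett_pdf(x, mu, sigma, xi):
    # Assumed form: generalized extreme value density, xi != 0
    t = 1.0 + xi * (x - mu) / sigma
    if t <= 0.0:
        return 0.0  # outside the support
    return t ** (-1.0 - 1.0 / xi) * math.exp(-t ** (-1.0 / xi)) / sigma

# With q = 2 the Gauss-Laplace density should coincide with the Normal one
gap = abs(gauss_laplace_pdf(1.3, 0.5, 2.0, 2.0) - normal_pdf(1.3, 0.5, 2.0))
```

The q = 2 case is a convenient check because Γ(3/2)/Γ(1/2) = 1/2, which turns the Gauss-Laplace exponent into the Normal one.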
Combining multiple tests
• Pearson-Fisher Chi-Square
Fisher RA, 1948. "Questions and answers #14". The American Statistician 2(5):30-31
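The slide cites Fisher's method without spelling it out. Assuming the standard form, in which the k individual probabilities p_1, ..., p_k are combined as X² = -2·Σ ln(p_i) and X² is compared with a Chi-Square distribution with 2k degrees of freedom, a minimal Python sketch is given below. The function name fisher_combined_p is hypothetical, and the closed-form tail probability used here is valid only because the number of degrees of freedom is even.

```python
import math

def fisher_combined_p(p_values):
    """Combine probabilities with Fisher's method (standard form assumed).

    Returns (X2, combined probability), with X2 = -2 * sum(ln p_i)
    referred to a Chi-Square distribution with 2k degrees of freedom.
    """
    k = len(p_values)
    x2 = -2.0 * sum(math.log(p) for p in p_values)
    half = x2 / 2.0
    # Upper tail of Chi-Square with 2k d.f.: exp(-x/2) * sum_{j<k} (x/2)^j / j!
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return x2, math.exp(-half) * total

# Example: combining the probabilities from five statistics (made-up values)
x2, p_combined = fisher_combined_p([0.20, 0.08, 0.15, 0.04, 0.06])
```

With a single probability the method returns that probability unchanged, which is a simple way to check an implementation.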
Results
Plots of statistic-probability maps
Statistics in log scale for n = 25
On a logarithmic scale, CM changes its value fastest (highest resolution), while H1 tends to vary slowest.
[Figure: ln(S_p / S_0.500) versus p, for p = 0.001, 0.002, ..., 0.999 and S ∈ {CM, AD, WU, KS, KV, H1}]
Statistics in relative scale for n = 25
On a relative scale, the inflexion point moves from near p = 0 for CM, AD and WU to the middle (p = 0.5) for KS, KV and H1; in addition, H1 converges to a symmetrical shape as n increases.
[Figure: S_p rescaled to the range from -1 to 1 using S_0.001 and S_0.999, versus p, for p = 0.001, 0.002, ..., 0.999 and S ∈ {CM, AD, WU, KS, KV, H1}]
Results
• Scenario 1: combining probabilities from AD, KS, CM, KV and WU
• Scenario 2: combining probabilities from AD, KS, CM, KV, WU and H1
No single statistic should be used as an ‘absolute reference’ when the agreement between observation and model is assessed; the combined probability from the whole pool of statistics should be used instead.
Individual rejections at 5% risk of being in error, for each statistic
Here is a simple demonstration of why a single statistic should not be used for measuring the agreement between observation and model.

Distribution     AD  KS  CM  KV  WU  H1
Gauss-Laplace     9  12  11  19  17   0
Fisher-Tippett    6   5   4  13  11   3
Lognormal         4   7   4  18  16   3
Normal            8  14  10  21  20   0

50 samples were analyzed. H0 (data follow a certain distribution) was rejected at 5% risk of being in error differently by each statistic.
Combined rejections at 5% risk of being in error, for each scenario

Distribution     Scenario 1  Scenario 2
Gauss-Laplace            19          19
Fisher-Tippett           13          13
Lognormal                20          18
Normal                   21          21

50 samples were analyzed. H0 (data follow a certain distribution) was rejected at 5% risk of being in error differently by each scenario of combining statistics.
Discussion
Taking the 5% risk of being in error as the threshold for rejecting H0 (data follow a certain distribution):
• Scenario 1 (not including H1) tends to reject H0 at least as often as any single statistic, and more often for the Lognormal (20 rejections versus 18 for KV)
• Scenario 2 (including H1) tends to reject H0 at a rate close to the highest rejection rate of any single statistic (the same rejection counts as KV were obtained)
Conclusions
• Shannon’s statistic seems to fail to reject H0 more often than all the other investigated statistics
• However, including it in a battery of statistics for testing H0 does not change the outcome significantly (2 fewer rejections of H0 out of 73) and brings the combined rejection rate closer to the maximum rejection rate of a single statistic
Thank You!
For any query, please do not hesitate to contact us: lorentz.jantschi@gmail.com & sbolboaca@umfcluj.ro
[Photo: Cluj-Napoca (Romania) by night]