Rank correlation coefficients and their generalizations for interval data


Rank correlation coefficients and their generalizations for interval data. Karol Opara and Olgierd Hryniewicz, Systems Research Institute, Polish Academy of Sciences. Konferencja „Statystyka Matematyczna”, Będlewo, 28 October 2016


slide-1
SLIDE 1

Rank correlation coefficients and their generalizations for interval data

Karol Opara and Olgierd Hryniewicz Systems Research Institute Polish Academy of Sciences Konferencja „Statystyka Matematyczna” Będlewo 28 October 2016

slide-2
SLIDE 2

Introduction

Measuring dependence

Quantifying and testing dependence is one of the major tasks of statistics

Imprecise data

How to compute correlation coefficients for imprecise data?

Opara K. and Hryniewicz O. (2016). Computation of general correlation coefficients for interval data. International Journal of Approximate Reasoning, 73, pp. 56–75.

Karol Opara, Olgierd Hryniewicz (SRI PAS) Interval Correlation Coefficients 29 November 2016 2 / 46

slide-3
SLIDE 3

Crisp correlation coefficients

slide-4
SLIDE 4

General (crisp) correlation coefficient

We have a set of $n$ objects characterized by two properties $x$ and $y$

To any pair of individuals, say the $i$-th and $j$-th, one can assign an $x$-score $a_{ij} = -a_{ji}$ and a $y$-score $b_{ij} = -b_{ji}$

Kendall (1955) describes a general correlation coefficient $\Gamma$ as

$$\Gamma = \frac{\sum_{i,j=1}^{n} a_{ij} b_{ij}}{\sqrt{\sum_{i,j=1}^{n} a_{ij}^2 \, \sum_{i,j=1}^{n} b_{ij}^2}} \tag{1}$$

Scores $a_{ij}$ and $b_{ij}$ are regarded as zero if $i = j$
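As an illustration of formula (1), the following minimal sketch evaluates the general coefficient for three choices of scores: differences of values, differences of ranks, and signs of differences. The function names and the small data set are assumptions made for this example only, and ties are assumed to be absent.

```python
import math


def general_correlation(a, b):
    """General coefficient (1) computed from two n-by-n score matrices."""
    n = len(a)
    num = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n))
    sum_a2 = sum(a[i][j] ** 2 for i in range(n) for j in range(n))
    sum_b2 = sum(b[i][j] ** 2 for i in range(n) for j in range(n))
    return num / math.sqrt(sum_a2 * sum_b2)


def difference_scores(v):
    """Scores a_ij = v_j - v_i (antisymmetric, zero on the diagonal)."""
    return [[vj - vi for vj in v] for vi in v]


def sign_scores(v):
    """Scores a_ij = +1 if v_i < v_j, -1 if v_i > v_j, 0 otherwise."""
    return [[(vj > vi) - (vj < vi) for vj in v] for vi in v]


def ranks(v):
    """Ranks 1..n of the values in v (assumes no ties)."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    result = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


# Hypothetical sample; it happens to consist of ranks already.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 4.0]

pearson = general_correlation(difference_scores(x), difference_scores(y))
spearman = general_correlation(difference_scores(ranks(x)),
                               difference_scores(ranks(y)))
kendall = general_correlation(sign_scores(x), sign_scores(y))
```

Because the sample values coincide with their ranks, the first two coefficients agree here; in general they differ.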


slide-5
SLIDE 5

Pearson’s r

General correlation coefficient

$$\Gamma = \frac{\sum_{i,j=1}^{n} a_{ij} b_{ij}}{\sqrt{\sum_{i,j=1}^{n} a_{ij}^2 \, \sum_{i,j=1}^{n} b_{ij}^2}} \tag{2}$$

Pearson’s $r$ is based on variate values

$$a_{ij} = x_j - x_i \tag{3}$$

$$b_{ij} = y_j - y_i \tag{4}$$


slide-6
SLIDE 6

Rank coefficients ρ and τ

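General correlation coefficient

$$\Gamma = \frac{\sum_{i,j=1}^{n} a_{ij} b_{ij}}{\sqrt{\sum_{i,j=1}^{n} a_{ij}^2 \, \sum_{i,j=1}^{n} b_{ij}^2}} \tag{5}$$

Spearman’s $\rho$ is based on ranks

$$a_{ij} = p_j - p_i \tag{6}$$

$$b_{ij} = q_j - q_i \tag{7}$$

Kendall’s $\tau$ is based on $\pm 1$ scores

$$a_{ij} = \begin{cases} +1 & \text{if } p_i < p_j \\ -1 & \text{if } p_i > p_j \end{cases} \tag{8}$$

$$b_{ij} = \begin{cases} +1 & \text{if } q_i < q_j \\ -1 & \text{if } q_i > q_j \end{cases} \tag{9}$$

Variants $\tau_a$ and $\tau_b$ resolve ties in the data differently, $|\tau_a| \le |\tau_b|$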


slide-7
SLIDE 7

Relations between rank correlation coefficients

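Daniels’ inequality

$$-1 \le 3\tau - 2\rho \le 1 \tag{10}$$

refined by Durbin and Stuart

$$\tfrac{3}{2}\tau - \tfrac{1}{2} \le \rho \le \tfrac{1}{2} + \tau - \tfrac{1}{2}\tau^2 \quad \text{for } \tau > 0 \tag{11}$$

$$\tfrac{1}{2}\tau^2 + \tau - \tfrac{1}{2} \le \rho \le \tfrac{3}{2}\tau + \tfrac{1}{2} \quad \text{for } \tau < 0 \tag{12}$$

Fredricks and Nelsen (2007) proved that in the limiting case of dependence weakening towards independence

$$2\rho \to 3\tau \tag{13}$$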
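The region of admissible $(\tau, \rho)$ pairs can be explored numerically. The sketch below encodes the Durbin and Stuart bounds in their standard population form, with upper bound $\tfrac{1}{2} + \tau - \tfrac{1}{2}\tau^2$ for $\tau \ge 0$, together with Daniels’ inequality. The function names are assumptions made for this example, and the inequalities are those for population coefficients, so sample values of small data sets need not satisfy them exactly.

```python
def durbin_stuart_bounds(tau):
    """Return (lower, upper) bounds on Spearman's rho for a given Kendall's tau."""
    if not -1.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [-1, 1]")
    if tau >= 0:
        lower = 1.5 * tau - 0.5
        upper = 0.5 + tau - 0.5 * tau ** 2
    else:
        lower = 0.5 * tau ** 2 + tau - 0.5
        upper = 1.5 * tau + 0.5
    return lower, upper


def satisfies_daniels(tau, rho):
    """Check Daniels' inequality -1 <= 3*tau - 2*rho <= 1."""
    return -1.0 <= 3.0 * tau - 2.0 * rho <= 1.0


# At the extremes the admissible interval for rho collapses to a point.
bounds_at_one = durbin_stuart_bounds(1.0)
bounds_at_zero = durbin_stuart_bounds(0.0)
```

Both ends of the Durbin and Stuart interval satisfy Daniels’ inequality, which is what makes the former a refinement of the latter.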


slide-8
SLIDE 8

Possible values of ρ and τ (Nelsen, 1991)


slide-9
SLIDE 9

Relation between Pearson’s r and Kendall’s τ

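For elliptical distributions (e.g. bivariate normal) (Frahm et al., 2003)

$$\tau = \frac{2}{\pi} \arcsin(r) \tag{14}$$

Typically (Hauke and Kossowski, 2011):

$$\text{large } r \Rightarrow \text{large } \tau \text{ and } \rho \tag{15}$$

$$\text{small } r \Rightarrow \text{small } \tau \text{ and } \rho \tag{16}$$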
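Relation (14) is easy to evaluate in both directions. The following short sketch does so; the function names are assumptions introduced here, and the relation is only claimed for elliptical distributions.

```python
import math


def kendall_tau_from_pearson_r(r):
    """Relation (14): tau = (2 / pi) * arcsin(r), valid for elliptical distributions."""
    return 2.0 / math.pi * math.asin(r)


def pearson_r_from_kendall_tau(tau):
    """Inverse of relation (14): r = sin(pi * tau / 2)."""
    return math.sin(math.pi * tau / 2.0)


# arcsin(0.5) = pi / 6, so r = 0.5 corresponds to tau = 1/3.
tau_half = kendall_tau_from_pearson_r(0.5)
```

For $0 < r < 1$ the resulting $\tau$ is smaller than $r$, for example $r = 0.5$ gives $\tau = 1/3$.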


slide-10
SLIDE 10

Computational formulas for crisp Kendall’s τ

Counting concordant and discordant pairs

$$C = \operatorname{card}_{i \neq j}\{(x_i - x_j) \cdot (y_i - y_j) > 0\} \tag{17}$$

$$D = \operatorname{card}_{i \neq j}\{(x_i - x_j) \cdot (y_i - y_j) < 0\} \tag{18}$$

$$\tau = \frac{C - D}{n \cdot (n - 1)} \tag{19}$$

Denœux et al. (2005) imposed linear orders $L_X$ and $L_Y$ on each variate and counted the number of pairs ordered the same way by both of them

$$\tau = \tau(L_X, L_Y) = \frac{4 \operatorname{card}\{L_X \cap L_Y\}}{n(n - 1)} - 1 \tag{20}$$
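The two computational formulas (19) and (20) can be compared on a small sample. In this sketch a linear order is represented as the set of index pairs $(i, j)$ with $v_i < v_j$, which is an assumption about the encoding rather than the representation used by Denœux et al.; the data are hypothetical and contain no ties.

```python
def kendall_tau_pairs(x, y):
    """Formula (19): count concordant and discordant ordered pairs i != j."""
    n = len(x)
    concordant = 0
    discordant = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            product = (x[i] - x[j]) * (y[i] - y[j])
            if product > 0:
                concordant += 1
            elif product < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1))


def linear_order(v):
    """Encode a linear order as the set of index pairs (i, j) with v[i] < v[j]."""
    n = len(v)
    return {(i, j) for i in range(n) for j in range(n) if v[i] < v[j]}


def kendall_tau_orders(x, y):
    """Formula (20): pairs ordered the same way by both linear orders."""
    n = len(x)
    common = linear_order(x) & linear_order(y)
    return 4 * len(common) / (n * (n - 1)) - 1


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau_pairs = kendall_tau_pairs(x, y)
tau_orders = kendall_tau_orders(x, y)
```

Each unordered pair is counted twice in (17) and (18), which is why the denominator in (19) is $n(n-1)$ rather than $n(n-1)/2$.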


slide-11
SLIDE 11

Interpretations

  • Kendall’s τ: a function of the minimal number of transpositions required to order a sample
  • Spearman’s ρ: an interpretation in terms of concordances exists; it requires a bivariate sample and a pair of independent random variables with the same marginals as the initial ones (Kendall, 1955; Nelsen, 1991)


slide-12
SLIDE 12

Copulas

Sklar’s theorem

Let $F(x)$ and $G(y)$ be continuous CDFs and $H(x, y)$ be the two-dimensional CDF of a random variable with marginals $F$ and $G$. There exists a unique function $C$, called a copula, such that

$$H(x, y) = C(F(x), G(y))$$

Copulas are invariant under order-preserving transformations such as ranking $X \to F(X)$

Rank-based measures of association are properties of copulas

Spearman’s ρ has a geometric interpretation in terms of copulas as a scaled proportion of the volume of the $[0, 1]^3$ cube under the copula surface. Nelsen (1992) gives a similar interpretation for Kendall’s τ.

Gaussian copula, parameter $\rho_G$ equals Pearson’s $r$ for normal marginals

$$C(u_1, u_2; \rho_G) = \Phi_N\left(\Phi^{-1}(u_1), \Phi^{-1}(u_2); \rho_G\right) \tag{21}$$


slide-13
SLIDE 13

Copulas and correlation

Population versions of Kendall’s τ and Spearman’s ρ can be defined in terms of copulas (Pearson’s r cannot be)

$$\tau(X, Y) = 4\,E\big(C(F(X), G(Y))\big) - 1 \tag{22}$$

Genest and McKay (1986) used the CDF $K(t)$ of a random variable $T = C(U_1, U_2)$, where $U_1$ and $U_2$ are random variables uniformly distributed on $[0, 1]$, to show that

$$\tau = 3 - 4 \int_0^1 K(t)\,dt \tag{23}$$

Kendall’s τ as the difference between the probabilities of concordance and discordance (for the population version of the statistic)

$$\tau = P\big((x_1 - x_2)(y_1 - y_2) > 0\big) - P\big((x_1 - x_2)(y_1 - y_2) < 0\big) \tag{24}$$
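As a quick check of formula (23), two textbook copulas can be worked out by hand. The two cases below are chosen here for illustration and are not taken from the slides: the independence copula $C(u_1, u_2) = u_1 u_2$ and the comonotone copula $C(u_1, u_2) = \min(u_1, u_2)$.

```latex
% Independence copula: T = U_1 U_2 with U_1, U_2 independent uniforms on [0, 1]
K(t) = P(U_1 U_2 \le t) = t - t \ln t, \qquad 0 < t \le 1
\int_0^1 K(t)\,dt = \int_0^1 t\,dt - \int_0^1 t \ln t\,dt
                  = \tfrac{1}{2} + \tfrac{1}{4} = \tfrac{3}{4}
\tau = 3 - 4 \cdot \tfrac{3}{4} = 0

% Comonotone copula: U_1 = U_2, so T = \min(U_1, U_2) = U_1
K(t) = t, \qquad \int_0^1 K(t)\,dt = \tfrac{1}{2}
\tau = 3 - 4 \cdot \tfrac{1}{2} = 1
```

Both results agree with the concordance interpretation (24): independence gives equal probabilities of concordance and discordance, while perfect positive dependence makes every pair concordant.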


slide-14
SLIDE 14

Interval correlation coefficients

slide-15
SLIDE 15

Interval correlation coefficients

For interval data $z \in [z_L, z_U]$, $y \in [y_L, y_U]$ one obtains interval correlation coefficients $[\tau_L, \tau_U]$

$$\tau_L([z_L, z_U], [y_L, y_U]) = \min_{z \in [z_L, z_U],\; y \in [y_L, y_U]} \tau(z, y)$$

$$\tau_U([z_L, z_U], [y_L, y_U]) = \max_{z \in [z_L, z_U],\; y \in [y_L, y_U]} \tau(z, y)$$


slide-16
SLIDE 16

Interval data


slide-17
SLIDE 17

Interval data


slide-18
SLIDE 18

Interval data


slide-19
SLIDE 19

Interval data


slide-20
SLIDE 20

Cross-section (Kendall’s τ)


slide-21
SLIDE 21

[Figure: three pairs of plots over x, y ∈ [−1, 1], showing Pearson’s r, Spearman’s ρ and Kendall’s τ as functions of x and y]

slide-22
SLIDE 22

Computational formulas for Pearson’s r

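Mean product of standard scores

$$r = \frac{1}{n - 1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right) \tag{25}$$

After studentization it simplifies to a quadratic form

$$r = \frac{1}{n - 1} \sum_{i=1}^{n} x'_i y'_i \tag{26}$$

or equivalently

$$r = \frac{1}{2} \cdot \frac{1}{n - 1} \, [x'_1, \ldots, x'_n, y'_1, \ldots, y'_n] \begin{bmatrix} 0_n & I_n \\ I_n & 0_n \end{bmatrix} [x'_1, \ldots, x'_n, y'_1, \ldots, y'_n]^T \tag{27}$$

The matrix has eigenvalues ±1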
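The equivalence of (26) and (27) can be verified on a small example. The sketch below uses only the Python standard library; the helper names and the sample are assumptions made for illustration. It also checks the eigenvalue claim directly: vectors of the form $(u, u)$ are mapped to themselves and vectors of the form $(u, -u)$ to their negatives.

```python
import statistics


def standardize(v):
    """Studentized scores (v_i - mean) / s with the sample standard deviation."""
    mean = statistics.mean(v)
    std = statistics.stdev(v)
    return [(vi - mean) / std for vi in v]


def pearson_r(x, y):
    """Formula (26): mean product of standardized scores."""
    xs, ys = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(xs, ys)) / (len(x) - 1)


def block_matrix(n):
    """The 2n-by-2n matrix [[0_n, I_n], [I_n, 0_n]] from formula (27)."""
    size = 2 * n
    return [[1.0 if abs(i - j) == n else 0.0 for j in range(size)]
            for i in range(size)]


def mat_vec(m, v):
    """Matrix-vector product for nested lists."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def pearson_r_quadratic(x, y):
    """Formula (27): r as a quadratic form in the stacked standardized scores."""
    n = len(x)
    z = standardize(x) + standardize(y)
    mz = mat_vec(block_matrix(n), z)
    return 0.5 * sum(zi * mzi for zi, mzi in zip(z, mz)) / (n - 1)


x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 4.0]
r_sum = pearson_r(x, y)
r_quad = pearson_r_quadratic(x, y)

# Eigenvectors: (u, u) has eigenvalue +1, (u, -u) has eigenvalue -1.
plus = mat_vec(block_matrix(2), [1.0, 2.0, 1.0, 2.0])
minus = mat_vec(block_matrix(2), [1.0, 2.0, -1.0, -2.0])
```

Since the matrix is indefinite, the quadratic form is neither convex nor concave in the stacked scores.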


slide-23
SLIDE 23

Computation of interval correlation coefficients

slide-24
SLIDE 24

Inner and outer bounds on correlation coefficients

Computational complexity

For large problems (say n > 8) only approximate solutions are feasible

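[Figure: axis from −1 to 1 with $\rho_L$ and $\rho_U$ marked, together with the lower and upper inner bounds and the lower and upper outer bounds]

Possible values of ranks

$$p_i \in \{p_{i,L},\, p_{i,L} + 1,\, \ldots,\, p_{i,U}\} \tag{28}$$

$$q_i \in \{q_{i,L},\, q_{i,L} + 1,\, \ldots,\, q_{i,U}\} \tag{29}$$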


slide-25
SLIDE 25

Outer bounds for Spearman’s ρ

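Crisp Spearman’s ρ can be computed as

$$\rho = \frac{12}{n(n^2 - 1)} \sum_{i=1}^{n} p_i q_i - \frac{3(n + 1)}{n - 1} \tag{30}$$

Products of ranks can be bounded by

$$p_i q_i \le p_{i,U}\, q_{i,U} \tag{31}$$

$$p_i q_i \ge p_{i,L}\, q_{i,L} \tag{32}$$

Interval Spearman’s coefficient can be bounded by

$$\rho \le \frac{12}{n(n^2 - 1)} \sum_{i=1}^{n} p_{i,U}\, q_{i,U} - \frac{3(n + 1)}{n - 1} \tag{33}$$

$$\rho \ge \frac{12}{n(n^2 - 1)} \sum_{i=1}^{n} p_{i,L}\, q_{i,L} - \frac{3(n + 1)}{n - 1} \tag{34}$$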


slide-26
SLIDE 26

Outer bounds for Spearman’s ρ

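Alternatively, crisp Spearman’s ρ can be computed as

$$\rho = 1 - \frac{6 \sum_{i=1}^{n} (p_i - q_i)^2}{n(n^2 - 1)} \tag{35}$$

Squared difference of ranks can be bounded by

$$(p_i - q_i)^2 \le \max\left\{(p_{i,U} - q_{i,L})^2,\, (q_{i,U} - p_{i,L})^2\right\} \tag{36}$$

$$(p_i - q_i)^2 \ge \left(\max\{0,\, p_{i,L} - q_{i,U},\, q_{i,L} - p_{i,U}\}\right)^2 \tag{37}$$

Another bound is obtained. Because the sum of squared differences enters (35) with a negative sign, the upper bound (36) yields a lower bound on ρ and the lower bound (37) yields an upper bound on ρ

$$\rho \ge 1 - \frac{6 \sum_{i=1}^{n} \max\left\{(p_{i,U} - q_{i,L})^2,\, (q_{i,U} - p_{i,L})^2\right\}}{n(n^2 - 1)} \tag{38}$$

$$\rho \le 1 - \frac{6 \sum_{i=1}^{n} \left(\max\{0,\, p_{i,L} - q_{i,U},\, q_{i,L} - p_{i,U}\}\right)^2}{n(n^2 - 1)} \tag{39}$$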
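The two families of outer bounds for Spearman’s ρ, one from products of ranks and one from squared rank differences, can be combined by taking the tighter value on each side. The sketch below does this under stated assumptions: the rule used in `rank_intervals` to derive rank bounds from interval data (counting intervals that lie entirely below or entirely above) is not given on the slides and is an assumption of this example, as are all function names and the sample data. The rank-difference bound is applied in the direction that follows from the minus sign in the rank-difference formula: the largest possible squared differences give the lower bound on ρ.

```python
def rank_intervals(intervals):
    """Assumed rule: lowest and highest rank each interval-valued item can take."""
    n = len(intervals)
    lows, ups = [], []
    for low_i, up_i in intervals:
        surely_below = sum(1 for _, up_j in intervals if up_j < low_i)
        surely_above = sum(1 for low_j, _ in intervals if low_j > up_i)
        lows.append(1 + surely_below)
        ups.append(n - surely_above)
    return lows, ups


def spearman_outer_bounds(p_low, p_up, q_low, q_up):
    """Combine product-based and difference-based outer bounds on rho."""
    n = len(p_low)
    scale = n * (n ** 2 - 1)
    shift = 3.0 * (n + 1) / (n - 1)

    # Bounds from products of ranks.
    upper_prod = 12.0 * sum(p * q for p, q in zip(p_up, q_up)) / scale - shift
    lower_prod = 12.0 * sum(p * q for p, q in zip(p_low, q_low)) / scale - shift

    # Bounds from squared rank differences.
    max_sq = sum(max((pu - ql) ** 2, (qu - pl) ** 2)
                 for pl, pu, ql, qu in zip(p_low, p_up, q_low, q_up))
    min_sq = sum(max(0, pl - qu, ql - pu) ** 2
                 for pl, pu, ql, qu in zip(p_low, p_up, q_low, q_up))
    lower_diff = 1.0 - 6.0 * max_sq / scale
    upper_diff = 1.0 - 6.0 * min_sq / scale

    lower = max(-1.0, lower_prod, lower_diff)
    upper = min(1.0, upper_prod, upper_diff)
    return lower, upper


# Hypothetical interval data for three objects.
x_intervals = [(1.0, 2.0), (1.5, 3.0), (4.0, 5.0)]
y_intervals = [(1.0, 2.0), (3.0, 4.0), (3.5, 5.0)]
p_low, p_up = rank_intervals(x_intervals)
q_low, q_up = rank_intervals(y_intervals)
rho_bounds = spearman_outer_bounds(p_low, p_up, q_low, q_up)
```

For this small data set the admissible rank assignments can be enumerated by hand; they give ρ between −0.5 and 1, so the combined outer bounds happen to be tight here. In general they are only guaranteed to enclose the true interval.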


slide-27
SLIDE 27

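Reformulation by Genest and Rivest (1993)

$$\tau = \frac{4}{n(n - 1)} \sum_{i=1}^{n} w_i - 1 \tag{40}$$

$$w_i = \operatorname{card}\{j \in \{1, \ldots, n\} \mid p_j < p_i,\; q_j < q_i\} \tag{41}$$

$$w_i \le w_{i,U} = \operatorname{card}\{j \in \{1, \ldots, n\} \mid p_{j,L} < p_{i,U},\; q_{j,L} < q_{i,U}\} \tag{42}$$

$$w_i \ge w_{i,L} = \operatorname{card}\{j \in \{1, \ldots, n\} \mid p_{j,U} < p_{i,L},\; q_{j,U} < q_{i,L}\} \tag{43}$$

Bounds for Kendall’s τ

$$\tau \le \frac{4}{n(n - 1)} \sum_{i=1}^{n} w_{i,U} - 1 \tag{44}$$

$$\tau \ge \frac{4}{n(n - 1)} \sum_{i=1}^{n} w_{i,L} - 1 \tag{45}$$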
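The counting bounds for Kendall’s τ translate directly into two nested loops over the rank intervals. The following is a minimal sketch, with a hypothetical function name and hypothetical rank bounds for three objects. One deliberate deviation, stated as an assumption: the index $j = i$ is skipped when counting, since an object is never ranked below itself; this keeps the upper bound valid and makes it slightly tighter when an interval is non-degenerate.

```python
def kendall_outer_bounds(p_low, p_up, q_low, q_up):
    """Outer bounds on Kendall's tau from lower and upper rank bounds."""
    n = len(p_low)
    w_up_total = 0   # sum of w_{i,U}: pairs that might be ordered j < i in both
    w_low_total = 0  # sum of w_{i,L}: pairs that must be ordered j < i in both
    for i in range(n):
        for j in range(n):
            if j == i:
                continue  # assumption: an object is not compared with itself
            if p_low[j] < p_up[i] and q_low[j] < q_up[i]:
                w_up_total += 1
            if p_up[j] < p_low[i] and q_up[j] < q_low[i]:
                w_low_total += 1
    pairs = n * (n - 1)
    lower = max(-1.0, 4.0 * w_low_total / pairs - 1.0)
    upper = min(1.0, 4.0 * w_up_total / pairs - 1.0)
    return lower, upper


# Hypothetical rank bounds: p_1, p_2 in {1, 2}, p_3 = 3; q_1 = 1, q_2, q_3 in {2, 3}.
p_low, p_up = [1, 1, 3], [2, 2, 3]
q_low, q_up = [1, 2, 2], [1, 3, 3]
tau_bounds = kendall_outer_bounds(p_low, p_up, q_low, q_up)
```

Enumerating the four admissible rank assignments by hand gives τ between −1/3 and 1, so for this example the outer bounds coincide with the exact interval.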

slide-28
SLIDE 28

Computation of inner bounds

Sampling or optimization

Inner bounds can be computed through sampling or optimization within the feasible set

[Figure: axis from −1 to 1 with $\rho_L$ and $\rho_U$ marked, together with the lower and upper inner bounds and the lower and upper outer bounds]
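The sampling approach to inner bounds can be sketched as follows: draw crisp points from inside the intervals, compute the crisp coefficient for each draw, and keep the smallest and largest values seen. Uniform sampling within each interval, the number of samples, the fixed seed and all names are assumptions made for this illustration; the result is an inner bound because every sampled value is attainable, while the true extremes may be missed.

```python
import random


def kendall_tau(x, y):
    """Crisp Kendall's tau from signs of pairwise products (no tie correction)."""
    n = len(x)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            total += (product > 0) - (product < 0)
    return total / (n * (n - 1) / 2)


def monte_carlo_inner_bounds(x_intervals, y_intervals, n_samples=2000, seed=0):
    """Inner bounds on interval Kendall's tau by uniform sampling in the intervals."""
    rng = random.Random(seed)
    lowest, highest = 1.0, -1.0
    for _ in range(n_samples):
        x = [rng.uniform(low, up) for low, up in x_intervals]
        y = [rng.uniform(low, up) for low, up in y_intervals]
        tau = kendall_tau(x, y)
        lowest = min(lowest, tau)
        highest = max(highest, tau)
    return lowest, highest


# Hypothetical interval data for three objects.
x_intervals = [(1.0, 2.0), (1.5, 3.0), (4.0, 5.0)]
y_intervals = [(1.0, 2.0), (3.0, 4.0), (3.5, 5.0)]
inner = monte_carlo_inner_bounds(x_intervals, y_intervals)
```

An optimizer would replace the uniform draws with a guided search over the same feasible set, which typically pushes the inner bounds closer to the true extremes for the same number of function evaluations.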


slide-29
SLIDE 29

Heuristic solutions mimicking strong dependence


slide-30
SLIDE 30

Computation of inner bounds

Optimization

A few different optimizers were used:

  • Differential Evolution (Opara and Arabas, 2010; Storn and Price, 1997)
  • Random walk in the space of linear extensions (Bubley and Dyer, 1998; Denœux et al., 2005)
  • Monte Carlo sampling


slide-31
SLIDE 31

Convergence curves

[Figure: convergence curves of Kendall’s τ (vertical axis, −0.9 to −0.2) against the number of function evaluations (horizontal axis, 1 to 10 × 10^4) for a Clayton copula with τ = −0.6. Series: Kendall’s τ of crisp origins; Heuristics and DE/rand/∞/bin; DE/rand/∞/bin initialized randomly]

slide-32
SLIDE 32

Inner and outer bounds for Clayton’s copulas

[Figure: six panels of Kendall’s τ (vertical axis, −1 to 1) against function evaluations (horizontal axis, up to 10000): a) strong negative, b) medium negative, c) weak negative, d) weak positive, e) medium positive, f) strong positive. Series: Kendall’s τ of crisp origins; Heuristics and DE/rand/∞/bin; DE/rand/∞/bin initialized randomly; Outer bounds for Kendall’s τ]


slide-33
SLIDE 33

Simulation study

slide-34
SLIDE 34

Design of the experiment

Copula     Original coefficient τ
           −0.9    −0.5    −0.1     0.1     0.5     0.9
normal      100     100     100     100     100     100
Clayton     100     100     100     100     100     100
Frank       100     100     100     100     100     100
FGM                         100     100
Gumbel                              100     100     100


slide-35
SLIDE 35

Clayton copula, τ = ±0.5

[Figure: two scatter plots of y against x, both axes ranging from −3 to 3]

slide-36
SLIDE 36

Clayton copula, τ = ±0.5

[Figure: two scatter plots of y against x, both axes ranging from −3 to 3]

slide-37
SLIDE 37

Clayton copula, τ = ±0.5

[Figure: two scatter plots of y against x, both axes ranging from −3 to 3]

slide-38
SLIDE 38

Frank copula, τ = ±0.5

[Figure: two scatter plots of y against x, both axes ranging from −3 to 3]

slide-39
SLIDE 39

Frank copula, τ = ±0.5

[Figure: two scatter plots of y against x, both axes ranging from −3 to 3]

slide-40
SLIDE 40

Frank copula, τ = ±0.5

[Figure: two scatter plots of y against x, both axes ranging from −3 to 3]

slide-41
SLIDE 41

Simulations

Simulations:

  • 23 cases
  • 100 sets for each case
  • 5 runs for each set
  • 10^5 objective function evaluations for each run

Computations were performed in the Interdisciplinary Centre for Mathematical and Computational Modelling (ICM) at Warsaw University within the computational grant No. G55-13


slide-42
SLIDE 42

Median inner bounds for Kendall’s τ

Medium dependence, τ = 0.5

Copula     Algorithm   Median τL   Median τU
normal     Heur        0.50313     0.60154
           HeurDE      0.41253     0.60154
           DE          0.41273     0.59758
           MC          0.47434     0.55838
           BD          0.4998      0.53414
Clayton    Heur        0.47436     0.57688
           HeurDE      0.37253     0.57688
           DE          0.37172     0.56242
           MC          0.43000     0.51838
           BD          0.4598      0.49131
Frank      Heur        0.43518     0.55301
           HeurDE      0.33051     0.55301
           DE          0.33051     0.53374
           MC          0.39556     0.48566
           BD          0.42869     0.46141


slide-43
SLIDE 43

Results


slide-44
SLIDE 44

Results


slide-45
SLIDE 45

A meteorological example

slide-46
SLIDE 46

A meteorological example

Cloud cover fraction for Warsaw Chopin Airport is discretized to {0, 0.1, 0.25, 0.4, 0.5, 0.6, 0.75, 0.9, 1}

[Figure: time series of cloud cover fraction [−] against measurement time, 1 to 15 January]


slide-47
SLIDE 47

A meteorological example

To what extent can current observations be substituted by lagged ones?

Autocorrelation of cloud cover fraction

[Figure: autocorrelation [−] against time lag (1, 2, 3, 6, 12, 24, 48, 72, 96 h) in three panels: a) Pearson’s r, b) Spearman’s ρ, c) Kendall’s τ]


slide-48
SLIDE 48

Conclusions

1 Crisp and interval generalized correlation coefficients are discussed
2 Outer bounds for Spearman’s ρ and Kendall’s τ are derived
3 Comparison of algorithms computing correlation coefficients for interval data
4 Simple heuristic solutions prove effective for strong dependencies
5 Simulation study and a real data example show applicability of the approach


slide-49
SLIDE 49

Acknowledgments

1 The study is co-funded by the European Union from resources of the European Social Fund. Project PO KL “Information technologies: Research and their interdisciplinary applications”, Agreement UDA-POKL.04.01.01-00-051/10-00
2 Computations were performed in the Interdisciplinary Centre for Mathematical and Computational Modelling (ICM) at Warsaw University within the computational grant no. G55-13


slide-50
SLIDE 50
R. Bubley and M. Dyer. Faster random generation of linear extensions. In Proc. 9th Annu. ACM-SIAM Symp. on Discrete Algorithms, pages 175–186, 1998.

T. Denœux, M.-H. Masson, and P.A. Hébert. Nonparametric rank-based statistics and significance tests for fuzzy data. Fuzzy Sets and Systems, 153(1):1–28, 2005.

Gabriel Frahm, Markus Junker, and Alexander Szimayer. Elliptical copulas: applicability and limitations. Statistics and Probability Letters, 63:275–286, 2003.

Gregory A. Fredricks and Roger B. Nelsen. On the relationship between Spearman’s rho and Kendall’s tau for pairs of continuous random variables. Journal of Statistical Planning and Inference, 137:2143–2150, 2007.

C. Genest and R. J. McKay. The joy of copulas: Bivariate distributions with uniform marginals. American Statistician, 40(4):280–283, 1986.

C. Genest and L.-P. Rivest. Statistical inference procedures for bivariate Archimedean copulas. Journal of the American Statistical Association, 88(423):1034–1043, 1993.


slide-51
SLIDE 51

Jan Hauke and Tomasz Kossowski. Comparison of values of Pearson’s and Spearman’s correlation coefficients on the same sets of data. Quaestiones geographicae, 30:87–93, 2011.

Maurice G. Kendall. Rank correlation methods. Griffin, London, 1955.

Roger B. Nelsen. Copulas and association. In Advances in probability distributions with given marginals, pages 51–74. Springer Netherlands, 1991.

Roger B. Nelsen. On measures of association as measures of positive dependence. Statistics & probability letters, 14(4):269–274, 1992.

K. Opara and J. Arabas. Differential mutation based on population covariance matrix. In R. Schaefer, editor, Parallel Problem Solving from Nature PPSN XI, part I, volume 6238 of Lecture Notes in Computer Science, pages 114–123. Springer, 2010.

Rainer Storn and Kenneth Price. Differential Evolution – a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 11:341–359, 1997.
