Rank correlation coefficients and their generalizations for interval data
Karol Opara and Olgierd Hryniewicz
Systems Research Institute, Polish Academy of Sciences
Konferencja Statystyka Matematyczna, Będlewo, 28 October 2016
Introduction
Measuring dependence
Quantifying and testing dependence is one of the major tasks of statistics
Imprecise data
How to compute correlation coefficients for imprecise data?
Opara K. and Hryniewicz O. (2016). Computation of general correlation coefficients for interval data. International Journal of Approximate Reasoning, 73, pp. 56–75.
Karol Opara, Olgierd Hryniewicz (SRI PAS) Interval Correlation Coefficients 29 November 2016 2 / 46
Crisp correlation coefficients
General (crisp) correlation coefficient
We have a set of $n$ objects characterized by two properties $x$ and $y$.
To any pair of individuals, say the $i$-th and $j$-th, one can assign an x-score $a_{ij} = -a_{ji}$ and a y-score $b_{ij} = -b_{ji}$.
Kendall (1955) describes a general correlation coefficient $\Gamma$ as
$$\Gamma = \frac{\sum_{i,j=1}^{n} a_{ij} b_{ij}}{\sqrt{\sum_{i,j=1}^{n} a_{ij}^2 \, \sum_{i,j=1}^{n} b_{ij}^2}} \qquad (1)$$
Scores $a_{ij}$ and $b_{ij}$ are regarded as zero if $i = j$.
Pearson’s r
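General correlation coefficient
$$\Gamma = \frac{\sum_{i,j=1}^{n} a_{ij} b_{ij}}{\sqrt{\sum_{i,j=1}^{n} a_{ij}^2 \, \sum_{i,j=1}^{n} b_{ij}^2}} \qquad (2)$$
Pearson’s r is based on variate values
$$a_{ij} = x_j - x_i \qquad (3)$$
$$b_{ij} = y_j - y_i \qquad (4)$$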
Rank coefficients ρ and τ
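General correlation coefficient
$$\Gamma = \frac{\sum_{i,j=1}^{n} a_{ij} b_{ij}}{\sqrt{\sum_{i,j=1}^{n} a_{ij}^2 \, \sum_{i,j=1}^{n} b_{ij}^2}} \qquad (5)$$
Spearman’s ρ is based on ranks
$$a_{ij} = p_j - p_i \qquad (6)$$
$$b_{ij} = q_j - q_i \qquad (7)$$
Kendall’s τ is based on ±1 scores
$$a_{ij} = \begin{cases} +1 & \text{if } p_i < p_j \\ -1 & \text{if } p_i > p_j \end{cases} \qquad (8)$$
$$b_{ij} = \begin{cases} +1 & \text{if } q_i < q_j \\ -1 & \text{if } q_i > q_j \end{cases} \qquad (9)$$
Variants $\tau_a$ and $\tau_b$ resolve ties in the data differently, with $|\tau_a| \le |\tau_b|$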
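The three coefficients differ only in the scores plugged into the general formula for Γ. The following is a minimal sketch of that idea, assuming data without ties; the function names (`general_gamma`, `difference_scores`, `sign_scores`, `ranks`) and the toy data are illustrative and not taken from the presentation.

```python
from math import sqrt

def general_gamma(a, b):
    """Kendall's general coefficient from two antisymmetric score matrices."""
    n = len(a)
    num = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n))
    sum_a2 = sum(a[i][j] ** 2 for i in range(n) for j in range(n))
    sum_b2 = sum(b[i][j] ** 2 for i in range(n) for j in range(n))
    return num / sqrt(sum_a2 * sum_b2)

def difference_scores(v):
    # a_ij = v_j - v_i (variate values for Pearson, ranks for Spearman)
    return [[vj - vi for vj in v] for vi in v]

def sign_scores(v):
    # a_ij = +1 if v_i < v_j, -1 if v_i > v_j, 0 on the diagonal
    return [[(vi < vj) - (vi > vj) for vj in v] for vi in v]

def ranks(v):
    # Ranks 1..n; assumes there are no ties in v
    order = sorted(range(len(v)), key=lambda i: v[i])
    result = [0] * len(v)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

# Hypothetical toy data
x = [1.0, 2.0, 3.0, 4.0]
y = [4.0, 1.0, 16.0, 9.0]
p, q = ranks(x), ranks(y)

pearson_r = general_gamma(difference_scores(x), difference_scores(y))
spearman_rho = general_gamma(difference_scores(p), difference_scores(q))
kendall_tau = general_gamma(sign_scores(p), sign_scores(q))
print(pearson_r, spearman_rho, kendall_tau)
```

For this toy sample the rank-based values are ρ = 0.6 and τ = 1/3, while Pearson’s r depends on the actual variate values.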
Relations between rank correlation coefficients
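Daniels’ inequality
$$-1 \le 3\tau - 2\rho \le 1 \qquad (10)$$
refined by Durbin and Stuart
$$\tfrac{3}{2}\tau - \tfrac{1}{2} \le \rho \le \tfrac{1}{2} + \tau - \tfrac{1}{2}\tau^2 \quad \text{for } \tau > 0 \qquad (11)$$
$$\tfrac{1}{2}\tau^2 + \tau - \tfrac{1}{2} \le \rho \le \tfrac{3}{2}\tau + \tfrac{1}{2} \quad \text{for } \tau < 0 \qquad (12)$$
Fredricks and Nelsen (2007) proved that in the limiting case of dependence weakening towards independence
$$2\rho \to 3\tau \qquad (13)$$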
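As a small illustration, the admissible range of ρ for a given τ can be evaluated directly. This sketch assumes the Durbin and Stuart bounds in their usual form, with upper bound 1/2 + τ − τ²/2 for τ ≥ 0 and the mirror-image lower bound for τ < 0; the function name `rho_bounds` is hypothetical.

```python
def rho_bounds(tau):
    """Range of Spearman's rho compatible with a given Kendall's tau
    (population versions, Durbin-Stuart refinement of Daniels' inequality)."""
    if not -1.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [-1, 1]")
    if tau >= 0:
        lower = 1.5 * tau - 0.5
        upper = 0.5 + tau - 0.5 * tau ** 2
    else:
        lower = 0.5 * tau ** 2 + tau - 0.5
        upper = 1.5 * tau + 0.5
    return lower, upper

print(rho_bounds(0.5))   # (0.25, 0.875)
print(rho_bounds(-0.5))  # (-0.875, -0.25)
```

At τ = ±1 the interval collapses to ρ = ±1, and at τ = 0 it reduces to Daniels’ range [−1/2, 1/2].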
Possible values of ρ and τ (Nelsen, 1991)
Relation between Pearson’s r and Kendall’s τ
For elliptical distributions (e.g. bivariate normal) (Frahm et al., 2003)
$$\tau = \frac{2}{\pi} \arcsin(r) \qquad (14)$$
Typically (Hauke and Kossowski, 2011):
large r ⇒ large τ and ρ (15)
small r ⇒ small τ and ρ (16)
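The relation for elliptical distributions is easy to evaluate in both directions. A minimal sketch follows; the function names are illustrative.

```python
from math import asin, pi, sin

def tau_from_r(r):
    """Kendall's tau implied by Pearson's r under an elliptical distribution."""
    return 2.0 / pi * asin(r)

def r_from_tau(tau):
    """Inverse relation: Pearson's r implied by Kendall's tau."""
    return sin(pi * tau / 2.0)

print(tau_from_r(0.5))              # approximately 1/3
print(r_from_tau(tau_from_r(0.3)))  # approximately 0.3
```

Note that for 0 < r < 1 the implied τ is smaller than r, for example r = 0.5 corresponds to τ = 1/3.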
Computational formulas for crisp Kendall’s τ
Counting concordant and discordant pairs
$$C = \operatorname{card}_{i \ne j} \{ (x_i - x_j) \cdot (y_i - y_j) > 0 \} \qquad (17)$$
$$D = \operatorname{card}_{i \ne j} \{ (x_i - x_j) \cdot (y_i - y_j) < 0 \} \qquad (18)$$
$$\tau = \frac{C - D}{n \cdot (n - 1)} \qquad (19)$$
Denœux et al. (2005) imposed linear orders $L_X$ and $L_Y$ on each variate and counted the number of pairs ordered the same way by both of them
$$\tau = \tau(L_X, L_Y) = \frac{4 \operatorname{card}\{L_X \cap L_Y\}}{n(n - 1)} - 1 \qquad (20)$$
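Both computational formulas can be sketched in a few lines. The code below assumes tie-free data, counts ordered pairs i ≠ j for C and D (so each unordered pair is counted twice, matching the denominator n(n − 1)), and represents card{L_X ∩ L_Y} as the number of ordered pairs (i, j) ranked in the same direction by both variates. Function names and data are illustrative.

```python
def kendall_tau_pairs(x, y):
    """tau = (C - D) / (n (n - 1)), with C and D counted over ordered pairs."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1))

def kendall_tau_orders(x, y):
    """tau = 4 card{L_X and L_Y} / (n (n - 1)) - 1, assuming no ties."""
    n = len(x)
    common = sum(
        1
        for i in range(n)
        for j in range(n)
        if x[i] < x[j] and y[i] < y[j]
    )
    return 4.0 * common / (n * (n - 1)) - 1.0

x = [1, 2, 3, 4]
y = [2, 1, 4, 3]
print(kendall_tau_pairs(x, y), kendall_tau_orders(x, y))
```

For this sample four of the six unordered pairs are concordant, so both formulas give τ = 1/3.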
Interpretations
Kendall’s τ – a function of the minimal number of transpositions required to order a sample
Spearman’s ρ – an interpretation in terms of concordances exists; it requires a bivariate sample and a pair of independent random variables with the same marginals as the initial ones (Kendall, 1955; Nelsen, 1991)
Copulas
Sklar’s theorem
Let $F(x)$ and $G(y)$ be continuous CDFs and $H(x, y)$ be a two-dimensional CDF of a random variable with marginals $F$ and $G$. There exists a unique function $C$, called a copula, such that
$$H(x, y) = C(F(x), G(y))$$
Copulas are invariant against order-preserving transformations such as ranking $X \to F(X)$
Rank-based measures of association are properties of copulas
Spearman’s ρ has a geometric interpretation in terms of copulas as a scaled proportion of the volume of the $[0, 1]^3$ cube under the copula surface. Nelsen (1992) gives a similar interpretation for Kendall’s τ.
Gaussian copula, parameter $\rho_G$ equals Pearson’s r for normal marginals
$$C(u_1, u_2; \rho_G) = \Phi_N(\Phi^{-1}(u_1), \Phi^{-1}(u_2); \rho_G) \qquad (21)$$
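To make the Gaussian copula concrete, the sketch below draws correlated standard normal pairs and maps each coordinate through the standard normal CDF, which yields pairs with uniform marginals and a Gaussian copula. It then compares the sample Kendall’s τ with (2/π) arcsin(ρ_G). The sample size, seed, parameter value and function names are assumptions chosen for illustration only.

```python
import random
from math import asin, erf, pi, sqrt

def std_normal_cdf(z):
    # Phi(z) expressed through the error function
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def sample_gaussian_copula(n, rho_g, seed=0):
    """Draw n pairs (u1, u2) from a Gaussian copula with parameter rho_g."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho_g * z1 + sqrt(1.0 - rho_g ** 2) * rng.gauss(0.0, 1.0)
        pairs.append((std_normal_cdf(z1), std_normal_cdf(z2)))
    return pairs

def kendall_tau(pairs):
    # Sum of pair signs over unordered pairs, divided by the number of pairs
    n = len(pairs)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1])
            s += (prod > 0) - (prod < 0)
    return 2.0 * s / (n * (n - 1))

sample = sample_gaussian_copula(400, 0.7)
tau_hat = kendall_tau(sample)
tau_theory = 2.0 / pi * asin(0.7)
print(tau_hat, tau_theory)
```

Because τ depends only on ranks, applying any increasing transformation to either coordinate of `sample` would leave `tau_hat` unchanged.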
Copulas and correlation
Population versions of Kendall’s τ and Spearman’s ρ can be defined in terms of copulas (Pearson’s r cannot be)
$$\tau(X, Y) = 4\,E\big(C(F(X), G(Y))\big) - 1 \qquad (22)$$
Genest and MacKay (1986) used the CDF $K(t)$ of a random variable $T = C(U_1, U_2)$, where $U_1$ and $U_2$ are random variables uniformly distributed on $[0, 1]$, to show that
$$\tau = 3 - 4 \int_0^1 K(t)\,dt \qquad (23)$$
Kendall’s τ as the difference between the probabilities of concordance and discordance (for the population version of the statistic)
$$\tau = P\big((x_1 - x_2)(y_1 - y_2) > 0\big) - P\big((x_1 - x_2)(y_1 - y_2) < 0\big) \qquad (24)$$
Interval correlation coefficients
For interval data $z \in [z_L, z_U]$, $y \in [y_L, y_U]$ one obtains interval correlation coefficients $[\tau_L, \tau_U]$
$$\tau_L([z_L, z_U], [y_L, y_U]) = \min_{z \in [z_L, z_U],\; y \in [y_L, y_U]} \tau(z, y)$$
$$\tau_U([z_L, z_U], [y_L, y_U]) = \max_{z \in [z_L, z_U],\; y \in [y_L, y_U]} \tau(z, y)$$
Interval data
Cross-section (Kendall’s τ)
[Figure: cross-sections of the coefficients over x, y ∈ [−1, 1]; panels: Pearson’s r, Spearman’s ρ, Kendall’s τ]
Computational formulas for Pearson’s r
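Mean product of standard scores
$$r = \frac{1}{n - 1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) \qquad (25)$$
After studentization, this simplifies to a quadratic form
$$r = \frac{1}{n - 1} \sum_{i=1}^{n} x'_i \, y'_i \qquad (26)$$
or equivalently
$$r = \frac{1}{2} \cdot \frac{1}{n - 1} \, [x'_1, \dots, x'_n, y'_1, \dots, y'_n] \begin{bmatrix} 0_n & I_n \\ I_n & 0_n \end{bmatrix} [x'_1, \dots, x'_n, y'_1, \dots, y'_n]^T \qquad (27)$$
The matrix has eigenvalues ±1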
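The equivalence of the mean-product form and the block-matrix quadratic form can be checked numerically. In this sketch the block matrix is never built explicitly: multiplying the stacked vector [x', y'] by it simply swaps the two halves. Since swapping twice restores the vector, the matrix squares to the identity, which is why its eigenvalues are ±1. Function names and data are illustrative.

```python
from math import sqrt

def standardize(v):
    """Studentized scores using the sample standard deviation (n - 1)."""
    n = len(v)
    mean = sum(v) / n
    sd = sqrt(sum((t - mean) ** 2 for t in v) / (n - 1))
    return [(t - mean) / sd for t in v]

def pearson_mean_product(x, y):
    xs, ys = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(xs, ys)) / (len(x) - 1)

def pearson_quadratic_form(x, y):
    n = len(x)
    z = standardize(x) + standardize(y)   # stacked vector of length 2n
    mz = z[n:] + z[:n]                    # product with [[0, I], [I, 0]]
    return 0.5 / (n - 1) * sum(a * b for a, b in zip(z, mz))

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 1.0, 4.0, 3.0]
print(pearson_mean_product(x, y), pearson_quadratic_form(x, y))
```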
Computation of interval correlation coefficients
Inner and outer bounds on correlation coefficients
Computational complexity
For large problems (say n > 8) only approximate solutions are feasible
[Figure: number line from −1 to 1 marking $\rho_L$ and $\rho_U$ together with the lower and upper inner bounds and the lower and upper outer bounds]
Possible values of ranks
$$p_i \in \{p_{i,L},\, p_{i,L} + 1,\, \dots,\, p_{i,U}\} \qquad (28)$$
$$q_i \in \{q_{i,L},\, q_{i,L} + 1,\, \dots,\, q_{i,U}\} \qquad (29)$$
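The slides do not state how the rank limits are obtained from the interval data. One natural construction, assumed here purely for illustration, takes the lowest possible rank of an observation as one plus the number of intervals lying entirely below it, and the highest possible rank as n minus the number of intervals lying entirely above it. The function name `rank_intervals` is hypothetical.

```python
def rank_intervals(intervals):
    """Return (lowest rank, highest rank) for each interval (lo, hi).

    Assumption: an observation is certainly ranked above another only if
    its whole interval lies strictly above the other interval.
    """
    n = len(intervals)
    result = []
    for i, (lo_i, hi_i) in enumerate(intervals):
        surely_below = sum(
            1 for j, (_, hi_j) in enumerate(intervals) if j != i and hi_j < lo_i
        )
        surely_above = sum(
            1 for j, (lo_j, _) in enumerate(intervals) if j != i and lo_j > hi_i
        )
        result.append((1 + surely_below, n - surely_above))
    return result

data = [(0.0, 1.0), (2.0, 3.0), (2.5, 4.0)]
print(rank_intervals(data))  # [(1, 1), (2, 3), (2, 3)]
```

The first interval does not overlap the others, so its rank is fixed at 1, while the two overlapping intervals can each take rank 2 or 3.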
Outer bounds for Spearman’s ρ
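Crisp Spearman’s ρ can be computed as
$$\rho = \frac{12}{n(n^2 - 1)} \sum_{i=1}^{n} p_i q_i - \frac{3(n + 1)}{n - 1} \qquad (30)$$
Products of ranks can be bounded by
$$p_i q_i \le p_{i,U}\, q_{i,U} \qquad (31)$$
$$p_i q_i \ge p_{i,L}\, q_{i,L} \qquad (32)$$
Interval Spearman’s coefficient can be bounded by
$$\rho \le \frac{12}{n(n^2 - 1)} \sum_{i=1}^{n} p_{i,U}\, q_{i,U} - \frac{3(n + 1)}{n - 1} \qquad (33)$$
$$\rho \ge \frac{12}{n(n^2 - 1)} \sum_{i=1}^{n} p_{i,L}\, q_{i,L} - \frac{3(n + 1)}{n - 1} \qquad (34)$$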
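The product-based outer bounds translate directly into code. In this sketch the rank limits are passed as four lists, and the result is additionally clipped to [−1, 1]; the clipping is an addition made here, since the raw bounds can fall outside the admissible range of ρ. The function name and the example rank limits are illustrative.

```python
def spearman_outer_bounds_products(p_lo, p_hi, q_lo, q_hi):
    """Outer bounds on Spearman's rho from bounds on the rank products."""
    n = len(p_lo)
    scale = 12.0 / (n * (n * n - 1))
    shift = 3.0 * (n + 1) / (n - 1)
    lower = scale * sum(pl * ql for pl, ql in zip(p_lo, q_lo)) - shift
    upper = scale * sum(pu * qu for pu, qu in zip(p_hi, q_hi)) - shift
    # Clipping to [-1, 1] is an extra step, not part of the formulas above
    return max(-1.0, lower), min(1.0, upper)

# Crisp ranks: both bounds coincide with the ordinary Spearman's rho
crisp = spearman_outer_bounds_products(
    [1, 2, 3, 4], [1, 2, 3, 4], [2, 1, 4, 3], [2, 1, 4, 3]
)
# Hypothetical interval ranks for n = 3
loose = spearman_outer_bounds_products([1, 2, 2], [1, 3, 3], [1, 1, 3], [2, 2, 3])
print(crisp, loose)
```

For the small interval example the unclipped bounds are −1.5 and 2.5, so after clipping they carry no information, which illustrates that these outer bounds can be quite loose.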
Outer bounds for Spearman’s ρ
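Alternatively, crisp Spearman’s ρ can be computed as
$$\rho = 1 - \frac{6 \sum_{i=1}^{n} (p_i - q_i)^2}{n(n^2 - 1)} \qquad (35)$$
The squared difference of ranks can be bounded by
$$(p_i - q_i)^2 \le \max\left\{ (p_{i,U} - q_{i,L})^2,\ (q_{i,U} - p_{i,L})^2 \right\} \qquad (36)$$
$$(p_i - q_i)^2 \ge \left( \max\{0,\ p_{i,L} - q_{i,U},\ q_{i,L} - p_{i,U}\} \right)^2 \qquad (37)$$
Another bound is obtained. Because ρ decreases as the squared differences grow, the largest squared differences give the lower bound and the smallest give the upper bound
$$\rho \ge 1 - \frac{6 \sum_{i=1}^{n} \max\left\{ (p_{i,U} - q_{i,L})^2,\ (q_{i,U} - p_{i,L})^2 \right\}}{n(n^2 - 1)} \qquad (38)$$
$$\rho \le 1 - \frac{6 \sum_{i=1}^{n} \left( \max\{0,\ p_{i,L} - q_{i,U},\ q_{i,L} - p_{i,U}\} \right)^2}{n(n^2 - 1)} \qquad (39)$$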
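A sketch of the difference-based bounds follows. Since ρ = 1 − 6Σ(p_i − q_i)²/(n(n² − 1)) decreases as the squared differences grow, the code uses the largest possible squared differences for the lower bound on ρ and the smallest for the upper bound. The function name and the example rank limits are illustrative.

```python
def spearman_outer_bounds_differences(p_lo, p_hi, q_lo, q_hi):
    """Outer bounds on Spearman's rho from bounds on squared rank differences."""
    n = len(p_lo)
    denom = n * (n * n - 1)
    # Largest possible squared difference for each observation
    d2_max = sum(
        max((pu - ql) ** 2, (qu - pl) ** 2)
        for pl, pu, ql, qu in zip(p_lo, p_hi, q_lo, q_hi)
    )
    # Smallest possible squared difference (zero if the rank ranges overlap)
    d2_min = sum(
        max(0, pl - qu, ql - pu) ** 2
        for pl, pu, ql, qu in zip(p_lo, p_hi, q_lo, q_hi)
    )
    return 1 - 6 * d2_max / denom, 1 - 6 * d2_min / denom

crisp = spearman_outer_bounds_differences(
    [1, 2, 3, 4], [1, 2, 3, 4], [2, 1, 4, 3], [2, 1, 4, 3]
)
# Hypothetical interval ranks for n = 3
bounds = spearman_outer_bounds_differences([1, 2, 2], [1, 3, 3], [1, 1, 3], [2, 2, 3])
print(crisp, bounds)  # (0.6, 0.6) (-0.5, 1.0)
```

In the n = 3 example the feasible rank vectors are p ∈ {(1,2,3), (1,3,2)} and q ∈ {(1,2,3), (2,1,3)}, and both bounds are actually attained, so here this bound is tight while the product-based one is not.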
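Reformulation by Genest and Rivest (1993)
$$\tau = \frac{4}{n(n - 1)} \sum_{i=1}^{n} w_i - 1 \qquad (40)$$
$$w_i = \operatorname{card}\{ j \in \{1, \dots, n\} \mid p_j < p_i,\ q_j < q_i \} \qquad (41)$$
$$w_i \le w_{i,U} = \operatorname{card}\{ j \in \{1, \dots, n\} \mid p_{j,L} < p_{i,U},\ q_{j,L} < q_{i,U} \} \qquad (42)$$
$$w_i \ge w_{i,L} = \operatorname{card}\{ j \in \{1, \dots, n\} \mid p_{j,U} < p_{i,L},\ q_{j,U} < q_{i,L} \} \qquad (43)$$
Bounds for Kendall’s τ
$$\tau \le \frac{4}{n(n - 1)} \sum_{i=1}^{n} w_{i,U} - 1 \qquad (44)$$
$$\tau \ge \frac{4}{n(n - 1)} \sum_{i=1}^{n} w_{i,L} - 1 \qquad (45)$$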
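The counting bounds for Kendall’s τ can be sketched as below. Two choices here are assumptions rather than part of the slides: the index j = i is skipped when counting (w_i never counts the observation itself, so the upper bound stays valid and becomes slightly tighter), and the result is clipped to [−1, 1]. The function name and the example rank limits are illustrative.

```python
def kendall_outer_bounds(p_lo, p_hi, q_lo, q_hi):
    """Outer bounds on Kendall's tau from bounds on the counts w_i."""
    n = len(p_lo)
    w_upper = w_lower = 0
    for i in range(n):
        for j in range(n):
            if j == i:
                continue  # assumption: an observation never dominates itself
            # j may possibly lie below i in both coordinates
            if p_lo[j] < p_hi[i] and q_lo[j] < q_hi[i]:
                w_upper += 1
            # j certainly lies below i in both coordinates
            if p_hi[j] < p_lo[i] and q_hi[j] < q_lo[i]:
                w_lower += 1
    scale = 4.0 / (n * (n - 1))
    return max(-1.0, scale * w_lower - 1.0), min(1.0, scale * w_upper - 1.0)

crisp = kendall_outer_bounds([1, 2, 3, 4], [1, 2, 3, 4], [2, 1, 4, 3], [2, 1, 4, 3])
# Hypothetical interval ranks for n = 3
bounds = kendall_outer_bounds([1, 2, 2], [1, 3, 3], [1, 1, 3], [2, 2, 3])
print(crisp, bounds)
```

With crisp ranks both bounds reduce to the ordinary τ (1/3 for this sample). For the n = 3 interval example the bounds are −1/3 and 1, which coincide with the extreme values of τ over the four feasible rank assignments.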
Computation of inner bounds
Sampling or optimization
Inner bounds can be computed through sampling or optimization within the feasible set
[Figure: number line from −1 to 1 marking $\rho_L$ and $\rho_U$ together with the lower and upper inner bounds and the lower and upper outer bounds]
Heuristic solutions mimicking strong dependence
Computation of inner bounds
Optimization
A few different optimizers were used:
- Differential Evolution (Opara and Arabas, 2010; Storn and Price, 1997)
- Random walk in the space of linear extensions (Bubley and Dyer, 1998; Denœux et al., 2005)
- Monte Carlo sampling
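Of the listed approaches, Monte Carlo sampling is the simplest to sketch. The code below draws crisp values uniformly from each interval, computes Kendall’s τ for every draw, and keeps the smallest and largest values seen. Every sampled τ is attainable, so the result is an inner bound: the true τ_L can only be lower and the true τ_U only higher. Uniform sampling, the sample count, the seed and the function names are assumptions for illustration, not the settings used in the study.

```python
import random

def kendall_tau(x, y):
    # Sum of pair signs over unordered pairs, divided by the number of pairs
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return 2.0 * s / (n * (n - 1))

def monte_carlo_inner_bounds(x_intervals, y_intervals, n_samples=1000, seed=0):
    """Inner bounds on [tau_L, tau_U] by uniform sampling within intervals."""
    rng = random.Random(seed)
    lowest, highest = 1.0, -1.0
    for _ in range(n_samples):
        x = [rng.uniform(lo, hi) for lo, hi in x_intervals]
        y = [rng.uniform(lo, hi) for lo, hi in y_intervals]
        tau = kendall_tau(x, y)
        lowest = min(lowest, tau)
        highest = max(highest, tau)
    return lowest, highest

# Hypothetical interval data
x_int = [(0.0, 1.0), (0.5, 2.0), (1.5, 3.0), (2.5, 4.0)]
y_int = [(0.0, 2.0), (1.0, 1.5), (1.2, 3.0), (2.0, 5.0)]
print(monte_carlo_inner_bounds(x_int, y_int))
```

With degenerate intervals (lower limit equal to upper limit) both bounds collapse to the crisp τ, which gives a quick sanity check.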
Convergence curves
[Figure: convergence curves for a Clayton copula with τ = −0.6; horizontal axis: function evaluations (1 to 10 × 10^4); vertical axis: Kendall’s τ (−0.9 to −0.2); series: Kendall’s τ of crisp origins, Heuristics and DE/rand/∞/bin, DE/rand/∞/bin initialized randomly]
Inner and outer bounds for Clayton’s copulas
[Figure: six panels of Kendall’s τ (−1 to 1) against function evaluations (up to 10000): a) strong negative, b) medium negative, c) weak negative, d) weak positive, e) medium positive, f) strong positive; series: Kendall’s τ of crisp origins, Heuristics and DE/rand/∞/bin, DE/rand/∞/bin initialized randomly, Outer bounds for Kendall’s τ]
Simulation study
Design of the experiment
Copula     Original coefficient τ
           −0.9   −0.5   −0.1    0.1    0.5    0.9
normal      100    100    100    100    100    100
Clayton     100    100    100    100    100    100
Frank       100    100    100    100    100    100
FGM                       100    100
Gumbel                           100    100    100
Clayton copula, τ = ±0.5
[Figure: two scatter plots of y against x, both axes from −3 to 3]
Frank copula, τ = ±0.5
[Figure: two scatter plots of y against x, both axes from −3 to 3]
Simulations
Simulations:
- 23 cases
- 100 sets for each case
- 5 runs for each set
- 10^5 objective function evaluations for each run
Computations were performed in the Interdisciplinary Centre for Mathematical and Computational Modelling (ICM) at Warsaw University within the computational grant No. G55-13
Median inner bounds for Kendall’s τ
Medium dependence, τ = 0.5

Copula    Algorithm   Median τL   Median τU
normal    Heur        0.50313     0.60154
normal    HeurDE      0.41253     0.60154
normal    DE          0.41273     0.59758
normal    MC          0.47434     0.55838
normal    BD          0.4998      0.53414
Clayton   Heur        0.47436     0.57688
Clayton   HeurDE      0.37253     0.57688
Clayton   DE          0.37172     0.56242
Clayton   MC          0.43000     0.51838
Clayton   BD          0.4598      0.49131
Frank     Heur        0.43518     0.55301
Frank     HeurDE      0.33051     0.55301
Frank     DE          0.33051     0.53374
Frank     MC          0.39556     0.48566
Frank     BD          0.42869     0.46141
Results
A meteorological example
Cloud cover fraction for Warsaw Chopin Airport is discretized to {0, 0.1, 0.25, 0.4, 0.5, 0.6, 0.75, 0.9, 1}
[Figure: cloud cover fraction [−] against measurement time, Jan 01 to Jan 15]
To what extent can current observations be substituted by lagged ones?
Autocorrelation of cloud cover fraction
[Figure: autocorrelation against time lag (1 to 96 h); panels: a) Pearson’s r, b) Spearman’s ρ, c) Kendall’s τ]
Conclusions
1. Crisp and interval generalized correlation coefficients are discussed
2. Outer bounds for Spearman’s ρ and Kendall’s τ are derived
3. Comparison of algorithms computing correlation coefficients for interval data
4. Simple heuristic solutions prove effective for strong dependencies
5. Simulation study and a real data example show applicability of the approach
Acknowledgments
1. The study is co-funded by the European Union from resources of the European Social Fund. Project PO KL “Information technologies: Research and their interdisciplinary applications”, Agreement UDA-POKL.04.01.01-00-051/10-00
2. Computations were performed in the Interdisciplinary Centre for Mathematical and Computational Modelling (ICM) at Warsaw University within the computational grant no. G55-13
References
R. Bubley and M. Dyer. Faster random generation of linear extensions. In Proc. 9th Annu. ACM-SIAM Symp. on Discrete Algorithms, pages 175–186, 1998.
T. Denœux, M.-H. Masson, and P.-A. Hébert. Nonparametric rank-based statistics and significance tests for fuzzy data. Fuzzy Sets and Systems, 153(1):1–28, 2005.
G. Frahm, M. Junker, and A. Szimayer. Elliptical copulas: applicability and limitations. Statistics and Probability Letters, 63:275–286, 2003.
G. A. Fredricks and R. B. Nelsen. On the relationship between Spearman’s rho and Kendall’s tau for pairs of continuous random variables. Journal of Statistical Planning and Inference, 137:2143–2150, 2007.
C. Genest and R. J. MacKay. The joy of copulas: Bivariate distributions with uniform marginals. American Statistician, 40(4):280–283, 1986.
C. Genest and L.-P. Rivest. Statistical inference procedures for bivariate Archimedean copulas. Journal of the American Statistical Association, 88(423):1034–1043, 1993.
J. Hauke and T. Kossowski. Comparison of values of Pearson’s and Spearman’s correlation coefficients on the same sets of data. Quaestiones Geographicae, 30:87–93, 2011.
M. G. Kendall. Rank correlation methods. Griffin, London, 1955.
R. B. Nelsen. Copulas and association. In Advances in probability distributions with given marginals, pages 51–74. Springer Netherlands, 1991.
R. B. Nelsen. On measures of association as measures of positive dependence. Statistics & Probability Letters, 14(4):269–274, 1992.
K. Opara and J. Arabas. Differential mutation based on population covariance matrix. In R. Schaefer, editor, Parallel Problem Solving from Nature PPSN XI, part I, volume 6238 of Lecture Notes in Computer Science, pages 114–123. Springer, 2010.
R. Storn and K. Price. Differential Evolution – a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 11:341–359, 1997.