Vine copula mixture models and clustering for non-Gaussian data


SLIDE 1

Vine copula mixture models and clustering for non-Gaussian data

Statistical Methods in Machine Learning

  • Prof. Claudia Czado

  • Özge Sahin <ozge.sahin@tum.de>

Bernoulli-IMS One World Symposium August 2020

SLIDE 2

Finite mixture models

k components generate data

The density of a finite mixture model for X = (X_1, . . . , X_d)^⊤ at x = (x_1, . . . , x_d)^⊤ can be written as:

$$g(\mathbf{x}; \boldsymbol{\eta}) = \sum_{j=1}^{k} \pi_j \cdot g_j(\mathbf{x}; \boldsymbol{\psi}_j). \quad (1)$$

How to select the density of each component g_j(x; ψ_j)? Symmetric distributions, skewed distributions, and others...

Özge Sahin · Vine copula mixture models and clustering for non-Gaussian data · August 2020
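As a toy illustration of (1), a minimal sketch with k = 2 univariate Gaussian components; the Gaussian choice and all parameter values are illustrative, not from the slides:

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Univariate Gaussian density."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2.0 * pi))

def mixture_density(x, weights, components):
    """g(x; eta) = sum_j pi_j * g_j(x; psi_j), Eq. (1), for univariate x."""
    return sum(w * normal_pdf(x, mu, sigma)
               for w, (mu, sigma) in zip(weights, components))

# Illustrative weights (summing to one) and component parameters.
weights = [0.3, 0.7]
components = [(0.0, 1.0), (4.0, 0.5)]
print(mixture_density(1.0, weights, components))
```

Since the weights sum to one and each component density integrates to one, the mixture density g integrates to one as well.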

SLIDE 3

Vine copula mixture models, vcmm

Representation of diverse dependence structures in the data

The density of a finite mixture model for X = (X_1, . . . , X_d)^⊤ at x = (x_1, . . . , x_d)^⊤ can be written as:

$$g(\mathbf{x}; \boldsymbol{\eta}) = \sum_{j=1}^{k} \pi_j \cdot g_j(\mathbf{x}; \boldsymbol{\psi}_j). \quad (2)$$

How to select flexible densities for each component g_j(x; ψ_j) so that the model can represent different asymmetric and/or tail dependencies for different pairs of variables? Vine copulas.


SLIDE 4

Vine copulas

Efficient tools for high-dimensional dependence modeling

A bivariate copula C is a distribution on [0, 1]² with uniform univariate margins.

Vine copulas for higher-dimensional data:
  • Bivariate copulas are the building blocks [Aas et al., 2009],
  • Bivariate copulas and a nested set of trees determine the dependence structure [Bedford and Cooke, 2002].

Sklar's Theorem [Sklar, 1959]: assuming absolute continuity of the random variables, a d-dimensional density can be decomposed into the product of its marginal densities and a copula density:

$$g(\mathbf{x}) = c\big(F_1(x_1), \ldots, F_d(x_d)\big) \cdot f_1(x_1) \cdots f_d(x_d), \quad \mathbf{x} \in \mathbb{R}^d. \quad (3)$$

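Equation (3) can be checked numerically in the bivariate Gaussian case; a sketch assuming standard normal margins and a Gaussian copula with illustrative ρ = 0.5 (the slides do not prescribe these choices):

```python
from math import exp, pi, sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal, used for the margins and the copula

def gaussian_copula_density(u1, u2, rho):
    """Density of the bivariate Gaussian copula on [0, 1]^2."""
    z1, z2 = STD.inv_cdf(u1), STD.inv_cdf(u2)
    quad = rho * rho * (z1 * z1 + z2 * z2) - 2.0 * rho * z1 * z2
    return exp(-quad / (2.0 * (1.0 - rho * rho))) / sqrt(1.0 - rho * rho)

def bivariate_normal_density(x1, x2, rho):
    """Standard bivariate Gaussian density with correlation rho."""
    quad = (x1 * x1 - 2.0 * rho * x1 * x2 + x2 * x2) / (1.0 - rho * rho)
    return exp(-0.5 * quad) / (2.0 * pi * sqrt(1.0 - rho * rho))

# Sklar: g(x1, x2) = c(F1(x1), F2(x2)) * f1(x1) * f2(x2)
x1, x2, rho = 0.3, -1.1, 0.5
lhs = bivariate_normal_density(x1, x2, rho)
rhs = (gaussian_copula_density(STD.cdf(x1), STD.cdf(x2), rho)
       * STD.pdf(x1) * STD.pdf(x2))
print(lhs, rhs)  # the two sides agree
```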

SLIDE 5

Vine copula mixture models, vcmm

Decompose a component’s density into marginal and bivariate copula densities

Figure 1: Vine copula models of two components. (a) First component: tree T_1^{(1)} is the path 1 – 2 – 3 with pair copulas C_{1,2}^{(1)} and C_{2,3}^{(1)}; tree T_2^{(1)} connects the edges 1,2 and 2,3 with C_{1,3;2}^{(1)}. (b) Second component: tree T_1^{(2)} is the path 2 – 1 – 3 with pair copulas C_{1,2}^{(2)} and C_{1,3}^{(2)}; tree T_2^{(2)} connects the edges 1,2 and 1,3 with C_{2,3;1}^{(2)}.

The density of the first component at x = (x_1, x_2, x_3)^⊤:

$$\begin{aligned}
g_1(\mathbf{x}; \boldsymbol{\psi}_1) ={}& c_{1,2}^{(1)}\big(F_1^{(1)}(x_1; \gamma_1^{(1)}), F_2^{(1)}(x_2; \gamma_2^{(1)}); \theta_{1,2}^{(1)}\big) \\
&\cdot c_{2,3}^{(1)}\big(F_2^{(1)}(x_2; \gamma_2^{(1)}), F_3^{(1)}(x_3; \gamma_3^{(1)}); \theta_{2,3}^{(1)}\big) \\
&\cdot c_{1,3;2}^{(1)}\big(F_{1|2}^{(1)}(x_1|x_2; \gamma_1^{(1)}, \gamma_2^{(1)}, \theta_{1,2}^{(1)}), F_{3|2}^{(1)}(x_3|x_2; \gamma_3^{(1)}, \gamma_2^{(1)}, \theta_{2,3}^{(1)}); \theta_{1,3;2}^{(1)}\big) \\
&\cdot f_1^{(1)}(x_1; \gamma_1^{(1)}) \cdot f_2^{(1)}(x_2; \gamma_2^{(1)}) \cdot f_3^{(1)}(x_3; \gamma_3^{(1)}). \quad (4)
\end{aligned}$$
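A sketch of decomposition (4) for the first component's D-vine, assuming Gaussian pair copulas, standard normal margins, and illustrative parameter values; in this special case the vine density coincides with a trivariate Gaussian density (with ρ_{1,3;2} the partial correlation), which gives a convenient numerical check:

```python
from math import exp, pi, sqrt
from statistics import NormalDist

STD = NormalDist()

def c_gauss(u1, u2, rho):
    """Bivariate Gaussian copula density."""
    z1, z2 = STD.inv_cdf(u1), STD.inv_cdf(u2)
    quad = rho * rho * (z1 * z1 + z2 * z2) - 2.0 * rho * z1 * z2
    return exp(-quad / (2.0 * (1.0 - rho * rho))) / sqrt(1.0 - rho * rho)

def h_gauss(u1, u2, rho):
    """Conditional CDF (h-function) F(u1 | u2) of the Gaussian copula."""
    z1, z2 = STD.inv_cdf(u1), STD.inv_cdf(u2)
    return STD.cdf((z1 - rho * z2) / sqrt(1.0 - rho * rho))

def dvine3_density(x, r12, r23, r13_2):
    """Eq. (4) specialized to Gaussian pair copulas and N(0,1) margins."""
    x1, x2, x3 = x
    u1, u2, u3 = STD.cdf(x1), STD.cdf(x2), STD.cdf(x3)
    return (c_gauss(u1, u2, r12)
            * c_gauss(u2, u3, r23)
            * c_gauss(h_gauss(u1, u2, r12), h_gauss(u3, u2, r23), r13_2)
            * STD.pdf(x1) * STD.pdf(x2) * STD.pdf(x3))

def mvn3_density(x, r12, r13, r23):
    """Trivariate standard-normal density with the given correlations."""
    x1, x2, x3 = x
    det = 1.0 - r12**2 - r13**2 - r23**2 + 2.0 * r12 * r13 * r23
    q = ((1.0 - r23**2) * x1**2 + (1.0 - r13**2) * x2**2
         + (1.0 - r12**2) * x3**2
         + 2.0 * ((r13 * r23 - r12) * x1 * x2
                  + (r12 * r23 - r13) * x1 * x3
                  + (r12 * r13 - r23) * x2 * x3)) / det
    return exp(-0.5 * q) / ((2.0 * pi) ** 1.5 * sqrt(det))

# For Gaussian pair copulas the vine is exactly trivariate Gaussian with
# rho_13 = rho_12 * rho_23 + rho_13;2 * sqrt((1 - rho_12^2)(1 - rho_23^2)).
r12, r23, r13_2 = 0.5, 0.3, 0.2
r13 = r12 * r23 + r13_2 * sqrt((1.0 - r12**2) * (1.0 - r23**2))
x = (0.4, -0.2, 1.0)
print(dvine3_density(x, r12, r23, r13_2), mvn3_density(x, r12, r13, r23))
```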

SLIDE 6

Vine copula mixture models, vcmm

Work with an assignment of the observations to the components

Input:
  • n d-dimensional observations to cluster, x_i = (x_{i,1}, . . . , x_{i,d})^⊤ ∈ ℝ^d for i = 1, . . . , n,
  • Total number of clusters k.

A partition of the observations:
  • The total number of observations assigned to the jth component is n_j,
  • The observations belonging to the jth component are x_{i_j}^{(j)} = (x_{i_j,1}^{(j)}, . . . , x_{i_j,d}^{(j)})^⊤ for i_j = 1, . . . , n_j and j = 1, . . . , k,
  • $\sum_{j=1}^{k} n_j = n$ and $\bigcup_{(j, i_j)} \mathbf{x}_{i_j}^{(j)} = \bigcup_{i} \mathbf{x}_i$.

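The partition bookkeeping above can be illustrated in a few lines; the observations and labels below are made up:

```python
from collections import defaultdict

def partition(observations, labels, k):
    """Split observations into k clusters according to hard labels 1..k."""
    clusters = defaultdict(list)
    for x, j in zip(observations, labels):
        clusters[j].append(x)
    return [clusters[j] for j in range(1, k + 1)]

# Illustrative 1-d observations and a hard assignment to k = 2 clusters.
xs = [0.1, 3.9, 4.2, -0.5, 4.0]
labels = [1, 2, 2, 1, 2]
parts = partition(xs, labels, k=2)
n_j = [len(p) for p in parts]
print(n_j)  # cluster sizes
assert sum(n_j) == len(xs)                                # sum_j n_j = n
assert sorted(x for p in parts for x in p) == sorted(xs)  # union = all obs
```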

SLIDE 7

Vine copula mixture models, vcmm

Parametric model selection

For each variable x_p^{(j)} = (x_{1,p}^{(j)}, . . . , x_{n_j,p}^{(j)})^⊤, p = 1, . . . , d and j = 1, . . . , k:

  • 1. Marginal distribution selection F_j: for each candidate marginal distribution of the variable x_p^{(j)}, find the parameters that maximize the log-likelihood ℓ(γ̂_p^{(j)}), then select the marginal distribution F̂_p^{(j)} with the lowest AIC.
  • 2. Vine tree structure selection V_j: obtain u-data by applying the probability integral transform û_p^{(j)} = F̂_p^{(j)}(x_p^{(j)}; γ̂_p^{(j)}), then follow the greedy algorithm of [Dißmann et al., 2013].
  • 3. Pair copula family selection B_j(V_j): given the vine tree structure, estimate the copula parameters that maximize the log-likelihood ℓ(θ̂_{e_a,e_b;D_e}^{(j)}), then choose the copula family with the lowest AIC.

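Step 1 can be sketched with two closed-form maximum-likelihood candidates and AIC = 2p − 2ℓ; the Gaussian/exponential candidate set is an illustrative assumption, not the candidate list used in the talk:

```python
from math import log
from statistics import NormalDist, fmean

def gaussian_aic(xs):
    """AIC of a Gaussian fitted by maximum likelihood (2 parameters)."""
    mu = fmean(xs)
    var = fmean([(x - mu) ** 2 for x in xs])  # MLE variance
    dist = NormalDist(mu, var ** 0.5)
    ll = sum(log(dist.pdf(x)) for x in xs)
    return 2 * 2 - 2 * ll

def exponential_aic(xs):
    """AIC of an exponential fitted by maximum likelihood (1 parameter)."""
    rate = 1.0 / fmean(xs)  # MLE rate
    ll = sum(log(rate) - rate * x for x in xs)
    return 2 * 1 - 2 * ll

def select_marginal(xs):
    """Return the candidate family with the lowest AIC."""
    aic = {"gaussian": gaussian_aic(xs), "exponential": exponential_aic(xs)}
    return min(aic, key=aic.get)

# Strongly right-skewed positive data should prefer the exponential here.
data = [0.05, 0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 6.4]
print(select_marginal(data))
```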

SLIDE 8

Vine copula mixture models

Estimate parameters with the modified ECM algorithm

The log-likelihood of the given data:

$$\ell(\boldsymbol{\eta}) = \log \prod_{i=1}^{n} g(\mathbf{x}_i; \boldsymbol{\eta}) = \log \prod_{i=1}^{n} \sum_{j=1}^{k} \pi_j \cdot g_j(\mathbf{x}_i; \boldsymbol{\psi}_j). \quad (5)$$

Introduce latent variables z_i = (z_{i,1}, . . . , z_{i,k})^⊤ with

$$z_{i,j} = \begin{cases} 1, & \text{if } \mathbf{x}_i \text{ belongs to the } j\text{th component}, \\ 0, & \text{otherwise}, \end{cases} \quad (6)$$

and $\sum_{j=1}^{k} z_{i,j} = 1$. The complete data log-likelihood ℓ_c(η; z, x) of the complete data y_i = (x_i, z_i)^⊤:

$$\ell_c(\boldsymbol{\eta}; \mathbf{z}, \mathbf{x}) = \log \prod_{i=1}^{n} \prod_{j=1}^{k} \big[\pi_j \cdot g_j(\mathbf{x}_i; \boldsymbol{\psi}_j)\big]^{z_{i,j}} = \sum_{i=1}^{n} \sum_{j=1}^{k} z_{i,j} \cdot \log \pi_j + \sum_{i=1}^{n} \sum_{j=1}^{k} z_{i,j} \cdot \log g_j(\mathbf{x}_i; \boldsymbol{\psi}_j). \quad (7)$$

SLIDE 9

Vine copula mixture models, vcmm

Estimate parameters with the modified ECM algorithm

Our steps at the (t + 1)th iteration:

  • 1. E-step (posterior probabilities):

$$r_{i,j}^{(t+1)} = \frac{\pi_j^{(t)} \, g_j(\mathbf{x}_i; \boldsymbol{\psi}_j^{(t)})}{\sum_{j'=1}^{k} \pi_{j'}^{(t)} \, g_{j'}(\mathbf{x}_i; \boldsymbol{\psi}_{j'}^{(t)})} \quad \text{for } i = 1, \ldots, n \text{ and } j = 1, \ldots, k. \quad (8)$$

  • 2. CM-step 1 (mixture weights):

$$\pi_j^{(t+1)} = \frac{\sum_{i=1}^{n} r_{i,j}^{(t+1)}}{n} \quad \text{for } j = 1, \ldots, k. \quad (9)$$

  • 3. CM-step 2 (marginal parameters):

$$\max_{\boldsymbol{\gamma}_j} \sum_{i=1}^{n} r_{i,j}^{(t+1)} \cdot \log g_j(\mathbf{x}_i; \boldsymbol{\gamma}_j, \boldsymbol{\theta}_j^{(t)}) \quad \text{for } j = 1, \ldots, k. \quad (10)$$

  • 4. CMR-step (pair copula parameters updated sequentially).
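A minimal sketch of the E-step (8) and CM-step 1 (9), with univariate Gaussian components standing in for the vine copula components; the mean update below is only the Gaussian analogue of the marginal-parameter maximization (10), not the vcmm CM-step itself:

```python
from math import exp, pi, sqrt

def npdf(x, mu, sigma):
    """Univariate Gaussian density (stand-in component density g_j)."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2.0 * pi))

def e_step(xs, weights, comps):
    """Eq. (8): posterior responsibilities r_{i,j}."""
    resp = []
    for x in xs:
        num = [w * npdf(x, mu, s) for w, (mu, s) in zip(weights, comps)]
        total = sum(num)
        resp.append([v / total for v in num])
    return resp

def cm_step_weights(resp):
    """Eq. (9): pi_j = sum_i r_{i,j} / n."""
    n, k = len(resp), len(resp[0])
    return [sum(r[j] for r in resp) / n for j in range(k)]

def cm_step_means(xs, resp):
    """Responsibility-weighted mean update, the closed-form Gaussian
    analogue of the marginal-parameter maximization in Eq. (10)."""
    k = len(resp[0])
    return [sum(r[j] * x for x, r in zip(xs, resp)) / sum(r[j] for r in resp)
            for j in range(k)]

# Illustrative 1-d data with two well-separated groups.
xs = [0.0, 0.2, -0.1, 5.0, 5.3, 4.8]
weights, comps = [0.5, 0.5], [(0.0, 1.0), (5.0, 1.0)]
resp = e_step(xs, weights, comps)
print(cm_step_weights(resp), cm_step_means(xs, resp))
```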

SLIDE 10

Vine copula based clustering, vcmmc

Consists of 7 primary building blocks:

  • 1. Initial clustering assignment,
  • 2. Initial model selection with Markov trees and parametric marginal distributions,
  • 3. Iterative parameter estimation with the modified ECM,
  • 4. Temporary clustering assignment,
  • 5. Temporary model selection with full vine specification,
  • 6. Final model selection with different initial clustering methods, i.e. run steps 1–5 with different initial partitions,
  • 7. Final clustering assignment.

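The temporary and final clustering assignments (steps 4 and 7) pick, for each observation, the component with the largest posterior probability; a minimal sketch given a responsibility matrix (the values below are made up):

```python
def assign_clusters(resp):
    """Map each observation to the component with the largest responsibility."""
    return [max(range(len(r)), key=r.__getitem__) + 1 for r in resp]  # 1-based

resp = [[0.9, 0.1], [0.2, 0.8], [0.55, 0.45]]
print(assign_clusters(resp))  # → [1, 2, 1]
```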

SLIDE 11

Vine copula based clustering, vcmmc

Captures the non-Gaussian components hidden in the data

Figure 2: Pairwise scatter plot of the subset of the AIS data (left; red: females, green: males) and pairs plots of the females (middle) and males (right).

Table 1: Comparison of clustering algorithm performances on the subset of the AIS data (BIC and the number of free parameters are not reported for k-means).

                           vcmmc   GMM    skew normal   t      skew-t   k-means
Misclassification rate     0.02    0.09   0.04          0.29   0.04     0.34
BIC                        6942    7062   7055          7092   7048     -
Number of free parameters  41      30     51            41     51       -


SLIDE 12

Vine copula based clustering, vcmmc

Nicely interprets the structure of the data

Figure 3: The first tree level of the estimated vine copula model for males, the path Ferr – Ht – LBM – Wt – WBC with edges N(-0.27/-0.17), C(1.84/0.48), SG(7.64/0.87), F(1.62/0.18), and for females, the path LBM – Wt – WBC – Ht – Ferr with edges SG(3.90/0.74), F(-0.15/-0.02), C(1.95/0.49), N(0.11/0.07). A capital letter at an edge refers to its bivariate copula family, where N: Gaussian, C: Clayton, SG: Survival Gumbel, and F: Frank copula. The estimated parameter value and the corresponding Kendall's τ of the pair copula are given inside the parentheses (estimated parameter/Kendall's τ̂).

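The parameter/τ pairs in Figure 3 can be reproduced from the standard one-parameter relations: Gaussian τ = (2/π) arcsin(ρ), Clayton τ = θ/(θ + 2), (survival) Gumbel τ = 1 − 1/θ, and Frank via the Debye function D₁. A sketch, using a simple trapezoidal rule for the Frank integral:

```python
from math import asin, exp, pi

def tau_gaussian(rho):
    return 2.0 / pi * asin(rho)

def tau_clayton(theta):
    return theta / (theta + 2.0)

def tau_gumbel(theta):  # the survival Gumbel shares its Kendall's tau
    return 1.0 - 1.0 / theta

def tau_frank(theta, steps=10_000):
    """tau = 1 - 4/theta * (1 - D1(theta)), with the Debye function
    D1(theta) = (1/theta) * integral_0^theta t / (e^t - 1) dt."""
    h = theta / steps
    integrand = lambda t: t / (exp(t) - 1.0) if t != 0.0 else 1.0
    integral = h * (0.5 * integrand(0.0) + 0.5 * integrand(theta)
                    + sum(integrand(i * h) for i in range(1, steps)))
    d1 = integral / theta
    return 1.0 - 4.0 / theta * (1.0 - d1)

# Cross-check against the male tree in Figure 3 (parameter / tau pairs).
print(round(tau_gaussian(-0.27), 2),  # ≈ -0.17
      round(tau_clayton(1.84), 2),    # ≈ 0.48
      round(tau_gumbel(7.64), 2),     # ≈ 0.87
      round(tau_frank(1.62), 2))      # ≈ 0.18
```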

SLIDE 13

Vine copula mixture models and clustering

An appealing and promising framework

What we have done:
  • A vine copula mixture model, called vcmm, that works with continuous data and fits all classes of vine tree structures,
  • Use of parametric marginal distributions and pair copula families with a single parameter,
  • A data-driven approach to the model selection problems,
  • A modified ECM algorithm [Meng and Rubin, 1993] for parameter estimation,
  • A new and promising model-based clustering algorithm, called vcmmc.

Future research directions:
  • Extension to discrete ordinal variables,
  • Dimensionality reduction for vine copula based clustering,
  • Parsimonious vine copula mixture models.


SLIDE 14

References

Aas, K., Czado, C., Frigessi, A., and Bakken, H. (2009). Pair-copula constructions of multiple dependence. Insurance: Mathematics and Economics, 44(2):182–198.

Bedford, T. and Cooke, R. M. (2002). Vines - a new graphical model for dependent random variables. Annals of Statistics, 30(4):1031–1068.

Dißmann, J., Brechmann, E. C., Czado, C., and Kurowicka, D. (2013). Selecting and estimating regular vine copulae and application to financial returns. Computational Statistics and Data Analysis, 59:52–69.

Meng, X.-L. and Rubin, D. B. (1993). Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika, 80(2):267–278.

Sklar, A. (1959). Fonctions de répartition à n dimensions et leurs marges. Publications de l'Institut de Statistique de l'Université de Paris, 8:229–231.
