Vine copula mixture models and clustering for non-Gaussian data
Statistical Methods in Machine Learning
Prof. Claudia Czado
Özge Sahin <ozge.sahin@tum.de>
Bernoulli-IMS One World Symposium August 2020
Finite mixture models
k components generate data
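For reference, a k-component mixture density consistent with the notation of the complete-data log-likelihood (7) can be sketched as follows. The symbol g for the overall mixture density and the constraints on the weights are assumptions based on the standard finite mixture formulation, not taken from the slide itself.

```latex
g(x; \eta) = \sum_{j=1}^{k} \pi_j \, g_j(x; \psi_j),
\qquad \pi_j > 0, \quad \sum_{j=1}^{k} \pi_j = 1
```

Here \pi_j is the weight of component j and g_j(\cdot\,; \psi_j) its density with parameters \psi_j.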
Representation of diverse dependence structures in the data
Efficient tools for high-dimensional dependence modeling
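\[
f(x_1, \dots, x_d) = c\bigl(F_1(x_1), \dots, F_d(x_d)\bigr) \cdot f_1(x_1) \cdots f_d(x_d),
\]
where c is the copula density and F_p, f_p are the marginal distribution and density functions.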
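The factorization of a joint density into a copula density and marginal densities can be checked numerically in a small case. The sketch below is an illustration only: it assumes a bivariate Gaussian copula with standard normal margins (neither choice comes from the slides), and all function names are hypothetical. In that special case the product must equal the bivariate normal density.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal margins, an illustrative assumption


def gaussian_copula_density(u: float, v: float, rho: float) -> float:
    """Bivariate Gaussian copula density c(u, v; rho)."""
    a = STD_NORMAL.inv_cdf(u)
    b = STD_NORMAL.inv_cdf(v)
    det = 1.0 - rho * rho
    quad = rho * rho * (a * a + b * b) - 2.0 * rho * a * b
    return math.exp(-quad / (2.0 * det)) / math.sqrt(det)


def joint_density_via_copula(x1: float, x2: float, rho: float) -> float:
    """Copula density at the margins' CDF values times the marginal densities."""
    u1, u2 = STD_NORMAL.cdf(x1), STD_NORMAL.cdf(x2)
    return gaussian_copula_density(u1, u2, rho) * STD_NORMAL.pdf(x1) * STD_NORMAL.pdf(x2)


def bivariate_normal_density(x1: float, x2: float, rho: float) -> float:
    """Closed-form bivariate normal density with unit variances."""
    det = 1.0 - rho * rho
    quad = (x1 * x1 - 2.0 * rho * x1 * x2 + x2 * x2) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


if __name__ == "__main__":
    # The two values should agree up to floating-point error.
    print(joint_density_via_copula(0.3, -1.1, 0.6))
    print(bivariate_normal_density(0.3, -1.1, 0.6))
```

Swapping in other margins or another copula family changes only the corresponding factor, which is the flexibility the decomposition provides.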
Decompose a component's density into marginal densities and bivariate copula densities
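Figure 1: Vine copula model of two components.
(a) First component: tree T_1^{(1)} with nodes 1, 2, 3 and edges C_{1,2}^{(1)}, C_{2,3}^{(1)}; tree T_2^{(1)} with nodes (1,2), (2,3) and edge C_{1,3;2}^{(1)}.
(b) Second component: tree T_1^{(2)} with nodes 2, 1, 3 and edges C_{1,2}^{(2)}, C_{1,3}^{(2)}; tree T_2^{(2)} with nodes (1,2), (1,3) and edge C_{2,3;1}^{(2)}.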
g_1(x; \psi_1) = c^{(1)}_{1,2} … (4)
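Equation (4) is cut off in this transcript. Given the tree structure of the first component in Figure 1, a pair-copula construction of a three-dimensional density would take the following form. This is a sketch under assumptions: the component-specific margins F_p^{(1)}, f_p^{(1)}, the conditional distribution functions F_{1|2}^{(1)}, F_{3|2}^{(1)}, and the simplifying assumption that the copula c_{1,3;2}^{(1)} does not depend on the conditioning value x_2 are not confirmed by the slide.

```latex
g_1(x; \psi_1)
  = c^{(1)}_{1,2}\bigl(F^{(1)}_1(x_1), F^{(1)}_2(x_2)\bigr)
    \cdot c^{(1)}_{2,3}\bigl(F^{(1)}_2(x_2), F^{(1)}_3(x_3)\bigr)
    \cdot c^{(1)}_{1,3;2}\bigl(F^{(1)}_{1|2}(x_1 \mid x_2), F^{(1)}_{3|2}(x_3 \mid x_2)\bigr)
    \cdot f^{(1)}_1(x_1) \, f^{(1)}_2(x_2) \, f^{(1)}_3(x_3)
```

The first two factors correspond to the edges of the first tree, the third to the edge of the second tree, and the remaining factors to the marginal densities.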
Work with an assignment of the observations to the components
Parametric model selection
Estimate parameters with the modified ECM algorithm
\[
\ell_c(\eta; z, x) = \log \prod_{i=1}^{n} \prod_{j=1}^{k} \bigl[\pi_j \cdot g_j(x_i; \psi_j)\bigr]^{z_{i,j}}
= \sum_{i=1}^{n} \sum_{j=1}^{k} z_{i,j} \cdot \log \pi_j + \sum_{i=1}^{n} \sum_{j=1}^{k} z_{i,j} \cdot \log g_j(x_i; \psi_j), \tag{7}
\]
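The complete-data log-likelihood (7) and the expectation step it leads to can be sketched in a few lines. This is a generic EM-style illustration, not the modified ECM algorithm of the talk: the component densities are passed in as plain callables, univariate normal densities stand in for the vine copula component densities g_j, and all function names are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Callable, List, Sequence

Density = Callable[[float], float]


def complete_data_loglik(
    x: Sequence[float],
    z: Sequence[Sequence[float]],
    weights: Sequence[float],
    densities: Sequence[Density],
) -> float:
    """Evaluate (7): sum_i sum_j z_ij * (log pi_j + log g_j(x_i))."""
    total = 0.0
    for xi, zi in zip(x, z):
        for zij, w, g in zip(zi, weights, densities):
            if zij > 0.0:  # skip zero assignments to avoid log(0) issues
                total += zij * (math.log(w) + math.log(g(xi)))
    return total


def e_step(
    x: Sequence[float],
    weights: Sequence[float],
    densities: Sequence[Density],
) -> List[List[float]]:
    """Posterior probability that observation i belongs to component j."""
    resp = []
    for xi in x:
        joint = [w * g(xi) for w, g in zip(weights, densities)]
        norm = sum(joint)
        resp.append([v / norm for v in joint])
    return resp


def update_weights(resp: Sequence[Sequence[float]]) -> List[float]:
    """Closed-form update of the mixture weights: column means of resp."""
    n = len(resp)
    k = len(resp[0])
    return [sum(row[j] for row in resp) / n for j in range(k)]


if __name__ == "__main__":
    # Placeholder component densities (assumption): two univariate normals.
    dens = [NormalDist(-2.0, 1.0).pdf, NormalDist(2.0, 1.0).pdf]
    data = [-2.1, -1.9, 2.0, 2.2]
    r = e_step(data, [0.5, 0.5], dens)
    print(update_weights(r))
    print(complete_data_loglik(data, r, [0.5, 0.5], dens))
```

Replacing the hard assignments z_{i,j} in (7) by these posterior probabilities gives the expected complete-data log-likelihood that the maximization steps then work on.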
Consists of 7 primary building blocks
Captures the non-Gaussian components hidden in the data
Figure 2: Pairwise scatter plot of the subset of AIS data (left); red: females, green: males. Pairs plots of females (middle) and males (right).
Model                    vcmmc   GMM    skew normal   t      skew-t   k-means
Misclassification rate   0.02    0.09   0.04          0.29   0.04     0.34
BIC                      6942    7062   7055          7092   7048
                         41      30     51            41     51
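Cluster labels are arbitrary, so a misclassification rate for a clustering is usually computed after matching clusters to the known classes. The sketch below shows one common way to do this, by trying every assignment of clusters to classes and keeping the one with the fewest errors. Whether the rates in the table were computed exactly this way is an assumption, and the function name is hypothetical.

```python
from itertools import permutations
from typing import Hashable, Sequence


def misclassification_rate(
    true_labels: Sequence[Hashable],
    cluster_labels: Sequence[Hashable],
) -> float:
    """Smallest error rate over all one-to-one mappings of clusters to classes."""
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label sequences must have the same length")
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_labels))
    if len(clusters) > len(classes):
        raise ValueError("more clusters than classes; no one-to-one mapping")
    n = len(true_labels)
    best_errors = n
    # Brute force over mappings; fine for the small k typical in clustering.
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        errors = sum(
            mapping[c] != t for t, c in zip(true_labels, cluster_labels)
        )
        best_errors = min(best_errors, errors)
    return best_errors / n


if __name__ == "__main__":
    truth = ["f", "f", "f", "m", "m", "m"]
    found = [1, 1, 0, 0, 0, 0]
    print(misclassification_rate(truth, found))
```

For two groups, as in the females/males example, only two mappings need to be compared.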
Nicely interprets the structure of the data
Males: nodes Ferr, Ht, LBM, Wt, WBC; edge labels N(-0.27/-0.17), C(1.84/0.48), SG(7.64/0.87), F(1.62/0.18).
Females: nodes LBM, Wt, WBC, Ht, Ferr; edge labels SG(3.90/0.74), F(-0.15/-0.02), C(1.95/0.49), N(0.11/0.07).
Figure 3: The first tree level of the estimated vine copula model for females and males. A capital letter at an edge refers to its bivariate copula family, where N: Gaussian, C: Clayton, SG: survival Gumbel, and F: Frank copula. The estimated parameter value and the corresponding Kendall's τ of the pair copula are given in parentheses (estimated parameter/Kendall's τ̂).
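The parameter/Kendall's τ pairs in Figure 3 are linked by closed-form relations for each one-parameter family. The sketch below uses the standard textbook formulas; the function names are hypothetical, and the Frank case integrates the Debye function numerically with a simple midpoint rule rather than calling a library routine. Evaluated at the estimated parameters, the results should agree with the reported τ values up to rounding.

```python
import math


def tau_gaussian(rho: float) -> float:
    """Kendall's tau of a Gaussian copula with correlation rho."""
    return 2.0 / math.pi * math.asin(rho)


def tau_clayton(theta: float) -> float:
    """Kendall's tau of a Clayton copula with parameter theta > 0."""
    return theta / (theta + 2.0)


def tau_gumbel(theta: float) -> float:
    """Kendall's tau of a Gumbel copula (theta >= 1).

    The survival Gumbel copula (a 180 degree rotation) has the same tau.
    """
    return 1.0 - 1.0 / theta


def debye1(theta: float, steps: int = 2000) -> float:
    """First Debye function D1(theta) = (1/theta) * int_0^theta t/(e^t - 1) dt.

    Midpoint rule; the midpoints never hit the removable singularity at t = 0,
    and a negative theta is handled by the signed step size.
    """
    h = theta / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += t / math.expm1(t)
    return total / steps


def tau_frank(theta: float) -> float:
    """Kendall's tau of a Frank copula with parameter theta != 0."""
    return 1.0 - 4.0 / theta * (1.0 - debye1(theta))


if __name__ == "__main__":
    # Parameter estimates as reported in Figure 3.
    print("males:  ", tau_gaussian(-0.27), tau_clayton(1.84),
          tau_gumbel(7.64), tau_frank(1.62))
    print("females:", tau_gumbel(3.90), tau_frank(-0.15),
          tau_clayton(1.95), tau_gaussian(0.11))
```

Reporting τ next to the parameter makes the dependence strength comparable across families, since the raw parameters live on different scales.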
Appealing and promising framework
Aas, K., Czado, C., Frigessi, A., and Bakken, H. (2009). Pair-copula constructions of multiple dependence. Insurance: Mathematics and Economics, 44(2):182–198.
Bedford, T. and Cooke, R. M. (2002). Vines - A new graphical model for dependent random variables. Annals of Statistics, 30(4):1031–1068.
Dißmann, J., Brechmann, E. C., Czado, C., and Kurowicka, D. (2013). Selecting and estimating regular vine copulae and application to financial returns. Computational Statistics and Data Analysis, 59:52–69.
Meng, X.-L. and Rubin, D. B. (1993). Maximum Likelihood Estimation via the ECM Algorithm: A General Framework. Biometrika, 80(2):267–278.
Sklar, A. (1959). Fonctions de Répartition à n Dimensions et Leurs Marges. Publications de L'Institut de Statistique de L'Université de Paris, (8):229–231.