1. Consensual Aggregation of Clusters based on Bregman Divergences to Improve Predictive Models. Sothea Has, LPSM, Sorbonne Université, Université Paris-Diderot. Mathilde Mougeot, Aurélie Fischer. sothea.has@lpsm.paris. 2 April 2019.

2. Overview. A. Introduction. B. Construction of a predictive model: 1. K-means algorithm with Bregman divergences; 2. Construction of candidate estimators; 3. Consensual aggregation. C. Applications: 1. Simulated data; 2. Real data.

3. Consider an example... [Figure: input data with 3 clusters; a different model on each cluster.]

4. Introduction. Setting: (X, Z) ∈ 𝒳 × 𝒵: input-output data, where 𝒳 = ℝ^d is the input space and 𝒵 = ℝ for regression or 𝒵 = {0, 1} for binary classification; T_n = {(x_i, z_i)}_{i=1}^n: iid learning data. Objective: construct a good predictive model for regression or classification. Assumptions: X is composed of more than one group or cluster; the number of clusters K is available; there exist different underlying models on these clusters.

5. Construction of a predictive model. There are 3 important steps: 1. K-means algorithm with Bregman divergences; 2. Construction of candidate estimators; 3. Consensual aggregation.

6. Bregman divergences (BD) [Bregman, 1967]. Let φ : C ⊂ ℝ^d → ℝ be strictly convex and of class C¹. Then for any (x, y) ∈ C × int(C) (points of the input space 𝒳), d_φ(x, y) = φ(x) − φ(y) − ⟨x − y, ∇φ(y)⟩. [Figure – graphical interpretation of Bregman divergences: d_φ(x, y) is the vertical gap at x between φ(x) and the tangent value φ(y) + ⟨x − y, ∇φ(y)⟩.]
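
A minimal Python sketch of this definition may help make it concrete; `bregman_divergence`, `phi` and `grad_phi` are illustrative names, not from the slides.

```python
import numpy as np

def bregman_divergence(phi, grad_phi, x, y):
    """d_phi(x, y) = phi(x) - phi(y) - <x - y, grad phi(y)>."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return phi(x) - phi(y) - np.dot(x - y, grad_phi(y))

# Sanity check: with phi(x) = ||x||^2 (so grad phi(x) = 2x), d_phi is the
# squared Euclidean distance.
phi = lambda x: np.sum(x ** 2)
grad_phi = lambda x: 2 * x
print(bregman_divergence(phi, grad_phi, [1.0, 2.0], [0.0, 0.0]))  # 5.0
```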

7. Exponential families (EF). X is a member of an exponential family 𝓔_ψ if f(x | θ) = h(x) exp(⟨θ, T(x)⟩ − ψ(θ)), θ ∈ Θ. Examples: continuous cases: exponential, normal, gamma, beta...; discrete cases: Bernoulli, Poisson, binomial, multinomial...

8. Relationship between BD and EF. Theorem [Banerjee et al., 2005]: if X is a member of an exponential family 𝓔_ψ and if φ is the convex conjugate of ψ, defined by φ(x) = sup_y {⟨x, y⟩ − ψ(y)}, then there exists a unique Bregman divergence d_φ such that f(x | θ) = h(x) exp(−d_φ(T(x), E[T(X)]) + φ(T(x))). Examples: exponential distribution: d_φ(x, y) = x/y − log(x/y) − 1 (Itakura-Saito); Poisson distribution: d_φ(x, y) = x log(x/y) − (x − y) (generalized Kullback-Leibler).

9. Step 1: K-means algorithm with Bregman divergences. Perform the K-means algorithm with M options of Bregman divergences. Each BD ℓ gives an associated partition S^ℓ = {S^ℓ_k}_{k=1}^K. [Diagram: BD 1 → S^1, BD 2 → S^2, ..., BD M → S^M (Step 1).]
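
As an illustration of Step 1, here is a minimal sketch of Lloyd-style K-means with a plug-in Bregman divergence; the names and structure are ours, not the authors' code. It relies on the fact, from [Banerjee et al., 2005], that the cluster centroid minimizing the total Bregman divergence is the arithmetic mean, so only the assignment step changes.

```python
import numpy as np

def bregman_kmeans(X, K, divergence, n_iter=50, seed=0):
    """X: (n, d) array in the domain of phi; divergence(x, y) -> d_phi(x, y)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iter):
        # Assignment step: send each point to the center minimizing d_phi.
        D = np.array([[divergence(x, c) for c in centers] for x in X])
        labels = D.argmin(axis=1)
        # Update step: the Bregman centroid of a cell is its mean
        # [Banerjee et al., 2005]; keep the old center if a cell is empty.
        centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                            else centers[k] for k in range(K)])
    return labels, centers
```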

10. Step 2: Construction of candidate estimators. Suppose that ∀ℓ, k: S^ℓ_k contains enough data points. For all ℓ, k: construct an estimator m^ℓ_k on S^ℓ_k. Then m^ℓ = {m^ℓ_k}_{k=1}^K is the candidate estimator associated to BD ℓ. [Diagram: BD ℓ → S^ℓ → m^ℓ, for ℓ = 1, ..., M (Steps 1-2).]
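
A minimal sketch of Step 2 under assumed design choices (a linear model per cluster; any regressor could be substituted). The class name and structure are illustrative, not the authors' code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

class ClusterwiseEstimator:
    """Candidate estimator m^l: one local model per cell of the partition S^l."""
    def __init__(self, centers, divergence):
        self.centers, self.divergence = centers, divergence
        self.models = {}

    def fit(self, X, z, labels):
        for k in np.unique(labels):
            # Fit m^l_k on the points of cell S^l_k (assumed: linear model).
            self.models[k] = LinearRegression().fit(X[labels == k], z[labels == k])
        return self

    def predict(self, X):
        preds = []
        for x in X:
            # Assign x to its closest center under d_phi, then use that cell's model.
            k = int(np.argmin([self.divergence(x, c) for c in self.centers]))
            preds.append(self.models[k].predict(x.reshape(1, -1))[0])
        return np.array(preds)
```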

11. Step 3: Consensual aggregation. Why consensual aggregation? Neither the distribution nor the clustering structure of the input data is available, and it is not easy to choose the "best" one among {m^ℓ}_{ℓ=1}^M. [Diagram: BD ℓ → S^ℓ → m^ℓ → Aggregation, for ℓ = 1, ..., M (Steps 1-3).]

12. Classification. Example: suppose we have 4 classifiers m = (m^1, m^2, m^3, m^4) and an observation x with predictions (1, 1, 0, 1).

ID | m^1 | m^2 | m^3 | m^4 | z
1  |  1  |  1  |  0  |  1  | 1
2  |  0  |  0  |  0  |  1  | 0
3  |  1  |  1  |  0  |  1  | 0
4  |  1  |  0  |  1  |  1  | 1
5  |  1  |  1  |  0  |  1  | 1

Table – Table of predictions. Observations 1, 3 and 5 share the prediction pattern (1, 1, 0, 1) of x, so a consensual classifier predicts from their labels {1, 0, 1}, here the majority label 1. Based on the following works: [Mojirsheibani, 1999]: classical method (Mo1); [Mojirsheibani, 2000]: a kernel-based method (Mo2); [Fischer and Mougeot, 2019]: MixCOBRA.
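
A minimal sketch of the classical combined classifier (Mo1) of [Mojirsheibani, 1999], as we read it: predict at x by a majority vote of the labels over training points whose vector of candidate predictions matches the pattern at x; the fallback for an empty match set is our assumption.

```python
import numpy as np

def mo1_classify(pred_train, z_train, pred_x):
    """pred_train: (n, M) 0/1 predictions of the M classifiers on T_n;
    z_train: (n,) labels; pred_x: (M,) predictions at the new point x."""
    match = np.all(pred_train == pred_x, axis=1)
    if not match.any():            # assumption: fall back to all points
        match = np.ones(len(z_train), dtype=bool)
    return int(z_train[match].mean() >= 0.5)  # majority vote

# The slide's example: rows are observations 1..5, columns m^1..m^4.
P = np.array([[1, 1, 0, 1], [0, 0, 0, 1], [1, 1, 0, 1], [1, 0, 1, 1], [1, 1, 0, 1]])
z = np.array([1, 0, 0, 1, 1])
print(mo1_classify(P, z, np.array([1, 1, 0, 1])))  # 1 (votes 1, 0, 1)
```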

13. Regression. The aggregation takes the following form: Agg_n(x) = Σ_{i=1}^n W_{n,i}(x) z_i. [Biau et al., 2016]: 0-1 weights (COBRA): W_{n,i}(x) = ∏_{ℓ=1}^M 𝟙{|m^ℓ(x_i) − m^ℓ(x)| < ε} / Σ_{j=1}^n ∏_{ℓ=1}^M 𝟙{|m^ℓ(x_j) − m^ℓ(x)| < ε}. A kernel-based variant of COBRA uses kernel-based weights instead. [Fischer and Mougeot, 2019]: MixCOBRA.
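
A minimal sketch of the 0-1-weight aggregation: with the weights above, Agg_n(x) reduces to an average of z_i over the training points on which all M candidates reach an ε-consensus with their predictions at x. Function and argument names are ours.

```python
import numpy as np

def cobra_predict(pred_train, z_train, pred_x, eps):
    """pred_train: (n, M) candidate predictions on T_n; z_train: (n,) responses;
    pred_x: (M,) candidate predictions at x; eps: consensus threshold."""
    # Product of indicators: 1 iff |m^l(x_i) - m^l(x)| < eps for every l.
    consensus = np.all(np.abs(pred_train - pred_x) < eps, axis=1)
    if not consensus.any():
        return np.nan  # no point reaches consensus: eps is too small
    # With 0-1 weights, the aggregate is the mean response of the consensus set.
    return z_train[consensus].mean()
```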

14. Applications. Bregman divergences:
Euclidean: φ(x) = ‖x‖₂² = Σ_{i=1}^d x_i², C = ℝ^d; d_φ(x, y) = ‖x − y‖₂².
General Kullback-Leibler (GKL): φ(x) = Σ_{i=1}^d x_i log(x_i), C = (0, +∞)^d; d_φ(x, y) = Σ_{i=1}^d [x_i log(x_i/y_i) − (x_i − y_i)].
Logistic: φ(x) = Σ_{i=1}^d [x_i log(x_i) + (1 − x_i) log(1 − x_i)], C = (0, 1)^d; d_φ(x, y) = Σ_{i=1}^d [x_i log(x_i/y_i) + (1 − x_i) log((1 − x_i)/(1 − y_i))].
Itakura-Saito: φ(x) = −Σ_{i=1}^d log(x_i), C = (0, +∞)^d; d_φ(x, y) = Σ_{i=1}^d [x_i/y_i − log(x_i/y_i) − 1].
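
These four divergences translate directly into code; a minimal sketch (componentwise formulas, inputs assumed to lie in the stated domains C). Any of them could be passed as the `divergence` argument of the `bregman_kmeans` sketch above.

```python
import numpy as np

def euclidean(x, y):       # C = R^d
    return np.sum((x - y) ** 2)

def gkl(x, y):             # C = (0, +inf)^d
    return np.sum(x * np.log(x / y) - (x - y))

def logistic(x, y):        # C = (0, 1)^d
    return np.sum(x * np.log(x / y) + (1 - x) * np.log((1 - x) / (1 - y)))

def itakura_saito(x, y):   # C = (0, +inf)^d
    return np.sum(x / y - np.log(x / y) - 1)
```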

15. Simulated data. M = 4 and K = 3. [Figure – K-means with Bregman divergences on some simulated data.]

16. Classification: numerical results, with 20 replications of each case. [Table – average testing misclassification error (1 unit = 10⁻²), with standard deviations in parentheses, for the distributions Exp, Pois, Geom, 2D Gaus and 3D Gaus; columns compare the single model (K = 1), the candidate estimators m^ℓ (Euclid, GKL, Logit, Ita) and the kernels Unif, Epan, Gaus, Triang, Bi-wgt and Tri-wgt used in W_{n,i}(x).]

17. Regression: numerical results. [Table – average testing RMSE, with standard deviations in parentheses, for the distributions Exp, Pois, Geom, 2D Gaus and 3D Gaus; columns compare the single model (K = 1), the candidate estimators m^ℓ (Euclid, GKL, Logit, Ita) and the kernels Unif, Epan, Gaus, Triang, Bi-wgt and Tri-wgt used in W_{n,i}(x).]

18. Real data: air compressor. Given by [Cadet et al., 2005]. Six predictors: air temperature, input pressure, output pressure, flow and water temperature. Response variable: power consumption. Note: K is not available!

19. Results of air compressor data. For K = 1: RMSE = 178.67.

K | Euclid        | GKL           | Logistic      | Ita           | COBRA         | MixCOBRA*
2 | 158.85 (6.42) | 158.90 (6.48) | 159.35 (6.71) | 158.96 (6.41) | 153.34 (6.72) | 116.69 (5.86)
3 | 157.38 (6.95) | 157.24 (6.84) | 156.99 (6.65) | 157.24 (6.85) | 153.69 (6.64) | 117.45 (5.55)
4 | 154.33 (6.69) | 153.96 (6.74) | 153.99 (6.45) | 154.07 (7.01) | 152.09 (6.58) | 117.16 (5.99)
5 | 153.18 (6.91) | 153.19 (6.77) | 152.95 (6.57) | 152.25 (6.70) | 151.05 (6.76) | 117.55 (5.90)
6 | 151.16 (6.91) | 151.67 (6.96) | 151.89 (6.62) | 151.75 (6.57) | 150.27 (6.82) | 117.74 (5.86)
7 | 151.08 (6.77) | 150.99 (6.84) | 152.81 (7.11) | 151.85 (6.61) | 150.46 (6.87) | 117.58 (6.15)
8 | 151.27 (7.17) | 151.09 (7.01) | 152.07 (6.65) | 150.90 (6.96) | 150.21 (7.03) | 117.91 (5.83)

Table – RMSE of air compressor data (standard deviations in parentheses). *Consensual aggregation method integrating the input X into the weight [Fischer and Mougeot, 2019].

20. Thank you. Questions?
