SLIDE 1

Consensual Aggregation of Clusters based on Bregman Divergences to Improve Predictive Models

Sothea Has
Sorbonne Université, LPSM, Université Paris-Diderot
Mathilde Mougeot, Aurélie Fischer
sothea.has@lpsm.paris

2 April 2019

1/21

SLIDE 2

Overview

  • A. Introduction
  • B. Construction of a predictive model
      1. K-means algorithm with Bregman divergences
      2. Construction of candidate estimators
      3. Consensual aggregation
  • C. Applications
      1. Simulated data
      2. Real data

2/21

SLIDE 3

Consider an example...

[Figure, two panels: input data with 3 clusters; a different model on each cluster.]

3/21

SLIDE 4

Introduction

Setting: (X, Z) ∈ 𝒳 × 𝒵: input-output data.

𝒳 = ℝ^d: input space.
𝒵 = ℝ for regression, {0, 1} for binary classification.

T_n = {(x_i, z_i)}_{i=1}^n: iid learning data.

Objective: construct a good predictive model for regression or classification.

Assumptions: 𝒳 is composed of more than one group or cluster; the number of clusters K is available; there exist different underlying models on these clusters.

4/21

SLIDE 5

Construction of a predictive model

There are 3 important steps:

  • 1. K-means algorithm with Bregman divergences
  • 2. Construction of candidate estimators
  • 3. Consensual aggregation

5/21

SLIDE 6

Bregman divergences (BD) [Bregman, 1967]

φ : C ⊂ ℝ^d → ℝ, strictly convex and of class C¹; then for any (x, y) ∈ C × int(C),

d_φ(x, y) = φ(x) − φ(y) − ⟨x − y, ∇φ(y)⟩

[Plot of φ with its tangent line at y, φ(y) + ⟨x − y, ∇φ(y)⟩; d_φ(x, y) is the vertical gap at x between φ(x) and the tangent.]

Figure – Graphical interpretation of Bregman divergences.
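As an illustration (a minimal sketch, not from the slides), the definition translates directly into code; the check below uses φ(x) = ‖x‖₂², whose divergence is the squared Euclidean distance:

```python
import numpy as np

def bregman_divergence(phi, grad_phi, x, y):
    """d_phi(x, y) = phi(x) - phi(y) - <x - y, grad phi(y)>."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return phi(x) - phi(y) - np.dot(x - y, grad_phi(y))

# phi(x) = ||x||^2 recovers the squared Euclidean distance.
phi = lambda v: np.dot(v, v)
grad_phi = lambda v: 2.0 * v

x, y = np.array([1.0, 2.0]), np.array([0.0, 3.0])
assert np.isclose(bregman_divergence(phi, grad_phi, x, y), np.sum((x - y) ** 2))
```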

6/21

SLIDE 7

Exponential families (EF)

X is a member of an exponential family E_ψ if

f(x|θ) = h(x) exp(⟨θ, T(x)⟩ − ψ(θ)), θ ∈ Θ.

Examples:
Continuous cases: exponential, normal, gamma, beta, ...
Discrete cases: Bernoulli, Poisson, binomial, multinomial, ...

7/21

SLIDE 8

Relationship between BD and EF

Theorem [Banerjee et al., 2005]

If X is a member of an exponential family E_ψ and if φ is the convex conjugate of ψ, defined by

φ(x) = sup_y {⟨x, y⟩ − ψ(y)},

then there exists a unique Bregman divergence d_φ such that

f(x|θ) = h(x) exp(−d_φ(T(x), E[T(X)]) + φ(T(x))).

Examples:
Exponential distribution: d_φ(x, y) = x/y − log(x/y) − 1 (Itakura-Saito).
Poisson distribution: d_φ(x, y) = x log(x/y) − (x − y) (generalized Kullback-Leibler).
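As a concrete check of the theorem (my own illustration, with the convention φ(x) = −log x, which makes the carrier h the constant e⁻¹), the exponential density λe^{−λx} factors through the Itakura-Saito divergence evaluated at the mean E[X] = 1/λ:

```python
import numpy as np

d_is = lambda x, y: x / y - np.log(x / y) - 1.0   # Itakura-Saito divergence
phi = lambda x: -np.log(x)                        # its generator (up to a constant)

lam = 2.5                 # rate parameter of the exponential distribution
mu = 1.0 / lam            # mean E[X]
x = np.linspace(0.1, 5.0, 50)

density = lam * np.exp(-lam * x)                             # f(x | lambda)
bregman_form = np.exp(-1.0) * np.exp(-d_is(x, mu) + phi(x))  # h(x) = e^{-1}
assert np.allclose(density, bregman_form)
```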

8/21

SLIDE 9

Step 1: K-means algorithm with Bregman divergences

Perform the K-means algorithm with M options of Bregman divergences. Each BD_ℓ gives an associated partition S^ℓ = {S^ℓ_k}_{k=1}^K.

[Diagram: BD_1, BD_2, ..., BD_M each yielding a partition S^1, S^2, ..., S^M (Step 1).]
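A minimal sketch of Step 1 (my own code and naming; the slides do not give an implementation). By [Banerjee et al., 2005], the Bregman centroid of a cell is always the arithmetic mean, so only the assignment step of Lloyd's algorithm changes with the divergence:

```python
import numpy as np

def kmeans_bregman(X, K, div, n_iter=100, seed=None):
    """Lloyd's algorithm with a Bregman divergence div(x, c) in the
    assignment step; centers stay arithmetic means (Banerjee et al., 2005)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iter):
        # Assignment: each point goes to its closest center in divergence.
        dists = np.array([[div(x, c) for c in centers] for x in X])
        labels = dists.argmin(axis=1)
        # Update: the minimizer of the within-cell divergence is the mean.
        new_centers = np.array([X[labels == k].mean(axis=0)
                                if np.any(labels == k) else centers[k]
                                for k in range(K)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Example: GKL divergence on positive 2-d data, K = 3 clusters.
gkl = lambda x, y: np.sum(x * np.log(x / y) - (x - y))
X = np.random.default_rng(0).gamma(shape=2.0, scale=1.0, size=(200, 2))
labels, centers = kmeans_bregman(X, K=3, div=gkl, seed=0)
```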

9/21

SLIDE 10

Step 2: Construction of candidate estimators

Suppose that for all ℓ, k, the cell S^ℓ_k ∈ S^ℓ contains enough data points.

For all ℓ, k: construct an estimator m^ℓ_k on S^ℓ_k.

m^ℓ = {m^ℓ_k}_{k=1}^K is the candidate estimator associated with BD_ℓ.

[Diagram: BD_1, ..., BD_M → S^1, ..., S^M → m^1, ..., m^M (Steps 1 and 2).]
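Step 2 as a sketch (again my own illustration: the slides leave the local estimators unspecified, so scikit-learn's LinearRegression stands in for m^ℓ_k). One model is fitted per cell, and a new point is routed to the nearest center under the same divergence:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_candidate(X, z, labels, K):
    """Fit one local model per cell S_k; returns the family m = {m_k}."""
    return [LinearRegression().fit(X[labels == k], z[labels == k])
            for k in range(K)]

def predict_candidate(models, centers, div, x):
    """Route x to its nearest center (same divergence as Step 1), then predict."""
    k = int(np.argmin([div(x, c) for c in centers]))
    return models[k].predict(x.reshape(1, -1))[0]

# Toy usage: a Euclidean partition with a different linear model per cell.
rng = np.random.default_rng(0)
X = rng.uniform(0.1, 4.0, size=(300, 2))
z = np.where(X[:, 0] < 2.0, 3.0 * X[:, 1], -2.0 * X[:, 1]) + rng.normal(0, 0.1, 300)
centers = np.array([[1.0, 2.0], [3.0, 2.0]])
labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
euclid = lambda x, y: np.sum((x - y) ** 2)
models = fit_candidate(X, z, labels, K=2)
print(predict_candidate(models, centers, euclid, np.array([0.5, 1.0])))
```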

10/21

SLIDE 11

Step 3: Consensual aggregation

Why consensual aggregation? Neither the distribution nor the clustering structure of the input data is available, so it is not easy to choose the "best" estimator among {m^ℓ}_{ℓ=1}^M.

[Diagram: BD_1, ..., BD_M → S^1, ..., S^M → m^1, ..., m^M → aggregation (Steps 1, 2, and 3).]
11/21

SLIDE 12

Classification

Example: suppose we have 4 classifiers m = (m_1, m_2, m_3, m_4) and an observation x with predictions (1, 1, 0, 1).

[Table: predictions m_1(x_i), ..., m_4(x_i) and true labels z_i for five training observations.]

Table – Table of predictions.

Based on the following works:
[Mojirsheibani, 1999]: classical method (Mo1), sketched below.
[Mojirsheibani, 2000]: a kernel-based method (Mo2).
[Fischer and Mougeot, 2019]: MixCOBRA.
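A minimal sketch of the classical rule Mo1 of [Mojirsheibani, 1999] as I read it; the training predictions below are hypothetical stand-ins for the table above. A query point keeps the training points on which all M classifiers output the same labels as at x, then takes a majority vote:

```python
import numpy as np

def mojirsheibani_combine(preds_train, z_train, preds_x):
    """Mo1: majority vote over training points whose M predicted labels
    all coincide with the predictions at the query point x.

    preds_train: (n, M) array of m_l(x_i); preds_x: (M,) array of m_l(x)."""
    agree = np.all(preds_train == preds_x, axis=1)
    if not agree.any():                 # no unanimous match: plain majority vote
        agree = np.ones(len(z_train), dtype=bool)
    return np.bincount(z_train[agree]).argmax()

# Hypothetical training predictions; x has predictions (1, 1, 0, 1) as on the slide.
preds_train = np.array([[1, 1, 0, 1],
                        [1, 1, 1, 1],
                        [1, 1, 0, 1],
                        [0, 1, 0, 1],
                        [1, 1, 0, 1]])
z_train = np.array([1, 0, 1, 0, 1])
print(mojirsheibani_combine(preds_train, z_train, np.array([1, 1, 0, 1])))  # -> 1
```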

12/21

SLIDE 13

Regression

The aggregation takes the following form:

Agg_n(x) = Σ_{i=1}^n W_{n,i}(x) z_i

[Biau et al., 2016]: 0-1 weights (COBRA),

W_{n,i}(x) = Π_{ℓ=1}^M 1{|m^ℓ(x_i) − m^ℓ(x)| < ε} / Σ_{j=1}^n Π_{ℓ=1}^M 1{|m^ℓ(x_j) − m^ℓ(x)| < ε}.

A kernel-based variant of COBRA uses kernel-based weights instead.
[Fischer and Mougeot, 2019]: MixCOBRA.
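The 0-1-weight COBRA aggregate, in code (a sketch; ε and the toy data are my choices). The weights reduce to a uniform average of z_i over the training points whose M candidate predictions all fall within ε of the predictions at x:

```python
import numpy as np

def cobra_aggregate(preds_train, z_train, preds_x, eps):
    """Agg_n(x) = sum_i W_{n,i}(x) z_i with unanimity 0-1 weights: a training
    point counts only if |m_l(x_i) - m_l(x)| < eps for every l = 1..M."""
    close = np.all(np.abs(preds_train - preds_x) < eps, axis=1)  # (n,)
    if not close.any():
        return z_train.mean()          # degenerate case: empty consensus set
    return z_train[close].mean()       # uniform weights over retained points

# preds_train[i, l] = m_l(x_i); preds_x[l] = m_l(x).
rng = np.random.default_rng(0)
preds_train = rng.normal(size=(100, 4))
z_train = preds_train.mean(axis=1) + rng.normal(0, 0.05, 100)
print(cobra_aggregate(preds_train, z_train, preds_train[0], eps=0.5))
```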

13/21

SLIDE 14

Applications

Bregman divergences

  • Euclidean: φ(x) = ‖x‖₂² = Σ_{i=1}^d x_i², C = ℝ^d,
    d_φ(x, y) = ‖x − y‖₂².
  • Generalized Kullback-Leibler (GKL): φ(x) = Σ_{i=1}^d x_i log x_i, C = (0, +∞)^d,
    d_φ(x, y) = Σ_{i=1}^d [x_i log(x_i/y_i) − (x_i − y_i)].
  • Logistic: φ(x) = Σ_{i=1}^d [x_i log x_i + (1 − x_i) log(1 − x_i)], C = (0, 1)^d,
    d_φ(x, y) = Σ_{i=1}^d [x_i log(x_i/y_i) + (1 − x_i) log((1 − x_i)/(1 − y_i))].
  • Itakura-Saito: φ(x) = −Σ_{i=1}^d log x_i, C = (0, +∞)^d,
    d_φ(x, y) = Σ_{i=1}^d [x_i/y_i − log(x_i/y_i) − 1].

These four divergences are written out in code after this list.
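The four divergences as vectorized functions (a minimal sketch; checking that inputs lie in the stated domain C is left to the caller):

```python
import numpy as np

# Each d(x, y) sums coordinatewise terms; x, y must lie in the stated domain C.
def euclidean(x, y):                     # C = R^d
    return np.sum((x - y) ** 2)

def gkl(x, y):                           # C = (0, +inf)^d
    return np.sum(x * np.log(x / y) - (x - y))

def logistic(x, y):                      # C = (0, 1)^d
    return np.sum(x * np.log(x / y)
                  + (1 - x) * np.log((1 - x) / (1 - y)))

def itakura_saito(x, y):                 # C = (0, +inf)^d
    return np.sum(x / y - np.log(x / y) - 1)
```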
14/21
SLIDE 15

Simulated data

M = 4 and K = 3.

Figure – K-means with Bregman divergences on some simulated data.

15/21

SLIDE 16

Classification: numerical results

With 20 replications of each case.

Columns: Single (K = 1); candidate estimators m^ℓ (Euclid, GKL, Logit, Ita); kernels used in W_{n,i}(x) (Unif, Epan, Gaus, Triang, Bi-wgt, Tri-wgt). Standard deviations in parentheses; each distribution has a second row of results for the kernel columns.

| Distribution | Single | Euclid | GKL | Logit | Ita | Unif | Epan | Gaus | Triang | Bi-wgt | Tri-wgt |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Exp | 18.86 (0.89) | 8.58 (0.94) | 7.42 (0.88) | 4.09 (0.94) | 3.92 (0.91) | 3.49 (0.91) | 3.51 (1.70) | 3.46 (1.77) | 3.51 (1.55) | 3.56 (1.08) | 3.56 (1.15) |
| | | | | | | 2.91 (0.81) | 2.63 (0.70) | 2.49 (0.74) | 2.70 (0.75) | 2.56 (0.63) | 2.46 (0.66) |
| Pois | 46.93 (1.37) | 9.19 (1.46) | 8.45 (1.47) | 13.33 (1.46) | 10.15 (1.47) | 8.59 (1.49) | 8.51 (3.35) | 8.51 (1.27) | 8.51 (1.24) | 8.52 (1.84) | 8.52 (1.47) |
| | | | | | | 8.51 (1.28) | 8.46 (1.11) | 8.44 (1.17) | 8.42 (1.15) | 8.57 (1.28) | 8.44 (1.13) |
| Geom | 19.90 (1.15) | 12.57 (1.16) | 4.71 (1.16) | 3.94 (1.15) | 8.12 (1.16) | 3.61 (1.16) | 3.60 (2.07) | 3.60 (2.39) | 3.61 (2.37) | 3.60 (1.15) | 3.60 (1.57) |
| | | | | | | 3.76 (0.92) | 3.52 (1.11) | 2.94 (0.93) | 3.48 (1.09) | 3.47 (1.11) | 3.40 (1.06) |
| 2D Gaus | 49.00 (1.60) | 12.37 (1.59) | 12.40 (1.56) | 14.14 (1.57) | 13.05 (1.57) | 12.87 (1.60) | 12.82 (2.52) | 12.80 (1.55) | 12.84 (1.50) | 12.84 (1.44) | 12.87 (1.61) |
| | | | | | | 12.02 (1.30) | 12.11 (1.24) | 12.06 (1.35) | 12.11 (1.27) | 12.09 (1.23) | 12.10 (1.22) |
| 3D Gaus | 43.39 (1.58) | 10.77 (1.52) | 10.99 (1.50) | 11.74 (1.50) | 11.56 (1.57) | 11.08 (1.55) | 11.01 (2.52) | 11.00 (1.40) | 11.00 (1.44) | 11.04 (1.45) | 11.03 (1.51) |
| | | | | | | 10.23 (1.40) | 9.93 (1.47) | 9.76 (1.53) | 10.04 (1.47) | 9.83 (1.61) | 9.84 (1.61) |

Table – Average testing misclassification error (1 unit = 10⁻²).

16/21

SLIDE 17

Regression: numerical results

Columns as in the classification table: Single (K = 1); candidate estimators m^ℓ; kernels used in W_{n,i}(x). Standard deviations in parentheses; each distribution has a second row of results for the kernel columns.

| Distribution | Single | Euclid | GKL | Logit | Ita | Unif | Epan | Gaus | Triang | Bi-wgt | Tri-wgt |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Exp | 107.73 (15.85) | 69.82 (13.31) | 58.93 (14.40) | 44.54 (13.12) | 44.46 (13.74) | 55.11 (14.41) | 51.14 (7.13) | 40.21 (6.84) | 52.99 (7.37) | 50.24 (7.37) | 50.64 (10.96) |
| | | | | | | 56.34 (17.48) | 52.62 (17.82) | 39.12 (14.98) | 51.31 (19.55) | 51.20 (19.69) | 51.98 (20.12) |
| Pois | 26.76 (1.65) | 10.16 (1.98) | 8.22 (2.18) | 16.72 (2.06) | 12.15 (2.03) | 8.88 (2.03) | 9.18 (1.11) | 8.43 (1.91) | 8.85 (2.25) | 8.84 (1.61) | 8.76 (1.86) |
| | | | | | | 9.73 (2.25) | 9.61 (1.86) | 9.13 (1.92) | 9.64 (1.91) | 9.40 (1.86) | 9.43 (1.93) |
| Geom | 70.45 (13.81) | 29.99 (13.49) | 18.33 (11.79) | 22.94 (14.31) | 31.94 (13.51) | 36.39 (12.21) | 32.49 (4.52) | 21.51 (5.95) | 31.48 (7.34) | 31.44 (6.21) | 30.89 (5.19) |
| | | | | | | 31.83 (12.88) | 27.90 (14.20) | 17.82 (12.58) | 26.82 (13.28) | 28.45 (14.02) | 24.58 (13.21) |
| 2D Gaus | 21.98 (2.55) | 5.63 (1.78) | 6.46 (0.49) | 19.36 (1.72) | 9.38 (1.76) | 7.09 (1.75) | 6.57 (1.20) | 5.57 (1.26) | 6.20 (1.81) | 6.41 (1.11) | 6.33 (1.86) |
| | | | | | | 9.75 (1.30) | 7.70 (2.24) | 6.42 (1.49) | 7.45 (2.42) | 7.47 (2.28) | 7.34 (2.31) |
| 3D Gaus | 53.55 (3.42) | 19.89 (3.45) | 20.93 (4.06) | 23.71 (3.41) | 22.96 (3.50) | 18.16 (3.49) | 18.20 (1.74) | 16.94 (3.49) | 18.25 (2.97) | 18.05 (2.70) | 18.00 (2.74) |
| | | | | | | 19.24 (3.54) | 18.52 (4.02) | 17.51 (3.64) | 18.64 (4.37) | 18.19 (3.91) | 18.42 (3.68) |

Table – Average testing RMSE.

17/21

SLIDE 18

Real data

Air compressor

Data given by [Cadet et al., 2005]. Six predictors, including air temperature, input pressure, output pressure, flow, and water temperature. Response variable: power consumption.

K is not available!

18/21

SLIDE 19

Results of air compressor data

For K = 1: RMSE = 178.67.

| K | Euclid | GKL | Logistic | Ita | COBRA | MixCOBRA* |
|---|---|---|---|---|---|---|
| 2 | 158.85 (6.42) | 158.90 (6.48) | 159.35 (6.71) | 158.96 (6.41) | 153.34 (6.72) | 116.69 (5.86) |
| 3 | 157.38 (6.95) | 157.24 (6.84) | 156.99 (6.65) | 157.24 (6.85) | 153.69 (6.64) | 117.45 (5.55) |
| 4 | 154.33 (6.69) | 153.96 (6.74) | 153.99 (6.45) | 154.07 (7.01) | 152.09 (6.58) | 117.16 (5.99) |
| 5 | 153.18 (6.91) | 153.19 (6.77) | 152.95 (6.57) | 152.25 (6.70) | 151.05 (6.76) | 117.55 (5.90) |
| 6 | 151.16 (6.91) | 151.67 (6.96) | 151.89 (6.62) | 151.75 (6.57) | 150.27 (6.82) | 117.74 (5.86) |
| 7 | 151.08 (6.77) | 150.99 (6.84) | 152.81 (7.11) | 151.85 (6.61) | 150.46 (6.87) | 117.58 (6.15) |
| 8 | 151.27 (7.17) | 151.09 (7.01) | 152.07 (6.65) | 150.90 (6.96) | 150.21 (7.03) | 117.91 (5.83) |

Table – RMSE of air compressor data.

* Consensual aggregation method integrating the input X into the weights [Fischer and Mougeot, 2019].

19/21

SLIDE 20

Thank you! Questions?

20/21

SLIDE 21

Banerjee, A., Merugu, S., Dhillon, I. S., and Ghosh, J. (2005). Clustering with Bregman divergences. Journal of Machine Learning Research, 6:1705–1749.

Biau, G., Fischer, A., Guedj, B., and Malley, J. D. (2016). COBRA: a combined regression strategy. Journal of Multivariate Analysis, 146:18–28.

Bregman, L. M. (1967). The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics, 7:200–217.

Cadet, O., Harper, C., and Mougeot, M. (2005). Monitoring energy performance of compressors with an innovative auto-adaptive approach. In Instrumentation, Systems and Automation (ISA), Chicago.

Fischer, A., Has, S., and Mougeot, M. (2018). Consensual aggregation of clusters based on Bregman divergences to improve predictive models.

Fischer, A. and Mougeot, M. (2019). Aggregation using input-output trade-off. Journal of Statistical Planning and Inference, 200:1–19.

Mojirsheibani, M. (1999). Combining classifiers via discretization. Journal of the American Statistical Association, 94(446):600–609.

Mojirsheibani, M. (2000). A kernel-based combined classification rule. Statistics & Probability Letters, 48(4):411–419.

21/21