

SLIDE 1

CS777 Presentation - RREM

Archit Sharma and Amur Ghose

Analysis of the EM algorithm

April 28, 2018

SLIDE 2

Introduction - EM’s general properties

Expectation Maximization (EM) is a pervasively used algorithm in probabilistic ML for point estimation in the presence of latent variables. Its monotonicity guarantee comes from its minorization-maximization nature. However, EM suffers from the general initialization troubles of non-convex problems.

◮ A convergence proof for Generalized EM (GEM) has been known since Wu (1983), but only in the case of unimodal distributions.

◮ Balakrishnan et al. (2014) show convergence to local maxima under much more general conditions.

◮ There is not much work on the convergence of EM under noisy data.

Most “Robust EM” work addresses only robustness to initialization. We will qualitatively examine the hardness of doing EM, and also look at how to construct EM algorithms robust to noise (possibly adversarial).

SLIDE 3

Adversarial EM - a difficult task

To demonstrate how difficult adversarial EM is, we consider the case of recovering a mixture of Gaussians in one dimension. Assume that the generating Gaussians have zero variance, so each cluster is a repeated point (the cluster locations may lie anywhere in R). Suppose we generate N points (with an unknown a priori distribution over clusters). We hand them over to an adversary who can change m points, and we wish to recover at least L clusters accurately. Since the variance is zero, recovering a cluster is equivalent to finding its centroid.

SLIDE 4

A simple algorithm

◮ Sort clusters by frequency.

◮ Choose the top K clusters.
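The two steps above can be sketched in a few lines of Python; the function name `top_k_centroids` and the toy data are illustrative, not from the slides.

```python
from collections import Counter

def top_k_centroids(points, k):
    # Zero-variance case: each cluster is a repeated exact value, so
    # recovering a cluster means recovering that value (its centroid).
    counts = Counter(points)
    return [value for value, _ in counts.most_common(k)]

# Three clusters of sizes 5, 3 and 2, plus one stray point.
data = [1.0] * 5 + [10.0] * 3 + [-5.0] * 2 + [99.0]
print(top_k_centroids(data, 3))  # -> [1.0, 10.0, -5.0]
```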

SLIDE 5

The case of L=K

Consider doing the above for L = K, and suppose that cluster i was generated with c_i points. If the smallest cluster has size c′, the adversary needs only c′/2 points to throw off our clustering by at least one cluster (in the worst case, where our algorithm breaks the resulting tie in favour of the adversarially generated cluster), and we can do nothing about it. This is already very prohibitive.
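A small numeric check of the c′/2 budget claim, with cluster sizes of our own choosing:

```python
from collections import Counter

# Clusters of sizes 10, 8 and 4: the smallest cluster has c' = 4 points.
clean = [0.0] * 10 + [5.0] * 8 + [9.0] * 4

# The adversary moves c'/2 = 2 points of the smallest cluster to a fake
# location, producing a tie between the real cluster and the fake one.
corrupted = [0.0] * 10 + [5.0] * 8 + [9.0] * 2 + [77.0] * 2

counts = Counter(corrupted)
# Worst case: the frequency sort breaks the 2-vs-2 tie in favour of the
# fake cluster at 77.0, and one real centroid is lost.
print(counts[9.0], counts[77.0])  # -> 2 2
```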

SLIDE 6

The Algorithm for Adversary in Zero-Variance Case

Since the algorithm is based on sorting clusters by frequency, the adversary wants to maximize the number of adversarially generated clusters among the algorithm’s top K sorted clusters.

◮ Sort clusters by frequency. Binary search for the number of clusters the adversary can corrupt with the allocated budget.

◮ Choose the smallest K/2 clusters in the sorted array. If the minimum number of points needed to “overpower” them is ≤ m, search between K/2 + 1 and K; otherwise search between 1 and K/2 − 1. Repeat until convergence.

◮ A “histogram cutting” algorithm determines the minimum number of points required by the adversary to “overpower” the smallest k clusters.
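The binary search above can be sketched as follows. The cost model in `cost_to_overpower` is our stand-in for the slides' "histogram cutting" subroutine, using the earlier L = K worst-case argument that displacing a cluster of size c costs about c/2 adversary points.

```python
def cost_to_overpower(sizes_ascending, j):
    # Hypothetical cost model: displacing a cluster of size c costs
    # about c // 2 adversary points in the worst case, so overpowering
    # the j smallest clusters costs the sum over those clusters.
    return sum(c // 2 for c in sizes_ascending[:j])

def max_corruptible(sizes, budget):
    # Binary search for the largest number of clusters the adversary
    # can overpower with `budget` corrupted points.
    sizes_ascending = sorted(sizes)
    lo, hi = 0, len(sizes)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if cost_to_overpower(sizes_ascending, mid) <= budget:
            lo = mid          # affordable: try to corrupt more clusters
        else:
            hi = mid - 1      # too expensive: try fewer clusters
    return lo

# Sizes [4, 6, 8, 10]: overpowering the 2 smallest costs 2 + 3 = 5.
print(max_corruptible([10, 8, 4, 6], budget=5))  # -> 2
```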

SLIDE 7

Problem Formulation

Given a data matrix X ∈ R^(N×d), with ≤ ηN adversarially corrupted points, we want to form K clusters using Gaussian Mixture Modelling. In particular, we are interested in recovering the cluster means as accurately as possible. We will obtain them by maximizing the likelihood, with some modifications to the EM algorithm to make it robust.

SLIDE 8

Methods to boost EM - Likelihood Weighting/Thresholding

General Intuition: Outliers will have low likelihoods.

◮ In addition to the posterior-probability weight on every data point in the likelihood function, we add likelihood-based weighting of the data points. The weights are some function of the likelihood, such as the sigmoid of the likelihood.

◮ Somewhat better than this is likelihood thresholding. In every M-step we allow the EM algorithm to discard up to ηN points based on the likelihood so far, after a “warm start”-like phase where we fit the model on all points at once.
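The thresholding heuristic can be sketched as a single EM update for a one-dimensional GMM. The function name, the toy data, and the exact schedule (dropping the ηN lowest-likelihood points every step) are our own illustrative choices.

```python
import numpy as np

def em_step_thresholded(x, pi, mu, sigma, eta):
    # One EM step for a 1-D GMM that discards the eta*N lowest-likelihood
    # points before the M-step (a sketch of likelihood thresholding).
    n = len(x)
    dens = np.stack([pi[j] / (sigma[j] * np.sqrt(2 * np.pi))
                     * np.exp(-0.5 * ((x - mu[j]) / sigma[j]) ** 2)
                     for j in range(len(pi))], axis=1)      # (n, K)
    mix = dens.sum(axis=1) + 1e-300
    keep = np.argsort(mix)[int(eta * n):]   # drop the eta fraction
    r = dens[keep] / mix[keep, None]        # responsibilities (E-step)
    nk = r.sum(axis=0)                      # M-step on kept points only
    mu_new = (r * x[keep, None]).sum(axis=0) / nk
    var_new = (r * (x[keep, None] - mu_new) ** 2).sum(axis=0) / nk
    return nk / nk.sum(), mu_new, np.sqrt(var_new)

# Toy check: clusters at 0 and 10 plus 10 far-away outliers.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(10, 1, 100),
                    rng.uniform(30, 40, 10)])
pi, mu, sigma = np.full(2, 0.5), np.array([1.0, 9.0]), np.ones(2)
for _ in range(20):
    pi, mu, sigma = em_step_thresholded(x, pi, mu, sigma, eta=0.05)
```

The outliers receive essentially zero mixture density, so they are exactly the points discarded, and the means converge near the true 0 and 10.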
SLIDE 9

Some rough analogies between robust EM and other optimization methods

◮ Heuristic EM methods that weigh points by their likelihoods are common. They usually lack proofs of convergence but perform well in practice. They can be seen as the equivalent of IRLS (iteratively reweighted least squares).

◮ Similarly, any method of excluding outliers can be seen as an analogue of thresholding, such as IHT (iterative hard thresholding).

SLIDE 10

Our chosen method - Ridiculously Robust EM

General Intuition: Treat the set of outliers as an extra cluster of points. In this method, we have an extra cluster with a uniform likelihood over all points in its support. For example, this could be the uniform distribution over a large ball B. This cluster is meant to trap outliers, and we output only the “real” K clusters.
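The E-step with such a uniform "trap" component might look like this in one dimension; the function name, the constant-density parameterization, and the toy numbers are our reading of the slide, not its literal content.

```python
import numpy as np

def rrem_responsibilities(x, pi, mu, sigma, ball_volume):
    # E-step sketch for RREM in 1-D: component 0 is the uniform trap
    # with constant density 1 / ball_volume on a large interval B
    # containing the data; components 1..K are the usual Gaussians.
    dens = [np.full_like(x, pi[0] / ball_volume)]
    for j in range(len(mu)):
        dens.append(pi[j + 1] / (sigma[j] * np.sqrt(2 * np.pi))
                    * np.exp(-0.5 * ((x - mu[j]) / sigma[j]) ** 2))
    dens = np.stack(dens, axis=1)                 # (N, K + 1)
    return dens / dens.sum(axis=1, keepdims=True)

# An inlier near 0 sticks to its Gaussian; an outlier at 35 is trapped.
r = rrem_responsibilities(np.array([0.0, 35.0]),
                          pi=np.array([0.05, 0.475, 0.475]),
                          mu=np.array([0.0, 10.0]),
                          sigma=np.array([1.0, 1.0]),
                          ball_volume=100.0)
print(r.round(3))
```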

SLIDE 11

Solutions

Index the extra cluster by 0. The expected complete log-likelihood to be maximized is

L(π, µ, σ) = Σ_{i=1}^{N} Σ_{j=0}^{K} E[z_ij] ( log p(x_i | µ_j, σ_j) + log p(z_ij = 1) )    (1)

Here, log p(x_i | µ_0, σ_0) = 0 for all x_i. Hence, the only additional parameter to be learnt is π_0. This will fail unless we introduce some constraint on π_0. A natural constraint is π_0 ≤ η (in addition to Σ_{j=0}^{K} π_j = 1).
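One way to enforce the constraint in the M-step is to clip and rescale; the slides only state the constraint π_0 ≤ η, so this particular projection (and the toy responsibility matrix) is an assumption on our part.

```python
import numpy as np

def update_pi_constrained(resp, eta):
    # M-step for the mixing weights with the constraint pi_0 <= eta:
    # take the usual update pi_j = N_j / N, and if the outlier weight
    # exceeds eta, clip it to eta and rescale the remaining weights so
    # they sum to 1 - eta.
    pi = resp.sum(axis=0) / resp.shape[0]
    if pi[0] > eta:
        pi[1:] *= (1.0 - eta) / (1.0 - pi[0])
        pi[0] = eta
    return pi

# Column 0 holds responsibilities for the outlier "trap" component.
resp = np.array([[0.9, 0.1, 0.0],
                 [0.8, 0.1, 0.1],
                 [0.0, 0.5, 0.5],
                 [0.0, 0.4, 0.6]])
pi = update_pi_constrained(resp, eta=0.2)
print(pi)  # pi[0] clipped to 0.2; weights still sum to 1
```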

SLIDE 12

Empirical results

Our metric for comparison is the sorted L2 loss between vectors. That is, given two vectors a and b, let a′ be a permutation of the entries of a and b′ a permutation of the entries of b. If we seek to minimize ||a′ − b′||, then by the rearrangement inequality the minimum is attained when a′ is the sorted version of a and b′ the sorted version of b. This is a natural metric for checking convergence, since the centroids may be recovered correctly but come out permuted (in one dimension).
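The metric is a one-liner in numpy (the helper name is ours):

```python
import numpy as np

def sorted_l2(a, b):
    # Sort both centroid vectors, then take the L2 distance; by the
    # rearrangement inequality this is the minimum over permutations.
    return float(np.linalg.norm(np.sort(a) - np.sort(b)))

# Centroids recovered in a different order incur no loss.
print(sorted_l2([1.0, 10.0, -5.0], [10.0, -5.0, 1.0]))  # -> 0.0
```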

SLIDE 13

L2 loss values for different EM varieties

Our framework uses a GMM with K = 3 components with means 1, 10, and −5. For all clusters we use the same σ. There is a modest improvement from likelihood thresholding under zero-mean Gaussian noise of unit variance, and the improvement grows rapidly when we pick an η = 0.05 fraction of points at random and corrupt them with uniform noise in a high range, such as [16, 17]. The robust algorithm is seen to discard these points correctly.

Table 1: L2 results on 3-GMM

Algo       Zero mean   Uniform noise   Non-zero-mean Gaussian
EM         6.78        18.17           14.72
Threshold  3.84        16.52           11.68
RREM       7.45        6.822           8.87

The mean for the Gaussian used above was 4.