Faster Algorithms for the Constrained k-means Problem - PowerPoint PPT Presentation


slide-1
SLIDE 1

Faster Algorithms for the Constrained k-means Problem

Ragesh Jaiswal

CSE, IIT Delhi

June 16, 2015

[Joint work with Anup Bhattacharya (IITD) and Amit Kumar (IITD)]

slide-2
SLIDE 2

k-means Clustering Problem

Problem (k-means): Given n points X ⊂ R^d and an integer k, find k points C ⊂ R^d (called centers) such that the sum of squared Euclidean distances from each point in X to its closest center in C is minimized. That is, the following cost function is minimized:

Φ_C(X) = Σ_{x∈X} min_{c∈C} ||x − c||²

Example: k = 4, d = 2
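To make the cost function concrete, here is a minimal sketch (Python with NumPy; the function name is illustrative) of Φ_C(X):

```python
import numpy as np

def kmeans_cost(X, C):
    """Phi_C(X): sum over x in X of the squared Euclidean distance
    from x to its closest center in C."""
    # Pairwise squared distances, shape (n, k).
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    # Each point contributes the squared distance to its nearest center.
    return d2.min(axis=1).sum()
```

For the example above (k = 4, d = 2), one would call `kmeans_cost(X, C)` with `X` of shape (n, 2) and `C` of shape (4, 2).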

slide-3
SLIDE 3

k-means

Lower/Upper Bounds

Lower bounds:

  • The problem is NP-hard even when k = 2, and even when d = 2 [Das08, MNV12, Vat09].
  • Theorem [ACKS15]: There is a constant ε > 0 such that it is NP-hard to approximate the k-means problem to a factor better than (1 + ε).

slide-4
SLIDE 4

k-means

Lower/Upper Bounds

Upper bounds: There are various approximation algorithms for the k-means problem.

  Citation                 Approx. factor   Running time
  [AV07]                   O(log k)         polynomial time
  [KMN+02]                 9 + ε            polynomial time
  [KSS10, JKY15, FMS07]    (1 + ε)          O(nd · 2^Õ(k/ε))

slide-5
SLIDE 5

k-means

Locality property

Clustering with the k-means formulation implicitly assumes that the target clustering satisfies a locality property: data points within the same cluster are close to each other in some geometric sense. However, there are clustering problems arising in machine learning where locality is not the only requirement.

slide-6
SLIDE 6

k-means

Locality property

  • r-gather clustering: each cluster should contain at least r points.
  • Capacitated clustering: cluster sizes are upper bounded.
  • l-diversity clustering: each input point has an associated color, and no cluster should have more than a 1/l fraction of its points sharing the same color.
  • Chromatic clustering: each input point has an associated color, and points with the same color should be in different clusters.

slide-7
SLIDE 7

k-means

Locality property

A unified framework that considers all of the above problems would be nice.

slide-8
SLIDE 8

k-means

Problem (Constrained k-means [DX15]): Given n points X ⊂ R^d, an integer k, and a set of constraints D, find k clusters X_1, ..., X_k such that (i) the clusters satisfy D and (ii) the following cost function is minimized:

Ψ(X) = Σ_{i=1}^{k} Σ_{x∈X_i} ||x − Γ(X_i)||², where Γ(X_i) = (Σ_{x∈X_i} x) / |X_i|.

slide-9
SLIDE 9

Constrained k-means

Problem (k-means): Given n points X ⊂ R^d and an integer k, find k centers C ⊂ R^d such that the following cost function is minimized:

Φ_C(X) = Σ_{x∈X} min_{c∈C} ||x − c||²

Problem (Constrained k-means [DX15]): Given n points X ⊂ R^d, an integer k, and a set of constraints D, find k clusters X_1, ..., X_k such that (i) the clusters satisfy D and (ii) the following cost function is minimized:

Ψ(X) = Σ_{i=1}^{k} Σ_{x∈X_i} ||x − Γ(X_i)||², where Γ(X_i) = (Σ_{x∈X_i} x) / |X_i|.
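As a concrete reading of the Ψ cost above, a short sketch (Python with NumPy; names are illustrative) that computes the cost of a given clustering with respect to its own centroids:

```python
import numpy as np

def clustering_cost(clusters):
    """Psi: sum of squared distances of each point to the centroid
    Gamma(X_i) of its own cluster."""
    total = 0.0
    for Xi in clusters:
        mu = Xi.mean(axis=0)           # Gamma(X_i)
        total += ((Xi - mu) ** 2).sum()
    return total
```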



slide-12
SLIDE 12

Constrained k-means

Problem (k-means, in terms of clusters): Given n points X ⊂ R^d and an integer k, find k clusters X_1, ..., X_k such that the following cost function is minimized:

Φ(X) = Σ_{i=1}^{k} Σ_{x∈X_i} ||x − Γ(X_i)||², where Γ(X_i) = (Σ_{x∈X_i} x) / |X_i|.

Problem (Constrained k-means [DX15]): Given n points X ⊂ R^d, an integer k, and a set of constraints D, find k clusters X_1, ..., X_k such that (i) the clusters satisfy D and (ii) the same cost function Ψ(X) = Σ_{i=1}^{k} Σ_{x∈X_i} ||x − Γ(X_i)||² is minimized.

Fact: For any X ⊂ R^d and any point p ∈ R^d,

Σ_{x∈X} ||x − p||² = Σ_{x∈X} ||x − Γ(X)||² + |X| · ||Γ(X) − p||².
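The Fact above is the standard centroid decomposition; a quick numerical check (Python, illustrative names):

```python
import numpy as np

def decomposition_gap(X, p):
    """|LHS - RHS| of the identity
    sum_x ||x - p||^2 = sum_x ||x - Gamma(X)||^2 + |X| * ||Gamma(X) - p||^2."""
    mu = X.mean(axis=0)                       # Gamma(X), the centroid
    lhs = ((X - p) ** 2).sum()
    rhs = ((X - mu) ** 2).sum() + len(X) * ((mu - p) ** 2).sum()
    return abs(lhs - rhs)
```

The gap should be zero up to floating-point error for any X and p.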


slide-14
SLIDE 14

Constrained k-means

Problem (attempted formulation in terms of centers): Given n points X ⊂ R^d, an integer k, and a set of constraints D, find k centers C ⊂ R^d such that...

slide-15
SLIDE 15

Constrained k-means

Problem (Constrained k-means [DX15], in terms of centers): Given n points X ⊂ R^d, an integer k, a set of constraints D, and a partition algorithm A_D, find k centers C ⊂ R^d such that the following cost function is minimized:

Ψ(X) = Σ_{i=1}^{k} Σ_{x∈X_i} ||x − Γ(X_i)||², where (X_1, ..., X_k) ← A_D(C, X).

Partition Algorithm [DX15]: Given a dataset X, constraints D, and centers C = (c_1, ..., c_k), the partition algorithm A_D(C, X) outputs a clustering (X_1, ..., X_k) of X such that (i) all clusters X_i satisfy D and (ii) the following cost function is minimized:

cost(A_D(C, X)) = Σ_{i=1}^{k} Σ_{x∈X_i} ||x − c_i||².

slide-16
SLIDE 16

Constrained k-means

What is a partition algorithm for the k-means problem when there are no constraints on the clusters?

slide-17
SLIDE 17

Constrained k-means

For the unconstrained k-means problem, the partition algorithm is simply Voronoi partitioning: assign each point to its nearest center.
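A minimal sketch (Python with NumPy; names are illustrative) of this unconstrained partition algorithm:

```python
import numpy as np

def voronoi_partition(X, C):
    """Assign each point of X to its nearest center in C: the partition
    algorithm A_D for the unconstrained k-means problem."""
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    return [X[labels == i] for i in range(len(C))]
```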

slide-18
SLIDE 18

Constrained k-means

Partition algorithm for r-gather clustering [DX15]:

Constraint: each cluster should contain at least r points.

Figure: Partition algorithm via minimum-cost circulation.

slide-19
SLIDE 19

Constrained k-means

Theorem (main result in [DX15]): There is a (1 + ε)-approximation algorithm that runs in time O(ndL + L · T(A_D)), where T(A_D) denotes the running time of A_D and L = (log n)^k · 2^{poly(k/ε)}.

slide-20
SLIDE 20

Constrained k-means

Theorem (main result in [DX15]): There is a (1 + ε)-approximation algorithm with running time O(ndL + L · T(A_D)), where T(A_D) denotes the running time of A_D and L = (log n)^k · 2^{poly(k/ε)}.

Theorem (our main result): There is a (1 + ε)-approximation algorithm with running time O(ndL + L · T(A_D)), where L = 2^{Õ(k/ε)}.

slide-21
SLIDE 21

Constrained k-means

A common theme for all PTAS

How do these (1 + ε)-approximation algorithms work? They enumerate a list of k-center sets C_1, ..., C_l and then use A_D to pick the best one.
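The selection step can be sketched as follows (Python; `voronoi_cost` stands in for cost(A_D(C, X)) in the unconstrained case, and all names are illustrative):

```python
import numpy as np

def voronoi_cost(X, C):
    """cost(A_D(C, X)) for the unconstrained case: squared distance of each
    point to its nearest center in C."""
    d2 = ((X[:, None, :] - np.asarray(C)[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

def pick_best(X, candidates, cost_fn=voronoi_cost):
    """Given the enumerated list C_1, ..., C_l, return the candidate k-center
    set whose induced partition has the smallest cost."""
    return min(candidates, key=lambda C: cost_fn(X, C))
```

For a constrained variant, `cost_fn` would instead run the corresponding partition algorithm A_D.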

slide-22
SLIDE 22

List k-means

Problem (List k-means): Let X ⊂ R^d, let k be an integer, let ε > 0, and let X_1, ..., X_k be an arbitrary partition of X. Given X, k, and ε, find a list of k-center sets C_1, ..., C_l such that for at least one index j ∈ {1, ..., l},

Σ_{i=1}^{k} Σ_{x∈X_i} ||x − c_i||² ≤ (1 + ε) · OPT, where C_j = (c_1, ..., c_k).

Note that OPT = Σ_{i=1}^{k} Σ_{x∈X_i} ||x − Γ(X_i)||².

Observation: A solution to the list k-means problem gives a solution to the constrained k-means problem.

slide-23
SLIDE 23

List k-means

Is outputting a list a necessary requirement?

slide-24
SLIDE 24

List k-means

Attempted problem definition without a list: Let X ⊂ R^d, let k be an integer, let ε > 0, and let X_1, ..., X_k be an arbitrary partition of X. Given X, k, and ε, find k centers C = (c_1, ..., c_k) such that

Σ_{i=1}^{k} Σ_{x∈X_i} ||x − c_i||² ≤ (1 + ε) · OPT, where OPT = Σ_{i=1}^{k} Σ_{x∈X_i} ||x − Γ(X_i)||².

Since the target partition is arbitrary and not known to the algorithm, no single set of centers can be certified to be good for it; this is why the list formulation is used.


slide-26
SLIDE 26

List k-means

We can formulate an existential question about the size of such a list.

Question: Let X ⊂ R^d, let k be an integer, let ε > 0, and let X_1, ..., X_k be an arbitrary partition of X. Let L be the size of the smallest list of k-center sets that contains at least one element (c_1, ..., c_k) with Σ_{i=1}^{k} Σ_{x∈X_i} ||x − c_i||² ≤ (1 + ε) · OPT. What is the value of L?

slide-27
SLIDE 27

List k-means

Our results:

  • Lower bound: L = 2^{Ω̃(k/√ε)}.
  • Upper bound: L = 2^{Õ(k/ε)}.

slide-28
SLIDE 28

List k-means

Solving k-means via list k-means: If a (1 + ε)-approximation algorithm solves k-means or constrained k-means by solving list k-means (as, in fact, all known algorithms do), then its running time cannot be smaller than nd · 2^{Ω̃(k/√ε)}. This explains the common running time expression of all known (1 + ε)-approximation algorithms.

slide-29
SLIDE 29

Main ideas for upper bound


slide-30
SLIDE 30

List k-means: upper bound

A crucial lemma

Lemma ([IKI94]): Let S be a set of s points sampled independently and uniformly at random from a point set X ⊂ R^d. Then for any δ > 0, the following holds with probability at least (1 − δ):

Φ_{Γ(S)}(X) ≤ (1 + 1/(δ·s)) · Φ_{Γ(X)}(X), where Γ(X) = (Σ_{x∈X} x) / |X|.

Figure: The cost w.r.t. the centroid (blue triangle) of all points (blue dots) is close to the cost w.r.t. the centroid (green triangle) of a few randomly chosen points (green dots).
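A small experiment matching the lemma (Python; the sample size and trial count are illustrative): the cost w.r.t. the centroid of a uniform sample of size s exceeds the optimal one-center cost by roughly a (1 + 1/s) factor on average.

```python
import numpy as np

def mean_sample_ratio(X, s, trials, seed=0):
    """Average over `trials` of Phi_{Gamma(S)}(X) / Phi_{Gamma(X)}(X),
    where S is a uniform sample (with replacement) of size s from X."""
    rng = np.random.default_rng(seed)
    base = ((X - X.mean(axis=0)) ** 2).sum()   # optimal one-center cost
    total = 0.0
    for _ in range(trials):
        S = X[rng.choice(len(X), size=s, replace=True)]
        total += ((X - S.mean(axis=0)) ** 2).sum() / base
    return total / trials
```

The ratio is always at least 1 (the centroid is the optimal one-center solution), and its average should sit near 1 + 1/s.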

slide-31
SLIDE 31

List k-means: upper bound

Main ideas

Consider the following simple case where the clusters are separated.


slide-32
SLIDE 32

List k-means: upper bound

Main ideas

We randomly sample N points and then consider all possible subsets of the sampled points of size M < N.

slide-33
SLIDE 33

List k-means: upper bound

Main ideas

One of these subsets represents a uniform sample from the largest cluster. The centroid of this subset is a good center for this cluster.


slide-34
SLIDE 34

List k-means: upper bound

Main ideas

At this point, we are done with the first cluster and would like to repeat. Sampling uniformly at random is not a good idea, as the other clusters might be small.

slide-35
SLIDE 35

List k-means: upper bound

Main ideas

Solution: We sample using D2-sampling. That is, we sample using a non-uniform distribution that gives preference to points that are further away from the current centers.

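A minimal sketch of D2-sampling (Python, illustrative names): pick a point with probability proportional to its squared distance to the nearest already-chosen center.

```python
import numpy as np

def d2_sample(X, centers, rng):
    """Sample one point of X with probability proportional to its squared
    distance to the nearest center already chosen (D^2-sampling)."""
    C = np.asarray(centers)
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).min(axis=1)
    probs = d2 / d2.sum()
    return X[rng.choice(len(X), p=probs)]
```

Points far from every current center are far more likely to be picked, which is exactly the preference described above.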

slide-36
SLIDE 36

List k-means: upper bound

Main ideas

Again, we consider all possible subsets and one of these subsets behaves like a uniform sample from a target cluster.


slide-37
SLIDE 37

List k-means: upper bound

Main ideas

So, the centroid of this subset is a good center for this cluster. Now, we just repeat.

slide-38
SLIDE 38

List k-means: upper bound

Main ideas

Consider a more complicated case where the target clusters are not well separated.


slide-39
SLIDE 39

List k-means: upper bound

Main ideas

Again, we start by sampling uniformly at random.


slide-40
SLIDE 40

List k-means: upper bound

Main ideas

Again, we start by sampling uniformly at random and consider all possible subsets. One of these subsets behaves like a uniform sample from the largest cluster, and its centroid is a good center for that cluster.

slide-41
SLIDE 41

List k-means: upper bound

Main ideas

Now that we are done with the largest cluster, we perform D2-sampling.

slide-42
SLIDE 42

List k-means: upper bound

Main ideas

Unfortunately, due to poor separation, none of the subsets behaves like a uniform sample from the second cluster.

slide-43
SLIDE 43

List k-means: upper bound

Main ideas

So, we may fail to obtain a good center for the second cluster.


slide-45
SLIDE 45

List k-means: upper bound

Main ideas

Failing to obtain a good center for the second cluster is an undesirable outcome.

slide-46
SLIDE 46

List k-means: upper bound

Main ideas

Let us go back: the reason D2-sampling is unable to pick uniform samples from the second cluster is that some points of that cluster are close to the first chosen center. So, we create multiple copies of the first center and add them to the set of points from which all possible subsets are considered.

slide-47
SLIDE 47

List k-means: upper bound

Main ideas

These multiple copies act as proxies for the points that are close to the first center. Now, one of the subsets behaves like a uniform sample, and we get a good center.

slide-48
SLIDE 48

List k-means: upper bound

Main ideas

And now we just repeat.

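Putting the pieces together, one round of the candidate-generation step sketched above can be written as follows (Python; N, M, and the helper names are illustrative, not the exact parameters of the actual algorithm): D2-sample N points, add M proxy copies of each already-chosen center, and take the centroid of every M-subset as a candidate next center.

```python
import itertools
import numpy as np

def candidate_centers(X, chosen, N, M, rng):
    """One round of list building (a sketch): returns candidate centers for
    the next cluster, one per M-subset of the sampled pool."""
    if chosen:
        C = np.asarray(chosen)
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        probs = d2 / d2.sum()                    # D^2-sampling distribution
    else:
        probs = np.full(len(X), 1.0 / len(X))    # first round: uniform
    pool = X[rng.choice(len(X), size=N, p=probs)]
    if chosen:
        # Proxy copies of existing centers stand in for points near them.
        pool = np.vstack([pool] + [np.tile(c, (M, 1)) for c in chosen])
    return [np.mean(sub, axis=0) for sub in itertools.combinations(pool, M)]
```

Repeating this round k times, and branching over every candidate at each round, produces the list of k-center sets.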

slide-49
SLIDE 49

Conclusion

We also get a (1 + ε)-approximation algorithm for the k-median problem with running time O(nd · 2^{Õ(k/ε^{O(1)})}).

Our algorithm and analysis extend easily to distance measures that satisfy certain "metric-like" properties. This includes:

  • Mahalanobis distance
  • µ-similar Bregman divergence

Open problems:

  • Matching upper and lower bounds for the list k-median problem.
  • Faster algorithms for specific versions of the constrained k-means problem, designed without going via the list k-means route.

slide-50
SLIDE 50

References I

[ACKS15] Pranjal Awasthi, Moses Charikar, Ravishankar Krishnaswamy, and Ali Kemal Sinop. The hardness of approximation of Euclidean k-means. CoRR abs/1502.03316, 2015.

[AV07] David Arthur and Sergei Vassilvitskii. k-means++: the advantages of careful seeding. Proceedings of SODA '07, SIAM, 2007, pp. 1027–1035.

[Das08] Sanjoy Dasgupta. The hardness of k-means clustering. Tech. Report CS2008-0916, Department of Computer Science and Engineering, University of California San Diego, 2008.

[DX15] Hu Ding and Jinhui Xu. A unified framework for clustering constrained data without locality property. Proceedings of SODA '15, 2015, pp. 1471–1490.

[FMS07] Dan Feldman, Morteza Monemizadeh, and Christian Sohler. A PTAS for k-means clustering based on weak coresets. Proceedings of SCG '07, ACM, 2007, pp. 11–18.

[IKI94] Mary Inaba, Naoki Katoh, and Hiroshi Imai. Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering (extended abstract). Proceedings of SCG '94, ACM, 1994, pp. 332–339.

[JKY15] Ragesh Jaiswal, Mehul Kumar, and Pulkit Yadav. Improved analysis of D2-sampling based PTAS for k-means and other clustering problems. Information Processing Letters 115(2), 2015.

[KMN+02] Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, and Angela Y. Wu. A local search approximation algorithm for k-means clustering. Proceedings of the 18th Annual Symposium on Computational Geometry, 2002, pp. 10–18.

slide-51
SLIDE 51

References II

[KSS10] Amit Kumar, Yogish Sabharwal, and Sandeep Sen. Linear-time approximation schemes for clustering problems in any dimensions. J. ACM 57(2), 2010, 5:1–5:32.

[MNV12] Meena Mahajan, Prajakta Nimbhorkar, and Kasturi Varadarajan. The planar k-means problem is NP-hard. Theoretical Computer Science 442, 2012, pp. 13–21. Special Issue on the Workshop on Algorithms and Computation (WALCOM 2009).

[Vat09] Andrea Vattani. The hardness of k-means clustering in the plane. Tech. report, Department of Computer Science and Engineering, University of California San Diego, 2009.

slide-52
SLIDE 52

Thank you
