LAMBDA MEANS CLUSTERING: AUTOMATIC PARAMETER SEARCH AND DISTRIBUTED COMPUTING IMPLEMENTATION
MARCUS COMITER, MIRIAM CHA, H. T. KUNG, SURAT TEERAPITTAYANON
HARVARD UNIVERSITY
ICPR 2016, DECEMBER 6, 2016

TALK OUTLINE
• Motivation and Introduction
• Background
• Lambda Means
• Benefits of Lambda Means
• Results
• Extension to Distributed Framework

MACHINE LEARNING: VISION VS. REALITY
[Figure: two panels labeled "Vision" and "Reality"]

CLUSTERING
• Clustering is one of the most basic, yet most powerful and fundamental, machine learning techniques
• But even in this simple setting, the choice of parameters is both difficult and greatly impacts performance

If machine learning is fundamentally a data-driven science, shouldn't the use of machine learning itself follow a data-driven methodology?

INTRODUCTION
• We present Lambda Means, a meta-algorithm for the newly popular clustering algorithm DP-means
• Lambda Means automatically finds DP-means' main parameter (λ)
• It finds λ using the data itself on which the clustering is being performed

DP-MEANS
• DP-means forms clusters of superior quality using a distance parameter λ to ensure minimum separation between cluster centroids rather than specifying k in advance
• B. Kulis and M. I. Jordan (the authors of DP-means) show that this new algorithm outperforms the traditional k-means algorithm
• The algorithm forms a new cluster when a data point is found to be more than λ distance away from all existing cluster centroids
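A minimal sketch of the DP-means loop described above, assuming plain Euclidean distance, a first centroid at the global mean, and a fixed point order. The name `dp_means` and its parameters are illustrative rather than the authors' implementation, and some formulations compare the squared distance against λ instead.

```python
import math

def dp_means(points, lam, max_iter=100):
    """Illustrative DP-means: returns (centroids, labels) for a list of tuples."""
    dim = len(points[0])
    # Assumption: start from one cluster centred on the global mean.
    centroids = [tuple(sum(p[d] for p in points) / len(points) for d in range(dim))]
    labels = [0] * len(points)
    for _ in range(max_iter):
        changed = False
        for i, p in enumerate(points):
            dists = [math.dist(p, c) for c in centroids]
            best = min(range(len(centroids)), key=dists.__getitem__)
            if dists[best] > lam:
                # More than lam away from every centroid: open a new cluster at p.
                centroids.append(tuple(p))
                best = len(centroids) - 1
            if labels[i] != best:
                labels[i] = best
                changed = True
        # Recompute centroids as member means and drop clusters left empty.
        members = {}
        for p, lab in zip(points, labels):
            members.setdefault(lab, []).append(p)
        order = sorted(members)
        remap = {old: new for new, old in enumerate(order)}
        centroids = [
            tuple(sum(q[d] for q in members[old]) / len(members[old]) for d in range(dim))
            for old in order
        ]
        labels = [remap[lab] for lab in labels]
        if not changed:
            break
    return centroids, labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(len(dp_means(points, lam=100.0)[0]))  # one cluster when lam is large
print(len(dp_means(points, lam=3.0)[0]))    # two clusters when lam is small
```

The number of clusters is never passed in: it falls out of λ, which is why choosing λ well matters.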

DIRICHLET PROCESS
• Under an assumption that a sequence of data is drawn from a Dirichlet Process Mixture Model, B. Kulis and M. I. Jordan (the authors of DP-means) prove that there exists a lambda value such that when used by DP-means, the algorithm will discover the ground truth number of clusters k
• μ corresponds to the mean of each of the clusters, drawn from some base distribution G0, which is the prior distribution over the means
• π = (π1, π2, …) corresponds to the vector of probabilities of being in a cluster (k → infinity)
• z_i is an indicator of cluster assignment
• x_i is a data point
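For orientation, the symbols above fit the usual generative form of a Dirichlet Process Mixture Model, sketched below. The concentration parameter α, the stick-breaking (GEM) prior on π, and the Gaussian likelihood with shared spherical covariance σ²I are assumptions taken from the standard textbook setup; the slide does not state them.

```latex
\begin{align*}
\mu_k &\sim G_0, & k &= 1, 2, \dots \\
\pi = (\pi_1, \pi_2, \dots) &\sim \mathrm{GEM}(\alpha) \\
z_i \mid \pi &\sim \mathrm{Discrete}(\pi), & i &= 1, \dots, n \\
x_i \mid z_i, \{\mu_k\} &\sim \mathcal{N}(\mu_{z_i}, \sigma^2 I)
\end{align*}
```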

DP-MEANS
• In practice, without knowing the parameters of the distribution from which the data is drawn, it is unclear how to find the appropriate value of λ for use with DP-means
• To solve this problem, a Farthest-first Heuristic requiring a user-provided approximation of k can be used
• However, it is not easy to set k
• The choice of k has a marked impact on the resulting value of λ
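The slides do not spell out the heuristic, so the following is a sketch of one common formulation: seed a set with the global mean, repeatedly add the point farthest from the set, and after k rounds take the last "farthest" distance as λ. The function name `farthest_first_lambda` and the toy data are illustrative.

```python
import math

def farthest_first_lambda(points, k):
    """Derive a lambda estimate from a user-supplied guess k (illustrative)."""
    dim = len(points[0])
    mean = tuple(sum(p[d] for p in points) / len(points) for d in range(dim))
    # Distance from each point to its nearest seed; the only seed so far is the mean.
    nearest = [math.dist(p, mean) for p in points]
    lam = 0.0
    for _ in range(k):
        # Pick the point farthest from the current seed set.
        far = max(range(len(points)), key=nearest.__getitem__)
        lam = nearest[far]
        # Add it as a seed and refresh every point's nearest-seed distance.
        nearest = [min(d, math.dist(p, points[far])) for p, d in zip(points, nearest)]
    return lam

# Toy 1-D data with three visible groups.
points = [(0,), (1,), (10,), (11,), (20,), (21,)]
for k in (1, 2, 3, 4, 5):
    print(k, farthest_first_lambda(points, k))
```

Running the loop for several values of k on the same data makes the dependence of λ on the initial guess directly visible.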

LAMBDA MEANS
• As a solution for automatically finding the λ parameter for use with DP-means, we present Lambda Means
• It finds λ using the data itself on which the clustering is being performed
• Under an assumption that the data is generated by a Dirichlet Process Mixture Model, we formally prove that the λ value found by Lambda Means is the same λ used in generating the data (see Section III.D in our paper)

LAMBDA MEANS
• The algorithm's main mechanism is to decrease λ at each iteration, automatically terminating at the proper λ value
• This has the effect of precipitating clusters at each iteration up to the point at which all clusters have been identified, but before the point at which true clusters are broken up into individual points

ILLUSTRATION OF EFFECT OF DECREASING λ
• Iteration: T, Lambda: Large. A large value of lambda causes the two sets of points to be clustered together
• Iteration: T + ΔT, Lambda: Small. A small value of lambda causes the two sets of points to be clustered separately

LAMBDA MEANS
• Note that a naive implementation would generate the entire curve of cluster count against λ and then search for the elbow
• Lambda Means replaces the need for this exhaustive search for the elbow of the curve
• The algorithm uses the cumulative number of clusters formed as a signaling mechanism, continuing to iterate with smaller values of λ until the stopping criterion is met
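To make the naive baseline concrete, the sketch below sweeps λ downward, records the cluster count at each value, and picks an elbow. It illustrates the exhaustive search that Lambda Means is meant to avoid, not Lambda Means itself, whose stopping criterion is not given on these slides. The elbow rule (the λ just before the largest jump in cluster count), the helper names, and the toy data are all hypothetical.

```python
import math

def count_clusters(points, lam, max_iter=100):
    """Run an illustrative DP-means pass and return the number of clusters."""
    dim = len(points[0])
    centroids = [tuple(sum(p[d] for p in points) / len(points) for d in range(dim))]
    labels = [0] * len(points)
    for _ in range(max_iter):
        changed = False
        for i, p in enumerate(points):
            dists = [math.dist(p, c) for c in centroids]
            best = min(range(len(centroids)), key=dists.__getitem__)
            if dists[best] > lam:
                # Farther than lam from every centroid: open a new cluster.
                centroids.append(tuple(p))
                best = len(centroids) - 1
            if labels[i] != best:
                labels[i], changed = best, True
        # Recompute centroids as member means, dropping empty clusters.
        groups = {}
        for p, lab in zip(points, labels):
            groups.setdefault(lab, []).append(p)
        order = sorted(groups)
        centroids = [
            tuple(sum(q[d] for q in groups[o]) / len(groups[o]) for d in range(dim))
            for o in order
        ]
        labels = [order.index(lab) for lab in labels]
        if not changed:
            break
    return len(centroids)

def naive_elbow(points, lams):
    """Hypothetical elbow rule: the lam just before the largest jump in count."""
    counts = [count_clusters(points, lam) for lam in lams]
    jumps = [counts[i + 1] - counts[i] for i in range(len(counts) - 1)]
    i = max(range(len(jumps)), key=jumps.__getitem__)
    return lams[i], counts[i], counts

# Three tight groups of five points each, far apart from one another.
offsets = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
centers = [(0, 0), (50, 0), (0, 50)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]
lams = [100 / 2 ** i for i in range(9)]  # 100, 50, 25, ... (decreasing)
lam, k, counts = naive_elbow(points, lams)
print(counts, lam, k)
```

Every value of λ in the sweep costs a full clustering run, which is the expense the signaling mechanism described above is intended to cut short.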

BENEFITS
• Lambda Means is more robust than using a Farthest-first Heuristic, which requires a user-defined k
• Reason 1: Setting this k can be very difficult
• Reason 2: If the initial approximation to k is wrong, it negatively affects finding the correct λ

BENEFITS
• To show the effect of an incorrect k, we generate a dataset and then use the Farthest-first Heuristic with a number of different values of k to derive λ
• We find that λ varies greatly based on the initial k used

BENEFITS
• The drawbacks of the Farthest-first Heuristic are clear:
• The method is brittle to small changes in the approximation of k
• The approximation of k has a large impact on the derived value of λ, as well as potentially on the resulting cluster quality
• In contrast, Lambda Means automatically finds the λ value without an initial approximation for k

RESULTS
• We provide experimental evaluation of Lambda Means on both synthetic and real-world data
• For synthetic data, we generate data with different values of the inter-cluster variance ρ and the intra-cluster variance σ
• For real-world data, we use the MNIST handwritten digit dataset
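One way such synthetic data could be produced is sketched below: cluster centers are drawn with variance ρ around the origin, and points are drawn with variance σ around their center. The Gaussian form, the function name `make_synthetic`, and all parameter names are assumptions for illustration; the slides do not describe the actual generator.

```python
import math
import random

def make_synthetic(k, n_per_cluster, rho, sigma, dim=2, seed=0):
    """Return (points, labels, centers) for k Gaussian clusters (illustrative)."""
    rng = random.Random(seed)
    # Inter-cluster spread: centers ~ N(0, rho) per coordinate (rho is a variance).
    centers = [
        tuple(rng.gauss(0.0, math.sqrt(rho)) for _ in range(dim)) for _ in range(k)
    ]
    points, labels = [], []
    for c, center in enumerate(centers):
        for _ in range(n_per_cluster):
            # Intra-cluster spread: point ~ N(center, sigma) per coordinate.
            points.append(tuple(rng.gauss(m, math.sqrt(sigma)) for m in center))
            labels.append(c)
    return points, labels, centers

points, labels, centers = make_synthetic(k=3, n_per_cluster=50, rho=100.0, sigma=1.0)
print(len(points), sorted(set(labels)))
```

A larger ρ/σ ratio spreads the centers out relative to the cluster widths, which should make the groups easier to separate.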

RESULTS
• This figure shows that for synthetic data with a high value of ρ/σ, Lambda Means is able to automatically find the λ value that maximizes AMI and NMI scores
• NMI measures the amount of mutual information normalizing for number of clusters, and AMI measures the amount of mutual information accounting for chance
• We can also judge Lambda Means by its ability to identify the correct number of clusters, which it does (as shown by the blue line)
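As a rough illustration of the NMI score mentioned above, the sketch below computes mutual information between two labelings and normalizes it by the arithmetic mean of their entropies. The normalization choice is an assumption, since the slides do not say which variant was used, and AMI is omitted because it additionally needs the expected mutual information under random labelings.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    """Mutual information between two labelings of the same items."""
    n = len(a)
    ca, cb, joint = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (nij / n) * math.log(n * nij / (ca[i] * cb[j]))
        for (i, j), nij in joint.items()
    )

def nmi(a, b):
    """NMI with arithmetic-mean normalization (an assumed convention)."""
    ha, hb = entropy(a), entropy(b)
    if ha == 0.0 and hb == 0.0:
        return 1.0  # both labelings put everything in a single cluster
    return mutual_information(a, b) / ((ha + hb) / 2)

truth = [0, 0, 1, 1]
print(nmi(truth, [1, 1, 0, 0]))  # same partition with swapped label names
print(nmi(truth, [0, 1, 0, 1]))  # partition independent of the truth
```

The score depends only on how items are grouped, not on which integer names the groups carry.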

RESULTS
• We now compare the AMI and NMI scores for Lambda Means and DP-means in Table I for additional values of ρ/σ, as well as for the MNIST dataset
• Lambda Means outperforms DP-means where λ is set via the Farthest-first Heuristic

DISTRIBUTED RESULTS
• Lambda Means easily extends to a distributed setting under the optimistic concurrency control framework
• We achieve a speed-up within a factor of two of perfect speed-up in both the multicore and multi-processor distributed settings

THANK YOU
MARCUS COMITER, MIRIAM CHA, H. T. KUNG, SURAT TEERAPITTAYANON
HARVARD UNIVERSITY
