Basics of k-means clustering: Clustering Methods with SciPy

Basics of k-means clustering
CLUSTERING METHODS WITH SCIPY
Shaumik Daityari, Business Analyst

Why k-means clustering?
- A critical drawback of hierarchical clustering: runtime
- K-means runs significantly faster on large datasets

Step 1: Generate cluster centers
kmeans(obs, k_or_guess, iter, thresh, check_finite)
- obs: standardized observations
- k_or_guess: number of clusters
- iter: number of iterations (default: 20)
- thresh: threshold (default: 1e-05)
- check_finite: whether to check if observations contain only finite numbers (default: True)
Returns two objects: cluster centers, distortion
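As a minimal sketch of this step, the snippet below calls kmeans on synthetic data. The three-blob dataset, the variable names, and the use of whiten for standardization are assumptions made for illustration and are not part of the original slides.

```python
import numpy as np
from scipy.cluster.vq import kmeans, whiten

# Hypothetical data: three well-separated 2-D blobs of 50 points each
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
points = np.vstack([rng.normal(c, 0.5, size=(50, 2)) for c in blob_centers])

# Standardize each feature to unit variance before clustering
obs = whiten(points)

# Ask for 3 clusters; kmeans returns the centers and a single distortion value
cluster_centers, distortion = kmeans(obs, 3)

print(cluster_centers.shape)
print(distortion)
```

The centers come back as one row per cluster, in the same standardized coordinates as obs.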

How is distortion calculated?
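The figure for this slide did not survive extraction, so the following is only a sketch of the calculation on a toy example. The four points and two centers are hypothetical. It computes each point's distance to its nearest center, then shows both the sum of squared distances (the definition used later in these slides) and the mean distance (the value SciPy's kmeans documents as its distortion).

```python
import numpy as np
from scipy.cluster.vq import vq

# Hypothetical toy data: four points and two fixed cluster centers
obs = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]])
centers = np.array([[0.0, 1.0], [4.0, 1.0]])

# Distance from every point to every center, shape (4, 2)
all_distances = np.linalg.norm(obs[:, None, :] - centers[None, :, :], axis=2)

# Each point contributes its distance to the nearest center
nearest = all_distances.min(axis=1)

sum_of_squares = (nearest ** 2).sum()  # sum of squared distances
mean_distance = nearest.mean()         # mean (non-squared) distance

# vq returns the same per-point distances, plus the nearest-center labels
labels, vq_distances = vq(obs, centers)
print(sum_of_squares, mean_distance, labels)
```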

Step 2: Generate cluster labels
vq(obs, code_book, check_finite=True)
- obs: standardized observations
- code_book: cluster centers
- check_finite: whether to check if observations contain only finite numbers (default: True)
Returns two objects: a list of cluster labels, a list of distortions

A note on distortions
- kmeans returns a single distortion value
- vq returns a list of distortions, one per observation
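To make the difference between the two return values concrete, here is a small sketch that runs both steps on synthetic data. The dataset and variable names are assumptions for illustration only.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq, whiten

# Hypothetical data: three well-separated 2-D blobs of 50 points each
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
obs = whiten(np.vstack([rng.normal(c, 0.5, size=(50, 2)) for c in blob_centers]))

# Step 1: cluster centers and one overall distortion value
cluster_centers, distortion = kmeans(obs, 3)

# Step 2: one label and one distortion per observation
labels, point_distortions = vq(obs, cluster_centers)

print(np.ndim(distortion))       # scalar from kmeans
print(point_distortions.shape)   # one entry per observation from vq
print(distortion, point_distortions.mean())
```

On data like this the mean of the per-point distortions is usually very close to the single value from kmeans, though the two are not guaranteed to match exactly.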

Running k-means

    # Import kmeans and vq functions, plus the plotting libraries
    from scipy.cluster.vq import kmeans, vq
    import matplotlib.pyplot as plt
    import seaborn as sns

    # Generate cluster centers and labels
    # (df is assumed to already hold the standardized columns scaled_x and scaled_y)
    cluster_centers, _ = kmeans(df[['scaled_x', 'scaled_y']], 3)
    df['cluster_labels'], _ = vq(df[['scaled_x', 'scaled_y']], cluster_centers)

    # Plot clusters
    sns.scatterplot(x='scaled_x', y='scaled_y', hue='cluster_labels', data=df)
    plt.show()


Next up: exercises!

How many clusters?

How to find the right k?
- No absolute method to find the right number of clusters (k) in k-means clustering
- Elbow method

Distortions revisited
- Distortion: sum of squared distances of points from their cluster centers (SciPy's kmeans reports the mean, non-squared distance instead, which behaves the same way for the points below)
- Decreases with an increasing number of clusters
- Becomes zero when the number of clusters equals the number of points
- Elbow plot: line plot of distortion against the number of clusters
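The two properties above can be checked on a tiny example. This is a sketch on six hypothetical points, not data from the course: it runs kmeans for every k from 1 up to the number of points and records the distortion each time.

```python
import numpy as np
from scipy.cluster.vq import kmeans

# Hypothetical toy data: six distinct 2-D points
obs = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0],
                [5.0, 6.0], [10.0, 0.0], [10.0, 1.0]])

distortions = []
for k in range(1, len(obs) + 1):
    # kmeans returns (cluster centers, distortion); only the distortion is kept
    _, distortion = kmeans(obs, k)
    distortions.append(float(distortion))

for k, d in enumerate(distortions, start=1):
    print(k, round(d, 3))
```

With k equal to the number of points, every point serves as its own center, so the last value is zero. Because kmeans starts from random centers, the intermediate values can vary slightly between runs.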

Elbow method
- Elbow plot: plot of distortion against the number of clusters
- The elbow plot helps indicate the number of clusters present in the data

Elbow method in Python

    # Imports needed for this example
    from scipy.cluster.vq import kmeans
    import matplotlib.pyplot as plt
    import pandas as pd
    import seaborn as sns

    # Declaring variables for use
    distortions = []
    num_clusters = range(2, 7)

    # Populating distortions for various clusters
    # (df is assumed to already hold the standardized columns scaled_x and scaled_y)
    for i in num_clusters:
        centroids, distortion = kmeans(df[['scaled_x', 'scaled_y']], i)
        distortions.append(distortion)

    # Plotting elbow plot data
    elbow_plot_data = pd.DataFrame({'num_clusters': num_clusters,
                                    'distortions': distortions})
    sns.lineplot(x='num_clusters', y='distortions', data=elbow_plot_data)
    plt.show()


Final thoughts on using the elbow method
- Only gives an indication of the optimal k (number of clusters)
- Does not always pinpoint the value of k
- Other methods: average silhouette and gap statistic

Next up: exercises

Limitations of k-means clustering

Limitations of k-means clustering
- How to find the right k (number of clusters)?
- Impact of seeds
- Biased towards equal-sized clusters

Impact of seeds
Initialize a random seed:

    from numpy import random
    random.seed(12)

- Seed: np.array([1000, 2000]); cluster sizes: 29, 29, 43, 47, 52
- Seed: np.array([1, 2, 3]); cluster sizes: 26, 31, 40, 50, 53
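The effect can be explored with a sketch like the one below. The uniformly scattered dataset and the helper cluster_sizes are hypothetical, so the printed sizes will not match the figures on the slide. Seeding goes through NumPy's global generator, as on the slide; whether that fully controls kmeans depends on the installed SciPy version, so treat the seeding line as an assumption and check the kmeans documentation for your release.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

# Hypothetical data: 200 uniformly scattered points with no clear clusters
data_rng = np.random.default_rng(42)
obs = data_rng.uniform(0.0, 1.0, size=(200, 2))

def cluster_sizes(seed, k=5):
    """Seed NumPy's global generator, run k-means, return sorted cluster sizes."""
    np.random.seed(seed)
    centers, _ = kmeans(obs, k)
    labels, _ = vq(obs, centers)
    return sorted(np.bincount(labels, minlength=k).tolist())

sizes_a = cluster_sizes([1000, 2000])
sizes_b = cluster_sizes([1, 2, 3])
print(sizes_a)
print(sizes_b)
```

Data without well-separated groups is used here on purpose, since that is where different starting centers are most likely to produce different partitions.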

Impact of seeds: plots
[Figure: cluster plots for seed np.array([1000, 2000]) and seed np.array([1, 2, 3])]

Uniform clusters in k-means

Uniform clusters in k-means: a comparison
[Figure: k-means clustering with 3 clusters; hierarchical clustering with 3 clusters]
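Since the comparison plots are not available here, the sketch below shows one way to reproduce such a comparison numerically. The dataset (one large blob and two small ones) is hypothetical, and Ward linkage is an assumption, as the slides do not name the linkage method. It prints the cluster sizes found by each technique so they can be compared with the true sizes of 200, 20, and 20.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.cluster.vq import kmeans, vq, whiten

# Hypothetical data: one large, spread-out blob and two small, tight ones
rng = np.random.default_rng(1)
points = np.vstack([
    rng.normal([0.0, 0.0], 2.0, size=(200, 2)),
    rng.normal([6.0, 6.0], 0.3, size=(20, 2)),
    rng.normal([6.0, -6.0], 0.3, size=(20, 2)),
])
obs = whiten(points)

# K-means with 3 clusters
centers, _ = kmeans(obs, 3)
kmeans_labels, _ = vq(obs, centers)
kmeans_sizes = sorted(np.bincount(kmeans_labels, minlength=3).tolist())

# Hierarchical clustering cut into at most 3 clusters
# (Ward linkage is an assumption, not taken from the slides)
Z = linkage(obs, method="ward")
hier_labels = fcluster(Z, 3, criterion="maxclust")
hier_sizes = sorted(np.bincount(hier_labels)[1:].tolist())  # labels start at 1

print(kmeans_sizes)
print(hier_sizes)
```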

Final thoughts
- Each technique has its pros and cons
- Consider your data size and patterns before deciding on an algorithm
- Clustering is an exploratory phase of analysis

Next up: exercises
