Come Converge! Let’s Talk About Clustering

SLIDE 1

What is Clustering? Clustering Techniques Results Acknowledgments

Youngstown State University

Come Converge! Let’s Talk About Clustering

Alanis Chew and Madeline Cope

Department of Mathematics and Statistics Youngstown State University

26th January 2019

Chew, Cope 26th January 2019 Come Converge! Let’s Talk About Clustering

SLIDE 2

Outline

  • What is Clustering?
  • Clustering Techniques
    • Mutual Nearest Neighbor
    • Spectral Clustering
  • Results
  • Acknowledgments

SLIDE 5

What is clustering?

  • Used for data analysis and for predicting behaviors
  • A form of unsupervised classification
  • Group data into meaningful clusters

SLIDE 7

Mutual Nearest Neighborhood

  • The Mutual Nearest Neighbor (MNN) clustering algorithm is a hierarchical and agglomerative approach

SLIDE 12

MNN Flowchart

SLIDE 14

Mutual Nearest Neighborhood

  • 1. Create a distance matrix, D
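As an illustration of step 1, here is a minimal Python sketch. The slides do not name the distance measure, so Euclidean distance is assumed; the helper `distance_matrix` and the four toy points are hypothetical.

```python
import math

def distance_matrix(points):
    """Hypothetical helper: n x n matrix of Euclidean distances (assumed metric)."""
    n = len(points)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])  # Euclidean distance (Python 3.8+)
            D[i][j] = D[j][i] = d                # symmetric, zero diagonal
    return D

# Four hypothetical toy points: two pairs far apart from each other.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (10.0, 2.0)]
D = distance_matrix(points)
```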

SLIDE 16

Mutual Nearest Neighborhood

  • 2. Create the nearest neighbor matrix, M1
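Step 2 might look like the sketch below. The slides do not define the entries of M1, so this assumes M1[i][j] is the rank of point j in point i's list of neighbors (1 = nearest), with ties broken by index; `nearest_neighbor_matrix` and the toy points are hypothetical.

```python
import math

def nearest_neighbor_matrix(D):
    """Hypothetical helper: M1[i][j] = rank of j among i's neighbors (1 = nearest), 0 on the diagonal."""
    n = len(D)
    M1 = [[0] * n for _ in range(n)]
    for i in range(n):
        # Sort the other points by their distance from point i; ties go to the lower index.
        others = sorted((j for j in range(n) if j != i), key=lambda j: (D[i][j], j))
        for rank, j in enumerate(others, start=1):
            M1[i][j] = rank
    return M1

# Same four hypothetical toy points as in step 1.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (10.0, 2.0)]
D = [[math.dist(p, q) for q in points] for p in points]
M1 = nearest_neighbor_matrix(D)
```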

SLIDE 18

Mutual Nearest Neighborhood

  • 3. Create the mutual neighborhood value matrix, M2
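For step 3, this sketch assumes the mutual neighborhood value (MNV) as it is usually defined, following Gowda and Krishna: the MNV of points i and j is the sum of their two ranks, so M2[i][j] = M1[i][j] + M1[j][i]. Under that definition an MNV of 2 means each point is the other's nearest neighbor. The M1 values are the hypothetical ranks of four toy points, and the function name is hypothetical.

```python
def mutual_neighborhood_values(M1):
    """Hypothetical helper: M2[i][j] = M1[i][j] + M1[j][i]; off-diagonal values are at least 2."""
    n = len(M1)
    return [[M1[i][j] + M1[j][i] for j in range(n)] for i in range(n)]

# Hypothetical rank matrix for four toy points (points 0,1 and points 2,3 are mutual nearest neighbors).
M1 = [[0, 1, 2, 3],
      [1, 0, 2, 3],
      [3, 2, 0, 1],
      [3, 2, 1, 0]]
M2 = mutual_neighborhood_values(M1)
```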

SLIDE 21

Mutual Nearest Neighborhood

  • 4. Merge all pairs of points with a mutual neighborhood value (MNV) of 2
  • 5. Continue merging clusters until the desired number of clusters is reached

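Steps 4 and 5 can be folded into a single loop. The sketch below is one plausible reading of the slides, not the authors' program: it assumes point pairs are visited in order of increasing MNV (ties broken by distance), that every MNV-2 pair is merged, and that merging then continues with larger MNVs until k clusters remain. A union-find forest tracks which cluster each point belongs to; `mnn_cluster` is a hypothetical name.

```python
import math

def mnn_cluster(points, k):
    """Hypothetical MNN sketch: merge pairs by increasing MNV until k clusters remain."""
    n = len(points)
    D = [[math.dist(p, q) for q in points] for p in points]           # step 1 (Euclidean assumed)
    M1 = [[0] * n for _ in range(n)]                                    # step 2: neighbor ranks
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: (D[i][j], j))
        for rank, j in enumerate(others, start=1):
            M1[i][j] = rank
    # Step 3: (MNV, distance, i, j) for every pair, smallest MNV first.
    pairs = sorted((M1[i][j] + M1[j][i], D[i][j], i, j)
                   for i in range(n) for j in range(i + 1, n))
    parent = list(range(n))  # every point starts in its own cluster

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    clusters = n
    for mnv, _, i, j in pairs:
        # Step 4: MNV-2 pairs are always merged; step 5: keep going only while > k clusters.
        if mnv > 2 and clusters <= k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
            clusters -= 1
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 2.0), (11.0, 1.0)]
labels = mnn_cluster(points, 2)  # → [0, 0, 0, 1, 1, 1]
```

Because all MNV-2 pairs are merged unconditionally, asking for more clusters than remain after step 4 (for example `mnn_cluster(points, 6)`) stops right after that step.
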
SLIDE 25

Spectral Clustering

  • Partitional and graph theoretic
  • Represent data as a graph where data points are vertices and edge weights are the similarities between them
  • Uses eigenvalues and eigenvectors to perform dimensionality reduction

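The graph representation can be sketched as a weighted adjacency (similarity) matrix built from raw points. The slides do not say how similarities were computed; the Gaussian kernel w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), the bandwidth `sigma`, and the function name `similarity_graph` are assumptions, although this kernel is a common choice for spectral clustering.

```python
import math

def similarity_graph(points, sigma=1.0):
    """Hypothetical helper: W[i][j] = exp(-||x_i - x_j||^2 / (2 sigma^2)), W[i][i] = 0 (no self-loops)."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = math.dist(points[i], points[j]) ** 2
            W[i][j] = W[j][i] = math.exp(-d2 / (2 * sigma ** 2))  # nearby points get weights near 1
    return W

W = similarity_graph([(0.0, 0.0), (0.5, 0.0), (5.0, 5.0)])
```
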
SLIDE 30

Spectral Clustering: Graphs

A graph contains a vertex set, an edge set, and a relation that associates each edge with two vertices

  • The vertex set is {A, B, C, D, E}
  • The edge set is {{A, C}, {A, B}, {A, E}, {B, D}, {B, E}}
  • For example, A and C share a relation

SLIDE 31

Spectral Clustering

SLIDE 34

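Adjacency matrix A:

$$A_{ij} = \begin{cases} w_{ij} & \text{if } i \text{ and } j \text{ are connected} \\ 0 & \text{otherwise} \end{cases}$$

$$A = \begin{array}{c|ccccc} & A & B & C & D & E \\ \hline A & 0 & 1 & 6 & 0 & 0 \\ B & 1 & 0 & 4 & 3 & 1 \\ C & 6 & 4 & 0 & 1 & 0 \\ D & 0 & 3 & 1 & 0 & 1 \\ E & 0 & 1 & 0 & 1 & 0 \end{array}$$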

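The example adjacency matrix can be rebuilt from a list of weighted edges, which makes the "w_ij if connected, 0 otherwise" rule explicit. The edge weights below are read off the example matrix on this slide; the variable names are hypothetical.

```python
vertices = ["A", "B", "C", "D", "E"]
# Weighted edges read off the example matrix; any pair not listed is unconnected.
weighted_edges = {("A", "B"): 1, ("A", "C"): 6, ("B", "C"): 4, ("B", "D"): 3,
                  ("B", "E"): 1, ("C", "D"): 1, ("D", "E"): 1}

index = {v: k for k, v in enumerate(vertices)}
n = len(vertices)
A = [[0] * n for _ in range(n)]          # "0 otherwise" is the default entry
for (u, v), w in weighted_edges.items():
    A[index[u]][index[v]] = w            # A_ij = w_ij when i and j are connected
    A[index[v]][index[u]] = w            # undirected graph, so A is symmetric
```
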
SLIDE 36

Spectral Clustering: Degree Matrix

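Degree Matrix D: $D_{ii} = \sum_{j} A_{ij}$, with all off-diagonal entries equal to 0

$$A = \begin{array}{c|ccccc} & A & B & C & D & E \\ \hline A & 0 & 1 & 6 & 0 & 0 \\ B & 1 & 0 & 4 & 3 & 1 \\ C & 6 & 4 & 0 & 1 & 0 \\ D & 0 & 3 & 1 & 0 & 1 \\ E & 0 & 1 & 0 & 1 & 0 \end{array} \qquad D = \begin{array}{c|ccccc} & A & B & C & D & E \\ \hline A & 7 & 0 & 0 & 0 & 0 \\ B & 0 & 9 & 0 & 0 & 0 \\ C & 0 & 0 & 11 & 0 & 0 \\ D & 0 & 0 & 0 & 5 & 0 \\ E & 0 & 0 & 0 & 0 & 2 \end{array}$$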

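Computing D from A is a row sum for each vertex; this short sketch reproduces the example's diagonal of 7, 9, 11, 5, 2. Plain lists of lists are an assumption made to keep it dependency-free.

```python
A = [[0, 1, 6, 0, 0],
     [1, 0, 4, 3, 1],
     [6, 4, 0, 1, 0],
     [0, 3, 1, 0, 1],
     [0, 1, 0, 1, 0]]

n = len(A)
degrees = [sum(row) for row in A]                                       # D_ii = sum over j of A_ij
D = [[degrees[i] if i == j else 0 for j in range(n)] for i in range(n)]  # diagonal matrix
```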
SLIDE 39

Spectral Clustering: Laplacian Matrix

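Laplacian Matrix L: $L = D - A$

$$L = \begin{array}{c|ccccc} & A & B & C & D & E \\ \hline A & 7 & -1 & -6 & 0 & 0 \\ B & -1 & 9 & -4 & -3 & -1 \\ C & -6 & -4 & 11 & -1 & 0 \\ D & 0 & -3 & -1 & 5 & -1 \\ E & 0 & -1 & 0 & -1 & 2 \end{array}$$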

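The Laplacian of the same example can be sketched as follows. The last lines check a standard property that the slides do not state: each row of L sums to zero, so the all-ones vector is an eigenvector of L with eigenvalue 0.

```python
A = [[0, 1, 6, 0, 0],
     [1, 0, 4, 3, 1],
     [6, 4, 0, 1, 0],
     [0, 3, 1, 0, 1],
     [0, 1, 0, 1, 0]]

n = len(A)
D = [[sum(A[i]) if i == j else 0 for j in range(n)] for i in range(n)]  # degree matrix
L = [[D[i][j] - A[i][j] for j in range(n)] for i in range(n)]          # L = D - A

# Each row of L sums to zero, so L times the all-ones vector is the zero vector.
row_sums = [sum(row) for row in L]
```
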
SLIDE 43

Spectral Clustering: Principal Component Analysis

  1. Make Correlation Matrix
  2. Find eigenvalues and eigenvectors
  3. Make Projection Matrix

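The three PCA steps can be sketched with the standard library alone. The slides do not show how PCA is combined with the Laplacian, so this applies the steps to a generic data matrix X (rows are observations, columns are features). The function names are hypothetical, and the Jacobi eigenvalue routine stands in for a library call such as `numpy.linalg.eigh`, which would normally be used instead.

```python
import math

def correlation_matrix(X):
    """Step 1 (hypothetical helper): Pearson correlation of the columns of X; assumes no constant column."""
    n, p = len(X), len(X[0])
    means = [sum(row[c] for row in X) / n for c in range(p)]
    stds = [math.sqrt(sum((row[c] - means[c]) ** 2 for row in X) / n) for c in range(p)]
    Z = [[(row[c] - means[c]) / stds[c] for c in range(p)] for row in X]  # standardized data
    R = [[sum(z[a] * z[b] for z in Z) / n for b in range(p)] for a in range(p)]
    return R, Z

def jacobi_eigen(S, sweeps=50, tol=1e-12):
    """Step 2 (hypothetical helper): eigenvalues and eigenvectors (columns of V) of symmetric S."""
    p = len(S)
    a = [row[:] for row in S]
    V = [[float(r == q) for q in range(p)] for r in range(p)]
    for _ in range(sweeps):
        if sum(a[r][q] ** 2 for r in range(p) for q in range(p) if r != q) < tol:
            break  # off-diagonal part is negligible: the diagonal holds the eigenvalues
        for i in range(p - 1):
            for j in range(i + 1, p):
                if abs(a[i][j]) < 1e-15:
                    continue
                # Rotation chosen so that entry (i, j) becomes zero.
                theta = (a[j][j] - a[i][i]) / (2 * a[i][j])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for mat in (a, V):  # right-multiply by the rotation (columns i and j)
                    for r in range(p):
                        mi, mj = mat[r][i], mat[r][j]
                        mat[r][i], mat[r][j] = c * mi - s * mj, s * mi + c * mj
                for col in range(p):  # left-multiply a by the transposed rotation (rows i and j)
                    ai, aj = a[i][col], a[j][col]
                    a[i][col], a[j][col] = c * ai - s * aj, s * ai + c * aj
    return [a[d][d] for d in range(p)], V

def pca(X, k):
    R, Z = correlation_matrix(X)
    values, V = jacobi_eigen(R)
    top = sorted(range(len(values)), key=lambda m: values[m], reverse=True)[:k]
    W = [[V[r][m] for m in top] for r in range(len(V))]  # step 3: projection matrix (p x k)
    Y = [[sum(z[r] * W[r][col] for r in range(len(W))) for col in range(k)] for z in Z]
    return sorted(values, reverse=True), W, Y

X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]  # hypothetical data: feature 2 = 2 x feature 1
values, W, Y = pca(X, k=1)
```

With two perfectly correlated features the correlation matrix is all ones, so the eigenvalues are 2 and 0, and the single retained component weights both features equally (up to sign).
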
SLIDE 45

Last Step

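  • k-means: finds the cluster centers and measures the distance of all points relative to their centers

  1. Select k points at random as cluster centers
  2. Assign objects to their closest center using Euclidean distance
  3. Calculate the mean of all objects in each cluster and make it that cluster's new center
  4. Repeat steps 2 and 3 until the assignments stop changing

  • k-means seeks to minimize

$$M = \sum_{j=1}^{k} \sum_{i=1}^{n} \left\| x_i^{(j)} - c_j \right\|^2$$

where $x_i^{(j)}$ is the i-th object assigned to cluster j and $c_j$ is the center of cluster j.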

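The k-means steps translate into a short Lloyd-style loop. This is a sketch, not the authors' program: the fixed random seed, the iteration cap, and keeping an empty cluster's old center are assumptions, and `kmeans` is a hypothetical name. It also returns M, the within-cluster sum of squared distances from the slide's formula.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Hypothetical Lloyd-style k-means sketch; returns (centers, labels, M)."""
    rng = random.Random(seed)                 # fixed seed (assumption) for repeatable runs
    centers = rng.sample(points, k)           # 1. pick k data points at random as centers
    labels = None
    for _ in range(max_iter):
        # 2. Assign each point to its closest center by Euclidean distance.
        new_labels = [min(range(k), key=lambda j: math.dist(x, centers[j])) for x in points]
        if new_labels == labels:              # 4. stop once no assignment changes
            break
        labels = new_labels
        # 3. Move each center to the mean of the points assigned to it.
        for j in range(k):
            members = [x for x, lab in zip(points, labels) if lab == j]
            if members:                       # keep the old center if a cluster is empty
                centers[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
    # M: sum of squared distances from each point to its cluster's center.
    M = sum(math.dist(x, centers[lab]) ** 2 for x, lab in zip(points, labels))
    return centers, labels, M

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels, M = kmeans(points, k=2)
```

On these two tight, well-separated groups the two groups end up in different clusters and M comes out to 8/3.
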
SLIDE 50

Defaulting on Credit Card Bills

  • Obtained from the University of California at Irvine’s Machine Learning Repository
  • Contains details of individuals who either did or did not default on their credit card bill
  • A few of the details included are:
    • Gender
    • Age
    • Highest level of education completed

Question Posed: Can we predict whether or not an individual will default on their credit card bill based on the given attributes?

SLIDE 52

Spectral Clustering Algorithm

  • Our spectral clustering algorithm was applied to the credit card defaulting data set

  • We tried to run the entire data set through the program...

SLIDE 55

Spectral Clustering Algorithm

  • So instead we ran samples of the data through the program!
  • First 100, then 1,000, and then 2,500 data points

SLIDE 58

First 100 Points of Defaulting Data Set

The program correctly clustered 47.0% of this portion of the data set

SLIDE 61

First 1,000 Points of Defaulting Data Set

The program correctly clustered 73.8% of this portion of the data set

SLIDE 64

First 2,500 Points of Defaulting Data Set

The program correctly clustered 65.2% of this portion of the data set

SLIDE 65

Acknowledgements

  • Dr. Alicia Prieto-Langarica
  • Dr. George Yates
  • Dr. Thomas Wakefield
  • Dr. Barbara Faires
  • Eric Quayson
  • CURMath
  • Youngstown State University
  • 2019 Nebraska Conference for Undergraduate Women in Mathematics

  • YSU Student Government Association
  • YSU Department of Mathematics and Statistics

