Lecture 23: Spectral clustering, Hierarchical clustering, What is a good clustering? - PowerPoint PPT Presentation



slide-1
SLIDE 1

Lecture 23:

− Spectral clustering
− Hierarchical clustering
− What is a good clustering?

Aykut Erdem

May 2016 Hacettepe University

slide-2
SLIDE 2

Last time… K-Means

  • An iterative clustering algorithm
  • Initialize: Pick K random points as cluster centers (means)
  • Alternate:
    • Assign data instances to the closest mean
    • Assign each mean to the average of its assigned points
  • Stop when no points’ assignments change

2

slide by David Sontag
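A minimal NumPy sketch of the Lloyd iteration summarized above; the random initialization from data points and the stop-on-unchanged-assignments check follow the bullets, while the function name, seed, and iteration cap are illustrative choices.

```python
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    """Basic K-means (Lloyd's algorithm). X is an (n, d) array of data points."""
    rng = np.random.default_rng(seed)
    # Initialize: pick K random points as cluster centers (means)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    assignments = np.zeros(len(X), dtype=int)
    for _ in range(max_iters):
        # Assign each data instance to its closest mean
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_assignments = dists.argmin(axis=1)
        # Stop when no points' assignments change
        if np.array_equal(new_assignments, assignments):
            break
        assignments = new_assignments
        # Set each mean to the average of its assigned points
        for j in range(k):
            if np.any(assignments == j):
                centers[j] = X[assignments == j].mean(axis=0)
    return centers, assignments
```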

slide-3
SLIDE 3

Today

  • K-means applications
  • Spectral clustering
  • Hierarchical clustering
  • What is a good clustering?

3

slide-4
SLIDE 4

K-Means 
 Example Applications

4

slide-5
SLIDE 5

Example: K-Means for Segmentation

5

(figure: original image and K-means segmentations with K = 2, K = 3, K = 10)

The goal of segmentation is to partition an image into regions, each of which has a reasonably homogeneous visual appearance.

slide by David Sontag
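One way to reproduce this kind of segmentation, assuming scikit-learn and Pillow are available; the input file name is hypothetical, and clustering is done on RGB color alone (adding pixel coordinates as features would give more spatially coherent regions).

```python
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

# Hypothetical input file; any RGB image works.
img = np.asarray(Image.open("image.jpg"), dtype=np.float64) / 255.0
h, w, _ = img.shape

for k in (2, 3, 10):
    # Cluster all pixels by their RGB color
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(img.reshape(-1, 3))
    # Replace every pixel with its cluster mean -> a k-color segmentation
    seg = km.cluster_centers_[km.labels_].reshape(h, w, 3)
    Image.fromarray((seg * 255).astype(np.uint8)).save(f"seg_k{k}.png")
```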

slide-6
SLIDE 6

Example: K-Means for Segmentation

6

(figure: original image and K-means segmentations with K = 2, K = 3, K = 10)

slide by David Sontag

slide-7
SLIDE 7

Example: K-Means for Segmentation

7

(figure: original image and K-means segmentations with K = 2, K = 3, K = 10)

slide by David Sontag

slide-8
SLIDE 8

Example: Vector quantization

8

FIGURE 14.9. Sir Ronald A. Fisher (1890−1962) was one of the founders of modern day statistics, to whom we owe maximum-likelihood, sufficiency, and many other fundamental concepts. The image on the left is a 1024×1024 grayscale image at 8 bits per pixel. The center image is the result of 2×2 block VQ, using 200 code vectors, with a compression rate of 1.9 bits/pixel. The right image uses only four code vectors, with a compression rate of 0.50 bits/pixel.

[Figure from Hastie et al. book]

slide by David Sontag
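A sketch of the 2×2 block vector quantization described in the caption, assuming the grayscale image is given as a 2-D float array; the helper name and the use of a K-means codebook are my own choices, and the codebook sizes follow the caption's examples.

```python
import numpy as np
from sklearn.cluster import KMeans

def block_vq(gray, block=2, n_codes=200, seed=0):
    """Compress a grayscale image by vector-quantizing block x block patches."""
    h, w = gray.shape
    h, w = h - h % block, w - w % block          # crop to a multiple of the block size
    patches = (gray[:h, :w]
               .reshape(h // block, block, w // block, block)
               .swapaxes(1, 2)
               .reshape(-1, block * block))      # one row per block x block patch
    km = KMeans(n_clusters=n_codes, n_init=4, random_state=seed).fit(patches)
    # Each patch is replaced by its nearest code vector
    recon = km.cluster_centers_[km.labels_]
    return (recon.reshape(h // block, w // block, block, block)
                 .swapaxes(1, 2)
                 .reshape(h, w))

# e.g. n_codes=200 -> ~log2(200)/4 ≈ 1.9 bits/pixel; n_codes=4 -> 2/4 = 0.5 bits/pixel
```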

slide-9
SLIDE 9

Example: Simple Linear Iterative Clustering (SLIC) superpixels

9

  • R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Susstrunk, SLIC Superpixels Compared to State-of-the-art Superpixel Methods, IEEE T-PAMI, 2012

λ: spatial regularization parameter
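If scikit-image is available, its slic function implements this algorithm; its compactness argument plays the role of the spatial regularization parameter λ above. The file name and parameter values below are only illustrative.

```python
import numpy as np
from skimage import io, segmentation

img = io.imread("image.jpg")  # hypothetical input path
# compactness trades off color similarity vs. spatial proximity (the λ above)
labels = segmentation.slic(img, n_segments=400, compactness=10, start_label=1)
overlay = segmentation.mark_boundaries(img, labels)  # draw superpixel boundaries
io.imsave("superpixels.png", (overlay * 255).astype(np.uint8))
```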

slide-10
SLIDE 10

Bag of Words model

10

(figure: a document represented as a vector of word counts, e.g. aardvark 0, about 2, all 2, Africa 1, apple …, anxious …, gas 1, …, oil 1, …, Zaire …)

slide by Carlos Guestrin

slide-11
SLIDE 11

11

slide by Fei Fei Li

slide-12
SLIDE 12

12

Object Bag of ‘words’

slide by Fei Fei Li

slide-13
SLIDE 13

Interest Point Features

13

(figure: detect patches [Mikolajczyk and Schmid ’02] [Matas et al. ’02] [Sivic et al. ’03] → normalize patch → compute SIFT descriptor [Lowe ’99])

slide by Josef Sivic

slide-14
SLIDE 14

Patch Features

14

slide by Josef Sivic

slide-15
SLIDE 15

Dictionary Formation

15

slide by Josef Sivic

slide-16
SLIDE 16

Clustering (usually K-means)

16

Vector quantization

slide by Josef Sivic

slide-17
SLIDE 17

Clustered Image Patches

17

slide by Fei Fei Li

slide-18
SLIDE 18

Visual synonyms and polysemy

18

Visual Polysemy: a single visual word occurring on different (but locally similar) parts of different object categories.
Visual Synonyms: two different visual words representing a similar part of an object (wheel of a motorbike).

slide by Andrew Zisserman

slide-19
SLIDE 19

Image Representation

19

(figure: bag-of-words image representation — a histogram of codeword frequencies over the visual dictionary)

slide by Fei Fei Li
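A sketch of the bag-of-words pipeline from the preceding slides, assuming local descriptors (e.g. SIFT) have already been extracted as row vectors; the dictionary size and function names are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_dictionary(all_descriptors, n_words=1000, seed=0):
    """Cluster local descriptors (one row each) from many images into visual words."""
    return KMeans(n_clusters=n_words, n_init=4, random_state=seed).fit(all_descriptors)

def bow_histogram(image_descriptors, dictionary):
    """Quantize one image's descriptors against the dictionary and count codeword frequencies."""
    words = dictionary.predict(image_descriptors)
    hist = np.bincount(words, minlength=dictionary.n_clusters).astype(float)
    return hist / hist.sum()   # normalized codeword-frequency histogram
```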

slide-20
SLIDE 20

K-Means Clustering: Some Issues

  • How to set k?
  • Sensitive to initial centers
  • Sensitive to outliers
  • Detects spherical clusters
  • Assuming means can be computed

20

slide by Kristen Grauman

slide-21
SLIDE 21

Spectral clustering

21

slide-22
SLIDE 22

Graph-Theoretic Clustering

Goal: Given data points X1, ..., Xn and similarities W(Xi ,Xj), partition the data into groups so that points in a group are similar and points in different groups are dissimilar.

22

Similarity Graph: G(V, E, W)
V – Vertices (data points)
E – Edge if similarity > 0
W – Edge weights (similarities)
Partition the graph so that edges within a group have large weights and edges across groups have small weights.

Similarity graph

slide by Aarti Singh

slide-23
SLIDE 23

Graphs Representations

(figure: an example graph on vertices a, b, c, d, e and its adjacency matrix, with rows and columns indexed by a, b, c, d, e and a 1 wherever two vertices are joined by an edge)

23

slide by Bill Freeman and Antonio Torralba

slide-24
SLIDE 24

A Weighted Graph and its Representation

(figure: a weighted graph on vertices a, b, c, d, e and its 5×5 affinity matrix W, with 1 on the diagonal and fractional edge weights such as .1, .2, .3, .4, .6, .7 off the diagonal)

W_ij : probability that i & j belong to the same region (cluster)

24

slide by Bill Freeman and Antonio Torralba

slide-25
SLIDE 25

Similarity graph construction

  • Similarity Graphs: Model local neighborhood relations between data points
  • E.g. the epsilon-NN graph:

W_ij = 1 if ||x_i − x_j|| ≤ ε, 0 otherwise

(ε controls the size of the neighborhood)

  • or the mutual k-NN graph (W_ij = 1 if x_i or x_j is a k nearest neighbor of the other)

25

slide by Aarti Singh
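A small NumPy sketch of the two graph constructions on this slide, following the slide's own definition of the mutual k-NN graph (an edge if either point is among the other's k nearest neighbors); the function names are mine.

```python
import numpy as np

def epsilon_graph(X, eps):
    """W_ij = 1 if ||x_i - x_j|| <= eps, else 0 (no self-edges)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    W = (D <= eps).astype(float)
    np.fill_diagonal(W, 0)
    return W

def mutual_knn_graph(X, k):
    """W_ij = 1 if x_i or x_j is among the k nearest neighbors of the other."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)              # exclude self from neighbor lists
    idx = np.argsort(D, axis=1)[:, :k]       # k nearest neighbors of each point
    nn = np.zeros_like(D, dtype=bool)
    rows = np.repeat(np.arange(len(X)), k)
    nn[rows, idx.ravel()] = True             # nn[i, j]: j is one of i's k nearest neighbors
    return (nn | nn.T).astype(float)         # symmetrize with "or", as defined above
```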

slide-26
SLIDE 26

Similarity graph construction

  • Similarity Graphs: Model local neighborhood relations between data points
  • E.g. the Gaussian kernel similarity function, where the bandwidth σ controls the size of the neighborhood

26

slide by Aarti Singh
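A sketch of the fully connected Gaussian-kernel similarity graph; the kernel is written here as exp(−||x_i − x_j||² / σ²), matching the form used two slides below, and zeroing the diagonal is an optional choice.

```python
import numpy as np

def gaussian_affinity(X, sigma):
    """Fully connected similarity graph: W_ij = exp(-||x_i - x_j||^2 / sigma^2)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    W = np.exp(-sq_dists / sigma ** 2)
    np.fill_diagonal(W, 0)   # optional: drop self-similarities
    return W

# Small sigma: only nearby points get appreciable weight; large sigma: far-away points are grouped too.
```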

slide-27
SLIDE 27

Scale affects affinity

27

  • Small σ: group only nearby points
  • Large σ: group far-away points

slide by Svetlana Lazebnik

slide-28
SLIDE 28

Three points in feature space

W_ij = exp(−||z_i − z_j||² / σ²)

With an appropriate σ, the figure shows the resulting 3×3 affinity matrix W and its eigenvectors: the first 2 eigenvectors group the points as desired…

British Machine Vision Conference, pp. 103-108, 1990

slide by Bill Freeman and Antonio Torralba

slide-29
SLIDE 29

Example eigenvector

(figure: points, their affinity matrix, and the corresponding eigenvector)

29

slide by Bill Freeman and Antonio Torralba

slide-30
SLIDE 30

Example eigenvector

(figure: points, their affinity matrix, and the corresponding eigenvector)

30

slide by Bill Freeman and Antonio Torralba

slide-31
SLIDE 31

Graph cut

  • Set of edges whose removal makes a graph disconnected
  • Cost of a cut: sum of weights of cut edges
  • A graph cut gives us a partition (clustering)
  • What is a “good” graph cut and how do we find one?

(figure: a graph split into two groups A and B by a cut)

31

slide by Steven Seitz

slide-32
SLIDE 32

Minimum cut

Cut: sum of the weight of the cut edges:

cut(A, B) = Σ_{u∈A, v∈B} W(u, v),  with A ∩ B = ∅

  • A cut of a graph G is the set of edges S such that removal of S from G disconnects G.

32

slide by Bill Freeman and Antonio Torralba

slide-33
SLIDE 33

Minimum cut

  • We can do segmentation by finding the minimum cut in a graph
  • Efficient algorithms exist for doing this

(figure: minimum cut example)

33

slide by Svetlana Lazebnik

slide-34
SLIDE 34

Minimum cut

  • We can do segmentation by finding the minimum cut in a graph
  • Efficient algorithms exist for doing this

34

(figure: minimum cut example)

slide by Svetlana Lazebnik

slide-35
SLIDE 35

Drawbacks of Minimum cut

  • Weight of cut is directly proportional to the number of edges in the cut.

(figure: the ideal cut vs. cuts with lesser weight than the ideal cut)

* Slide from Khurram Hassan-Shafique CAP5415 Computer Vision 2003

35

slide by Bill Freeman and Antonio Torralba

slide-36
SLIDE 36

Normalized cuts

Write the graph as V, one cluster as A and the other as B. assoc(A,V) is the sum of all edge weights with one end in A; cut(A,B) is the sum of weights with one end in A and one end in B.

Ncut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V)

cut(A, B) = Σ_{u∈A, v∈B} W(u, v),  with A ∩ B = ∅

assoc(A, B) = Σ_{u∈A, v∈B} W(u, v),  A and B not necessarily disjoint

36

slide by Bill Freeman and Antonio Torralba

  • J. Shi and J. Malik. Normalized cuts and image segmentation. PAMI 2000
slide-37
SLIDE 37

Normalized cut

  • Let W be the adjacency matrix of the graph
  • Let D be the diagonal matrix with diagonal entries D(i, i) = Σ_j W(i, j)
  • Then the normalized cut cost can be written as

y^T (D − W) y / (y^T D y)

where y is an indicator vector whose value should be 1 in the i-th position if the i-th feature point belongs to A and a negative constant otherwise

37

slide by Svetlana Lazebnik

  • J. Shi and J. Malik. Normalized cuts and image segmentation. PAMI 2000
slide-38
SLIDE 38

Normalized cut

  • Finding the exact minimum of the normalized cut cost is NP-complete, but if we relax y to take on arbitrary values, then we can minimize the relaxed cost by solving the generalized eigenvalue problem (D − W)y = λDy
  • The solution y is given by the generalized eigenvector corresponding to the second smallest eigenvalue
  • Intuitively, the i-th entry of y can be viewed as a “soft” indication of the component membership of the i-th feature
  • Can use 0 or the median value of the entries as the splitting point (threshold), or find the threshold that minimizes the Ncut cost

38

slide by Svetlana Lazebnik

  • J. Shi and J. Malik. Normalized cuts and image segmentation. PAMI 2000
slide-39
SLIDE 39

Normalized cut algorithm

39

slide by Bill Freeman and Antonio Torralba
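A minimal sketch of the relaxed two-way normalized cut from the previous slides: build D, solve the generalized eigenproblem (D − W)y = λDy, take the eigenvector of the second smallest eigenvalue, and threshold it at the median (or at 0). It assumes W is symmetric and every node has positive degree; this is only the two-way split, not the full recursive Ncut algorithm of Shi and Malik.

```python
import numpy as np
from scipy.linalg import eigh

def normalized_cut_bipartition(W, threshold="median"):
    """Two-way spectral partition of a graph with symmetric affinity matrix W (n x n)."""
    d = W.sum(axis=1)              # node degrees; assumed strictly positive
    D = np.diag(d)
    L = D - W                      # unnormalized graph Laplacian
    # Generalized eigenproblem (D - W) y = lambda D y; eigenvalues come back in ascending order
    eigvals, eigvecs = eigh(L, D)
    y = eigvecs[:, 1]              # eigenvector of the second smallest eigenvalue
    split = np.median(y) if threshold == "median" else 0.0
    return y > split               # boolean cluster membership (A vs. B)
```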

  • J. Shi and J. Malik. Normalized cuts and image segmentation. PAMI 2000
slide-40
SLIDE 40

K-Means vs. Spectral Clustering

  • Applying k-means to Laplacian eigenvectors allows us to find clusters with non-convex boundaries.

40

(figure: a dataset where both methods perform the same, and one where spectral clustering is superior)

slide by Aarti Singh

slide-41
SLIDE 41

K-Means vs. Spectral Clustering

  • Applying k-means to Laplacian eigenvectors allows us to find clusters with non-convex boundaries.

41

(figure: k-means output vs. spectral clustering output on the same data)

slide by Aarti Singh
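This comparison can be reproduced with scikit-learn on a toy dataset with non-convex clusters; the two-moons data and the parameter values below are illustrative, not taken from the slides.

```python
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.datasets import make_moons

# Two interleaved half-circles: clusters with non-convex boundaries
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
sc_labels = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                               n_neighbors=10, random_state=0).fit_predict(X)
# k-means tends to cut each moon in half; spectral clustering recovers the two moons.
```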

slide-42
SLIDE 42

K-Means vs. Spectral Clustering

  • Applying k-means to Laplacian eigenvectors allows us to find clusters with non-convex boundaries.

42

(figure: the similarity matrix and the second eigenvector of the graph Laplacian)

slide by Aarti Singh

slide-43
SLIDE 43

Examples

43

[Ng et al., 2001]

slide by Aarti Singh

slide-44
SLIDE 44

Some Issues

  • Choice of number of clusters k
  • Most stable clustering is usually given by the value of k that maximizes the eigengap (difference between consecutive eigenvalues)

44

slide by Aarti Singh
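One way to apply the eigengap heuristic, assuming a symmetric affinity matrix W: compute the smallest eigenvalues of the graph Laplacian and pick k at the largest gap between consecutive eigenvalues. Using the symmetric normalized Laplacian and capping the search at max_k are my own choices.

```python
import numpy as np

def choose_k_by_eigengap(W, max_k=10):
    """Pick the number of clusters at the largest gap between consecutive Laplacian eigenvalues."""
    d = W.sum(axis=1)
    # Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(W)) - (d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :])
    eigvals = np.linalg.eigvalsh(L)[:max_k]    # smallest eigenvalues, ascending
    gaps = np.diff(eigvals)                    # gap between consecutive eigenvalues
    return int(np.argmax(gaps)) + 1            # k where the gap after the k-th eigenvalue is largest
```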

slide-45
SLIDE 45

Some Issues

  • Choice of number of clusters k
  • Choice of similarity
  • Choice of kernel: for Gaussian kernels, choice of σ

45

(figure: results under a good similarity measure vs. a poor similarity measure)

slide by Aarti Singh

slide-46
SLIDE 46

Some Issues

  • Choice of number of clusters k

  • Choice of similarity
  • Choice of kernel: for Gaussian kernels, choice of σ
  • Choice of clustering method: k-way vs. recursive 2-way

46

slide by Aarti Singh

slide-47
SLIDE 47

Hierarchical clustering

47

slide-48
SLIDE 48

Hierarchical Clustering

  • Bottom-Up (agglomerative): Starting with each item in its own cluster, find the best pair to merge into a new cluster. Repeat until all clusters are fused together.

48

  • The number of dendrograms with n leaves = (2n − 3)! / [2^(n−2) (n − 2)!]

Number of leaves   Number of possible dendrograms
2                  1
3                  3
4                  15
5                  105
…                  …
10                 34,459,425

slide by Andrew Moore

slide-49
SLIDE 49

We begin with a distance
 matrix which contains the
 distances between every 
 pair of objects in our dataset

49

slide by Andrew Moore

slide-50
SLIDE 50

50

Bottom-Up (agglomerative):

Start with each item in its own cluster, find the best pair to merge into a new cluster. Repeat until all clusters are fused together.

slide by Andrew Moore

slide-51
SLIDE 51

51

Bottom-Up (agglomerative):

Start with each item in its own cluster, find the best pair to merge into a new cluster. Repeat until all clusters are fused together.

slide by Andrew Moore

slide-52
SLIDE 52

52

Bottom-Up (agglomerative):

Start with each item in its own cluster, find the best pair to merge into a new cluster. Repeat until all clusters are fused together.

slide by Andrew Moore

slide-53
SLIDE 53

53

Bottom-Up (agglomerative):

Start with each item in its own cluster, find the best pair to merge into a new cluster. Repeat until all clusters are fused together.

But how do we compute distances between clusters rather than objects?

slide by Andrew Moore

slide-54
SLIDE 54

Computing distance between clusters: 
 Single Link

  • Cluster distance = distance of two closest members in each class

54

  • Potentially long and skinny clusters

slide by Andrew Moore

slide-55
SLIDE 55

Computing distance between clusters: 
 Complete Link

  • Cluster distance = distance of two farthest members in each class

55

  • Tight clusters

slide by Andrew Moore

slide-56
SLIDE 56

Computing distance between clusters: 
 Average Link

  • Cluster distance = average distance of all pairs

56

  • The most widely used measure
  • Robust against noise


slide by Andrew Moore
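The three linkage rules above map directly onto SciPy's 'single', 'complete', and 'average' options; a small sketch on random toy data, where cutting the dendrogram into three clusters is purely illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram
from scipy.spatial.distance import pdist

X = np.random.default_rng(0).normal(size=(30, 2))   # toy data
dists = pdist(X)                                     # condensed pairwise distance matrix

for method in ("single", "complete", "average"):
    Z = linkage(dists, method=method)                # agglomerative merge tree
    labels = fcluster(Z, t=3, criterion="maxclust")  # cut the dendrogram into 3 clusters
    print(method, labels)
    # dendrogram(Z) would plot the full hierarchy (requires matplotlib)
```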

slide-57
SLIDE 57

Agglomerative Clustering

Good

  • Simple to implement, widespread application
  • Clusters have adaptive shapes
  • Provides a hierarchy of clusters

Bad

  • May have imbalanced clusters
  • Still have to choose number of clusters or threshold
  • Need to use an “ultrametric” to get a meaningful hierarchy

57

slide by Derek Hoiem

slide-58
SLIDE 58

What is a good clustering?

58

slide-59
SLIDE 59

What is a good clustering?

  • Internal criterion: A good clustering will produce high quality clusters in which:
    • the intra-class (that is, intra-cluster) similarity is high
    • the inter-class similarity is low
  • The measured quality of a clustering depends on both the object representation and the similarity measure used
  • External criteria for clustering quality
    • Quality measured by its ability to discover some or all of the hidden patterns or latent classes in gold standard data
    • Assesses a clustering with respect to ground truth
  • Example:
    • Purity
    • entropy of classes in clusters (or mutual information between classes and clusters)

59

slide by Eric P. Xing

slide-60
SLIDE 60

External Evaluation of Cluster Quality

  • Simple measure: purity, the ratio between the dominant class in the cluster and the size of the cluster
  • Assume documents with C gold standard classes, while our clustering algorithm produces K clusters, ω1, ω2, ..., ωK with ni members.
  • Example:
    Cluster I: Purity = 1/6 (max(5, 1, 0)) = 5/6
    Cluster II: Purity = 1/6 (max(1, 4, 1)) = 4/6
    Cluster III: Purity = 1/5 (max(2, 0, 3)) = 3/5

60

slide by Eric P. Xing
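A sketch of the purity computation, assuming each item comes with a computed cluster label and a gold-standard class label; the function name is mine.

```python
import numpy as np

def purity(cluster_labels, class_labels):
    """Sum of each cluster's dominant-class count, divided by the total number of items."""
    cluster_labels = np.asarray(cluster_labels)
    class_labels = np.asarray(class_labels)
    total = 0
    for c in np.unique(cluster_labels):
        members = class_labels[cluster_labels == c]
        _, counts = np.unique(members, return_counts=True)
        total += counts.max()          # size of the dominant class within this cluster
    return total / len(class_labels)

# For the example above, the overall purity would be (5 + 4 + 3) / 17.
```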

slide-61
SLIDE 61

External Evaluation of Cluster Quality

  • Let:

TC = TC1 ∪ TC2 ∪ ... ∪ TCn
CC = CC1 ∪ CC2 ∪ ... ∪ CCm
be the target and computed clusterings, respectively.

  • TC = CC = original set of data
  • Define the following:
  • a: number of pairs of items that belong to the same cluster in both CC and TC
  • b: number of pairs of items that belong to different clusters in both CC and TC
  • c: number of pairs of items that belong to the same cluster in CC but different clusters in TC
  • d: number of pairs of items that belong to the same cluster in TC but different clusters in CC

61

slide by Christophe Giraud-Carrier

slide-62
SLIDE 62

External Evaluation of Cluster Quality

Measure of clustering agreement: how similar are these two ways of partitioning the data?

Rand Index = (a + b) / (a + b + c + d)

F-measure:
P = a / (a + c),  R = a / (a + d),  F = 2 × P × R / (P + R)

62

slide by Christophe Giraud-Carrier
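A sketch that counts the pair statistics a, b, c, d defined on the previous slide and evaluates the Rand index and F-measure; the simple O(n²) pair loop is kept for clarity.

```python
from itertools import combinations

def pair_counts(target, computed):
    """Count the pair statistics a, b, c, d over all pairs of items."""
    a = b = c = d = 0
    for i, j in combinations(range(len(target)), 2):
        same_tc = target[i] == target[j]        # same cluster in the target clustering TC?
        same_cc = computed[i] == computed[j]    # same cluster in the computed clustering CC?
        if same_cc and same_tc:
            a += 1
        elif not same_cc and not same_tc:
            b += 1
        elif same_cc and not same_tc:
            c += 1
        else:
            d += 1
    return a, b, c, d

def rand_index(a, b, c, d):
    return (a + b) / (a + b + c + d)

def f_measure(a, b, c, d):
    precision = a / (a + c)
    recall = a / (a + d)
    return 2 * precision * recall / (precision + recall)
```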

slide-63
SLIDE 63

External Evaluation of Cluster Quality

63

Rand Index = (a + b) / (a + b + c + d)

Adjusted Rand Index: an extension of the Rand index that attempts to account for items that may have been clustered by chance

Adjusted Rand Index = 2(ab − cd) / [(a + c)(c + b) + (a + d)(d + b)]

slide by Christophe Giraud-Carrier

slide-64
SLIDE 64

External Evaluation of Cluster Quality

64

Average Entropy

Measure of purity with respect to the target clustering

Entropy(CCi) = − Σ_{TCj ∈ TC} p(TCj | CCi) log p(TCj | CCi)

AvgEntropy(CC) = Σ_{i=1}^{m} (|CCi| / |CC|) Entropy(CCi)

slide by Christophe Giraud-Carrier
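A sketch of the average-entropy computation above, assuming integer class labels per item and using base-2 logarithms (the slide leaves the base unspecified).

```python
import numpy as np

def average_entropy(target, computed):
    """Weighted entropy of target-class labels within each computed cluster (lower is better)."""
    target = np.asarray(target)
    computed = np.asarray(computed)
    n = len(target)
    avg = 0.0
    for cc in np.unique(computed):
        members = target[computed == cc]
        p = np.bincount(members) / len(members)      # p(TC_j | CC_i)
        p = p[p > 0]                                  # skip empty classes (0 log 0 = 0)
        entropy = -(p * np.log2(p)).sum()             # Entropy(CC_i)
        avg += (len(members) / n) * entropy           # weight by |CC_i| / |CC|
    return avg
```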