Linear Manifold Clustering Robert Haralick and Rave Harpaz

Outline
- Background
- The linear manifold cluster model
- The linear manifold clustering algorithm
- Linear manifold modeling
- Linear manifold subspace correlation clustering
- Conclusion

Background
Clustering is the process of classifying a collection of patterns into classes, called clusters, so that the patterns within a cluster are “similar” to one another yet “dissimilar” to patterns in other clusters.
Each clustering technique makes implicit assumptions about:
- The shape of the clusters
- The similarity criteria
- The grouping technique

Cluster Models
[Figure: example cluster shapes: hyper-spherical, hyper-ellipsoidal, arbitrarily shaped, linear, nonlinear]

K-Means
Hyper-Spherical Clusters
1. Choose K points at random to be cluster centers.
2. Assign each point to its closest cluster center.
3. Make the new cluster centers be the cluster means.
4. Iterate.
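The four steps above can be sketched in NumPy as follows. This is a minimal illustration, not code from the presentation: the function name `k_means`, the iteration cap, the stopping test, and the handling of empty clusters are all assumptions made here.

```python
import numpy as np

def k_means(points, k, n_iter=100, seed=0):
    """Plain K-means following the four steps listed on the slide."""
    rng = np.random.default_rng(seed)
    # Step 1: choose K distinct data points at random as initial centers.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign each point to its closest center (Euclidean distance).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each center to the mean of its assigned points.
        # Assumption: an empty cluster simply keeps its previous center.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Step 4: iterate until the centers stop moving.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```

Because every point is attached to a single center by Euclidean distance, the clusters this procedure finds are implicitly hyper-spherical.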

K-Means Clusters

Subspace Clustering
Definition: Subspace clustering produces clusters which are compact on a subset of dimensions aligned with the coordinate axes and not compact on the orthogonal complement of those dimensions.
[Figure: a cluster shown in the full space (x, y, z) and in a subspace (x-z projection)]
Subspace clustering handles:
- High dimensional data
- Irrelevant features

Pattern and Correlation Clustering
[Figure: parallel coordinate view over features 1 to 8]
Object similarity is no longer measured by physical distance, but by the behavior patterns objects manifest or the magnitude of correlations they induce.
Problem Statement: Identify groups of points that exhibit coherent behavior patterns across a subset of the measurement features.

Pattern and Correlation Clustering - Applications
[Figure: parallel coordinate view over features 1 to 8]
- Gene expression micro-array analysis: identify groups of genes that exhibit similar expression patterns under some subset of conditions, from which gene function or regulatory mechanisms may be inferred.
- Collaborative filtering/recommendation systems: identify sets of customers/users with similar interest patterns so that future interests can be predicted and proper recommendations made.
- Dimensionality reduction by correlation.
- Finance: identify groups of stocks that show similar price fluctuations over a certain time period.

Linear Manifold Clusters Definition L is a linear manifold of vector space V if and only if for some subspace S of V and translation t ∈ V , L = { x ∈ V | for some s ∈ S , x = t + s } . The dimension of L is the dimension of S , and if the dimension of L is one less than the dimension of V then L is called a hyperplane. A linear manifold is, in other words, a subspace that may have been shifted away from the origin. A subspace is a linear manifold that contains the origin.
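As a small worked illustration of the definition (the particular vectors are chosen here for the example and do not come from the slides), take a line in the plane that misses the origin:

```latex
V = \mathbb{R}^2, \qquad S = \operatorname{span}\{(1,1)'\}, \qquad t = (0,1)'
\;\Longrightarrow\;
L = \{\, t + s \mid s \in S \,\} = \{\, (a,\; a+1)' \mid a \in \mathbb{R} \,\}
```

L is the line y = x + 1. Its dimension is dim S = 1 = dim V − 1, so it is a hyperplane of V, and since (0, 0)' is not on the line, L is a linear manifold but not a subspace.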

Dense Linear Manifold Clusters
[Figure: scatter plot of three dense linear manifold clusters, C1, C2, and C3]

The Linear Manifold Cluster Model
The linear manifold cluster model has the following properties:
- The points in each cluster are embedded in a lower dimensional linear manifold.
- The intrinsic dimensionality of the cluster is the dimensionality of the linear manifold.
- The manifold is arbitrarily oriented.
- The points in the cluster induce a correlation among two or more attributes (or linear combinations of attributes) of the data.
- In the orthogonal complement space to the manifold the points form a compact, densely populated region, which can be used to cluster the data.

The Linear Manifold Cluster Model
Comment: Classical clustering algorithms such as K-means assume that each cluster is associated with a zero-dimensional manifold (the center), and therefore omit the possibility that a cluster may have a non-zero-dimensional linear manifold associated with it.

The Range Space of a Matrix
Suppose B is a matrix with columns $b_1, b_2, \ldots, b_N$,
$B = [\, b_1 \;\; b_2 \;\; \cdots \;\; b_N \,]$,
and x is a vector
$x = (x_1, x_2, \ldots, x_N)'$.
Let $y = Bx$.

The Range Space of a Matrix
$y = Bx = [\, b_1 \;\; b_2 \;\; \cdots \;\; b_N \,]\,(x_1, x_2, \ldots, x_N)' = \sum_{n=1}^{N} x_n b_n$
y is a linear combination of the columns of B.
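A quick numerical check of this identity, with a small matrix and vector whose values are made up for illustration:

```python
import numpy as np

# Columns b_1 and b_2 of B (illustrative values, not from the slides).
B = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [0.0, 3.0]])
x = np.array([2.0, -1.0])   # coefficients x_1, x_2

# Matrix-vector product y = Bx.
y_matrix = B @ x

# The same vector built explicitly as x_1 * b_1 + x_2 * b_2.
y_columns = sum(x[n] * B[:, n] for n in range(B.shape[1]))
```

Both computations give the same vector, so every y of the form Bx lies in the span of the columns of B.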

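The Linear Manifold Cluster Model
Each point x in a k-D linear manifold cluster is modeled by:
$x = \mu + B\phi + \bar{B}\epsilon$
- $x$ : $d \times 1$ random vector
- $\mu$ : $d \times 1$ translation vector in $\mathbb{R}^d$
- $B$ : $d \times k$ matrix
- $\bar{B}$ : $d \times (d-k)$ matrix, $\bar{B}'B = 0$
- $\phi$ : $k \times 1$ random vector $\sim U(-R, R)$
- $\epsilon$ : $(d-k) \times 1$ random vector $\sim N(0, \Sigma)$, $|\Sigma|$ is small
[Figure: translation vector $\mu$ and basis vectors $b_1$, $b_2$, $b_3$]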

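Linear Manifold Cluster Model
$x = \mu + B\phi + \bar{B}\epsilon$
$E[x] = E[\mu + B\phi + \bar{B}\epsilon] = E[\mu] + E[B\phi] + E[\bar{B}\epsilon] = \mu + B\,E[\phi] + \bar{B}\,E[\epsilon] = \mu$
The last step uses $E[\phi] = 0$ and $E[\epsilon] = 0$: $\phi$ is uniform on the symmetric range $(-R, R)$ and $\epsilon$ is zero-mean normal.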
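The model and its mean can be checked by simulation. The sketch below is only illustrative: the function name `sample_lm_cluster` is hypothetical, the second basis matrix (the one spanning the orthogonal complement) is written `B_bar`, and the noise covariance is assumed to be isotropic, Σ = σ²I, which is a simplification not stated in the slides.

```python
import numpy as np

def sample_lm_cluster(n, mu, B, B_bar, R, sigma, seed=0):
    """Draw n points x = mu + B phi + B_bar eps (one point per row)."""
    rng = np.random.default_rng(seed)
    d, k = B.shape
    phi = rng.uniform(-R, R, size=(n, k))           # phi ~ U(-R, R), position within the manifold
    eps = rng.normal(0.0, sigma, size=(n, d - k))   # eps ~ N(0, sigma^2 I), assumed isotropic
    return mu + phi @ B.T + eps @ B_bar.T

# Split an orthonormal basis of R^3 into a 1-D manifold direction B
# and a basis B_bar of its orthogonal complement, so that B_bar' B = 0.
Q, _ = np.linalg.qr(np.random.default_rng(1).normal(size=(3, 3)))
B, B_bar = Q[:, :1], Q[:, 1:]
mu = np.array([5.0, -2.0, 1.0])

X = sample_lm_cluster(20000, mu, B, B_bar, R=10.0, sigma=0.1)
sample_mean = X.mean(axis=0)   # should be close to mu, as derived above
```

The points spread widely along B (range R) but stay within roughly σ of the manifold in the directions of `B_bar`, which is the compact region the model describes.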

Orthogonal Projection
Definition: Let V be a vector space and W be any subspace of V. Represent vector $v \in V$ as $v = w + w^{\perp}$, where $w \in W$ and $w^{\perp} \in W^{\perp}$. Then $w$ is called the orthogonal projection of v onto $W$ and $w^{\perp}$ is the orthogonal projection of v onto $W^{\perp}$.
Theorem: Let V be a vector space and W be any subspace of V. Let B be a matrix whose columns constitute an orthonormal basis of W. Let $v \in V$ satisfy $v = w + w^{\perp}$, where $w \in W$ and $w^{\perp} \in W^{\perp}$. Then
$w = BB'v$
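A minimal numerical illustration of the theorem, using a subspace and a vector picked here for convenience (W is the x-y plane of R^3):

```python
import numpy as np

# Columns of B form an orthonormal basis of W, the x-y plane in R^3.
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
v = np.array([3.0, -4.0, 5.0])

w = B @ (B.T @ v)    # orthogonal projection of v onto W, i.e. B B' v
w_perp = v - w       # orthogonal projection of v onto the complement of W
```

Computing `B @ (B.T @ v)` rather than forming the d × d matrix BB' first gives the same result with less work when d is large.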

Singular Value Decomposition
Definition: The Singular Value Decomposition of a real matrix $X^{N \times K}$ is the factoring of X as
$X^{N \times K} = U^{N \times N}\,\Lambda^{N \times K}\,(V^{K \times K})'$
where
$UU' = I$
$VV' = I$
$\Lambda$ = rectangular diagonal

Thin Singular Value Decomposition
Definition: The Thin Singular Value Decomposition of a real matrix $X^{N \times K}$, $K < N$, is the factoring of X as
$X^{N \times K} = U_K^{N \times K}\,\Lambda_K^{K \times K}\,(V^{K \times K})'$
where
$U_K' U_K = I^{K \times K}$
$VV' = I^{K \times K}$
$\Lambda_K$ = diagonal

Orthonormal Basis of Subspace
Theorem: Let $X^{N \times K}$ have columns which span a K-dimensional subspace W. Let the thin singular value decomposition of X be
$X^{N \times K} = U_K^{N \times K}\,\Lambda_K^{K \times K}\,(V^{K \times K})'$
Then
$U_K U_K' X = X$
Proof.
$U_K U_K' X = U_K U_K' (U_K \Lambda_K V')$
$= U_K (U_K' U_K) \Lambda_K V'$
$= U_K \Lambda_K V'$
$= X$
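The thin SVD and the identity above can be checked with NumPy, where `np.linalg.svd(X, full_matrices=False)` returns the thin factors. The random test matrix is an assumption made only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))   # N = 6, K = 3; columns span a 3-D subspace W

# Thin SVD: U is N x K, s holds the K singular values, Vt is V' (K x K).
U, s, Vt = np.linalg.svd(X, full_matrices=False)

reconstructed = U @ np.diag(s) @ Vt   # U_K Lambda_K V'
projected = U @ (U.T @ X)             # U_K U_K' X
```

Since `projected` equals `X`, the columns of `U` form an orthonormal basis of the subspace spanned by the columns of `X`.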

Distance To Linear Manifold
Theorem: Let a linear manifold L be represented by $L = \{\, z \mid z = \mu + B\phi \,\}$, where $\mu$ is a vector that translates the origin to the manifold and the columns of B are orthonormal. Then the Euclidean distance of x to L is given by
$\rho(x, L) = \|(I - BB')(x - \mu)\|$
Proof.
- $BB'$ is the orthogonal projection operator to the subspace spanned by the columns of B.
- $I - BB'$ is the orthogonal projection operator to the orthogonal complement of the subspace spanned by the columns of B.
- $(I - BB')(x - \mu)$ is the projection of x to the orthogonal complement of the linear manifold L.
- $\|(I - BB')(x - \mu)\|$ is the distance of x to the manifold L.

Distance To Linear Manifold
Proposition: Let B be a matrix whose columns are orthonormal. Then
$\|(I - BB')y\| = \sqrt{\|y\|^2 - \|B'y\|^2}$
Proof.
$\|(I - BB')y\|^2 = \|y - BB'y\|^2$
$= (y - BB'y)'(y - BB'y)$
$= y'y - 2y'BB'y + y'(BB')(BB')y$
$= y'y - 2y'BB'y + y'(B(B'B)B')y$
$= y'y - 2y'BB'y + y'(BB')y$
$= y'y - y'BB'y$
$= \|y\|^2 - \|B'y\|^2$
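Both forms of the distance can be sketched in a few lines. The function names and the example line in R^3 are hypothetical, introduced only for illustration; the second function relies on the proposition and avoids projecting back into the d-dimensional space.

```python
import numpy as np

def distance_to_manifold(x, mu, B):
    """Distance via the projector: ||(I - B B')(x - mu)||."""
    y = x - mu
    return np.linalg.norm(y - B @ (B.T @ y))

def distance_to_manifold_fast(x, mu, B):
    """Same distance via the proposition: sqrt(||y||^2 - ||B'y||^2)."""
    y = x - mu
    # max(..., 0.0) guards against tiny negative values caused by round-off.
    return np.sqrt(max(y @ y - np.sum((B.T @ y) ** 2), 0.0))

# Illustrative 1-D manifold in R^3: the line through mu with unit direction (1, 1, 0)/sqrt(2).
mu = np.array([1.0, 0.0, 0.0])
B = np.array([[1.0], [1.0], [0.0]]) / np.sqrt(2.0)

x_off = np.array([1.0, 0.0, 2.0])   # lies 2 units above the line
x_on = mu + 3.0 * B[:, 0]           # lies on the line
```

Both functions require the columns of B to be orthonormal, as in the theorem; with a non-orthonormal B the step B'B = I in the proof fails and the results are no longer distances.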

The Linear Manifold Clustering Algorithm
[Figure: scatter plot of three dense linear manifold clusters, C1, C2, and C3]
Outline - stochastic model fitting technique
1. Sample trial linear manifolds of various dimensions.
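The slides shown here do not say how a trial manifold is sampled. One plausible way, assumed purely for illustration, is to draw k + 1 data points at random, use one as the translation, and orthonormalize the differences to the others. The helper names `trial_manifold` and `distances` are hypothetical.

```python
import numpy as np

def trial_manifold(points, k, rng):
    """Build a trial k-D manifold {mu + B phi} from k + 1 random data points.

    Assumption: this sampling scheme is not taken from the slides.
    """
    idx = rng.choice(len(points), size=k + 1, replace=False)
    mu = points[idx[0]]                    # one sample acts as the translation
    diffs = (points[idx[1:]] - mu).T       # d x k matrix of spanning vectors
    B, _ = np.linalg.qr(diffs)             # reduced QR gives orthonormal columns
    return mu, B

def distances(points, mu, B):
    """Distance of every point (one per row) to the manifold {mu + B phi}."""
    Y = points - mu
    return np.linalg.norm(Y - (Y @ B) @ B.T, axis=1)
```

A trial manifold that passes through a real cluster leaves many points at near-zero distance, which is the kind of evidence a stochastic model-fitting procedure can score.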
