

SLIDE 1

Subscribe if you didn’t get the message last night

www.cs.washington.edu/527

Clustering Expression Data

  • Why cluster gene expression data?

– Tissue classification
– Find biologically related genes
– First step in inferring regulatory networks
– Look for common promoter elements
– Hypothesis generation
– One of the tools of choice for expression analysis

Clustering Expression Data

  • What has been done?

– Hierarchical average-link [Eisen et al. 1998]
– Self-Organizing Maps (SOM) [Tamayo et al. 1999]
– CAST [Ben-Dor et al. 1999]
– Support Vector Machines (SVM) [Grundy et al. 2000]

– etc., etc., etc.

  • Why so many methods?

– Clustering is NP-hard, even with simple objectives and data
– Hard problem: high dimensionality, noise, …
– ∴ many heuristic, local-search, & approximation algorithms
– No clear winner

Clustering Algorithms

  • Partitional

– CAST (Ben-Dor et al. 1999)
– k-means, variously initialized (Hartigan 1975)

  • Hierarchical

– single-link
– average-link
– complete-link

  • Random (as a control)

– Randomly assign genes to clusters

  • Others
SLIDE 2

Clustering 101

Ka Yee Yeung

Center for Expression Arrays, University of Washington

The following slides are largely from http://staff.washington.edu/kayee/research.html. Errors are mine.

Overview

  • What is clustering?
  • Similarity/distance metrics
  • Hierarchical clustering algorithms

– Made popular by Stanford, i.e., [Eisen et al. 1998]

  • K-means

– Made popular by many groups, e.g., [Tavazoie et al. 1999]

  • Self-organizing map (SOM)

– Made popular by Whitehead, i.e., [Tamayo et al. 1999]

What is clustering?

  • Group similar objects together
  • Objects in the same cluster (group) are more similar to each other than objects in different clusters

  • An exploratory data-analysis tool

How to define similarity?

  • Similarity metric:

– A measure of pairwise similarity or dissimilarity
– Examples:

  • Correlation coefficient
  • Euclidean distance

[Figure: an n × p raw data matrix (n genes, p experiments) is converted into an n × n similarity matrix over all gene pairs X, Y]

SLIDE 3

Similarity metrics

  • Euclidean distance:

$d(X, Y) = \sqrt{\sum_{j=1}^{p} (X[j] - Y[j])^2}$

  • Correlation coefficient:

$\rho(X, Y) = \dfrac{\sum_{j=1}^{p} (X[j] - \bar{X})(Y[j] - \bar{Y})}{\sqrt{\sum_{j=1}^{p} (X[j] - \bar{X})^2}\,\sqrt{\sum_{j=1}^{p} (Y[j] - \bar{Y})^2}}$, where $\bar{X} = \frac{1}{p}\sum_{j=1}^{p} X[j]$
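Both metrics can be computed directly from the definitions above. A minimal pure-Python sketch (the function names are mine, not from the slides):

```python
import math

def euclidean(x, y):
    """Euclidean distance between two expression profiles of length p."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def correlation(x, y):
    """Pearson correlation coefficient between two profiles."""
    p = len(x)
    mx, my = sum(x) / p, sum(y) / p
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den

print(euclidean([0, 0], [3, 4]))          # 5.0
print(correlation([1, 2, 3], [2, 4, 6]))  # 1.0  (same direction)
print(correlation([1, 2, 3], [3, 2, 1]))  # -1.0 (opposite direction)
```

Note that correlation is invariant to scaling and shifting a profile, while Euclidean distance is not, which is exactly the contrast the example below illustrates.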

Example

[Figure: expression profiles of four genes X, Y, Z, W plotted over four experiments]

Correlation(X, Y) = 1    Distance(X, Y) = 4
Correlation(X, Z) = −1   Distance(X, Z) = 2.83
Correlation(X, W) = 1    Distance(X, W) = 1.41

Lessons from the example

  • Correlation – direction only
  • Euclidean distance – magnitude & direction
  • Min # attributes (experiments) needed to compute pairwise similarity:

– >= 2 attributes for Euclidean distance
– >= 3 attributes for correlation
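The ≥ 3 requirement for correlation follows because with only p = 2 experiments the sample correlation of any two non-constant profiles is exactly ±1, so it carries no information. A quick check (the helper name is mine):

```python
import math

def correlation(x, y):
    """Pearson correlation coefficient between two profiles."""
    p = len(x)
    mx, my = sum(x) / p, sum(y) / p
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den

# With p = 2, every non-constant pair is perfectly (anti-)correlated:
print(correlation([1, 5], [2, 3]))   # 1.0
print(correlation([1, 5], [3, 2]))   # -1.0
```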

  • Array data is noisy ⇒ need many experiments to robustly estimate pairwise similarity

Clustering algorithms

  • Inputs:

– Raw data matrix or similarity matrix
– Number of clusters or some other parameters

  • Many different classifications of clustering algorithms:

– Hierarchical vs. partitional
– Heuristic-based vs. model-based
– Soft vs. hard

SLIDE 4

Hierarchical Clustering [Hartigan 1975]

  • Agglomerative (bottom-up)
  • Algorithm:

– Initialize: each item is its own cluster
– Iterate:

  • select the two most similar clusters
  • merge them

– Halt: when the required number of clusters is reached
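The loop above can be sketched in a few lines; this is an illustrative (and deliberately naive, O(n³)) implementation of my own, with the cluster-similarity rule passed in as a function: `min` over pairwise distances gives single link, `max` gives complete link.

```python
def agglomerate(dist, k, cluster_dist=min):
    """Agglomerative (bottom-up) clustering.

    dist: pairwise distances as {(i, j): d} with i < j
    k: halt when this many clusters remain
    cluster_dist: min -> single link, max -> complete link
    """
    items = sorted({i for pair in dist for i in pair})
    clusters = [(i,) for i in items]

    def d(c1, c2):  # cluster distance under the chosen rule
        return cluster_dist(dist[min(i, j), max(i, j)]
                            for i in c1 for j in c2)

    while len(clusters) > k:
        # select the two most similar (closest) clusters ...
        a, b = min(((x, y) for n, x in enumerate(clusters)
                    for y in clusters[n + 1:]), key=lambda p: d(*p))
        # ... and merge them
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(tuple(sorted(a + b)))
    return clusters

# The 5-item distance matrix used in the worked examples that follow:
dist = {(1, 2): 2, (1, 3): 6, (1, 4): 10, (1, 5): 9,
        (2, 3): 3, (2, 4): 9, (2, 5): 8,
        (3, 4): 7, (3, 5): 5, (4, 5): 4}
print(sorted(agglomerate(dist, 2)))  # [(1, 2, 3), (4, 5)]
```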

[Figure: dendrogram]

Hierarchical: Single Link

  • cluster similarity = similarity of the two most similar members

+ fast
− potentially long and skinny clusters

Example: single link

Initial distance matrix:

        1    2    3    4
   2    2
   3    6    3
   4   10    9    7
   5    9    8    5    4

Merge the closest pair, 1 and 2 (distance 2), and update by the single-link rule:

d[(1,2),3] = min{d[1,3], d[2,3]} = min{6, 3} = 3
d[(1,2),4] = min{d[1,4], d[2,4]} = min{10, 9} = 9
d[(1,2),5] = min{d[1,5], d[2,5]} = min{9, 8} = 8

      (1,2)    3    4
   3      3
   4      9    7
   5      8    5    4

Example: single link

The closest pair is now (1,2) and 3 (distance 3); merge and update:

d[(1,2,3),4] = min{d[(1,2),4], d[3,4]} = min{9, 7} = 7
d[(1,2,3),5] = min{d[(1,2),5], d[3,5]} = min{8, 5} = 5

      (1,2,3)    4
   4        7
   5        5    4

SLIDE 5

Example: single link

The closest pair is now 4 and 5 (distance 4); merge, then the final update:

d[(1,2,3),(4,5)] = min{d[(1,2,3),4], d[(1,2,3),5]} = min{7, 5} = 5

[Figure: dendrogram over items 1 2 3 4 5]

The dendrogram is sometimes drawn to scale, with merge heights proportional to merge distances

Hierarchical: Complete Link

  • cluster similarity = similarity of the two least similar members

+ tight clusters
− slow

Example: complete link

Starting from the same initial distance matrix, merge 1 and 2 (distance 2) and update by the complete-link rule:

d[(1,2),3] = max{d[1,3], d[2,3]} = max{6, 3} = 6
d[(1,2),4] = max{d[1,4], d[2,4]} = max{10, 9} = 10
d[(1,2),5] = max{d[1,5], d[2,5]} = max{9, 8} = 9

      (1,2)    3    4
   3      6
   4     10    7
   5      9    5    4

Example: complete link

The smallest entry is now d[4,5] = 4; merge 4 and 5 and update:

d[3,(4,5)] = max{d[3,4], d[3,5]} = max{7, 5} = 7
d[(1,2),(4,5)] = max{d[(1,2),4], d[(1,2),5]} = max{10, 9} = 10

          (1,2)    3
     3        6
 (4,5)       10    7

SLIDE 6

Example: complete link

The smallest entry is now d[(1,2),3] = 6; merge, then the final update:

d[(1,2,3),(4,5)] = max{d[(1,2),(4,5)], d[3,(4,5)]} = max{10, 7} = 10

[Figure: dendrogram over items 1 2 3 4 5]

Hierarchical: Average Link

  • cluster similarity = average similarity of all inter-cluster pairs

+ tight clusters
− slow

Example: average link

Starting from the same initial distance matrix, merge 1 and 2 (distance 2) and update by the average-link rule:

d[(1,2),3] = (d[1,3] + d[2,3]) / 2 = (6 + 3)/2 = 4.5
d[(1,2),4] = (d[1,4] + d[2,4]) / 2 = (10 + 9)/2 = 9.5
d[(1,2),5] = (d[1,5] + d[2,5]) / 2 = (9 + 8)/2 = 8.5

      (1,2)    3    4
   3    4.5
   4    9.5    7
   5    8.5    5    4

Example: average link

The smallest entry is now d[4,5] = 4; merge 4 and 5 and update:

d[3,(4,5)] = (d[3,4] + d[3,5]) / 2 = (7 + 5)/2 = 6
d[(1,2),(4,5)] = (d[1,4] + d[1,5] + d[2,4] + d[2,5]) / 4 = (10 + 9 + 9 + 8)/4 = 9

          (1,2)    3
     3      4.5
 (4,5)        9    6

SLIDE 7

Example: average link

The smallest entry is now d[(1,2),3] = 4.5; merge, then the final update:

d[(1,2,3),(4,5)] = (d[1,4] + d[1,5] + d[2,4] + d[2,5] + d[3,4] + d[3,5]) / 6 = (10 + 9 + 9 + 8 + 7 + 5)/6 = 8

[Figure: dendrogram over items 1 2 3 4 5]
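The final merge in the three worked examples differs only in how the six cross-cluster distances between (1,2,3) and (4,5) are combined. A quick check of the arithmetic:

```python
# Cross-pair distances between clusters (1,2,3) and (4,5), read off the
# initial distance matrix used throughout the examples:
# d[1,4], d[1,5], d[2,4], d[2,5], d[3,4], d[3,5]
cross = [10, 9, 9, 8, 7, 5]

single   = min(cross)               # single link:   5
complete = max(cross)               # complete link: 10
average  = sum(cross) / len(cross)  # average link:  48/6 = 8.0

print(single, complete, average)  # 5 10 8.0
```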

10/13/03

Hierarchical: Centroid Link

  • cluster centroid = average of all points in the cluster
  • cluster similarity = distance between centroids
  • in the expression literature, often called “average link”

+ faster
− discards cluster shape

Software: TreeView [Eisen et al. 1998]

  • Fig. 1 in Eisen’s 1998 PNAS paper
  • Time course of serum stimulation of primary human fibroblasts
  • cDNA arrays with approx. 8,600 spots
  • Similar to average-link
  • Free download at: http://rana.lbl.gov/EisenSoftware.htm

  • Another Good Package: TMEV

– http://www.tigr.org/software/tm4/

Hierarchical divisive clustering algorithms

  • Top down

– Start with all objects in one cluster
– Successively split into smaller clusters

  • Tend to be less efficient than agglomerative algorithms
  • Resolver implemented a deterministic annealing approach from [Alon et al. 1999]

SLIDE 8

Partitional: K-Means [MacQueen 1965]

[Figure: three snapshots of k-means iterations on a 2-D point set]

Details of k-means

  • Iterate until convergence:

– Assign each data point to the closest centroid
– Compute new centroids (the mean of each cluster)

  • Objective function: minimize

$\sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2$, where $\mu_i$ is the centroid of cluster $C_i$
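The two alternating steps can be written as a minimal Lloyd-style k-means in pure Python (an illustrative sketch of my own, not the implementation of any particular package; names and the toy data are mine):

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Plain k-means: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    clusters = []
    for _ in range(iters):
        # assignment step: each point joins its closest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda j: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[j])))
            clusters[j].append(p)
        # update step: each centroid becomes the mean of its cluster
        new = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[j]
               for j, cl in enumerate(clusters)]
        if new == centroids:   # converged (a local optimum)
            break
        centroids = new
    return centroids, clusters

pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
cents, groups = kmeans(pts, 2)
print(sorted(cents))   # [(0.0, 0.5), (10.0, 10.5)]
```

Each iteration can only decrease the objective above, which is why the algorithm is guaranteed to converge, though only to a local optimum.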

Properties of k-means

  • Fast
  • Proven to converge to a local optimum
  • In practice, converges quickly
  • Tends to produce spherical, equal-sized clusters
  • Related to the model-based approach

Self-organizing maps (SOM) [Kohonen 1995]

  • Basic idea:

– map high-dimensional data onto a 2-D grid of nodes
– neighboring nodes are more similar than nodes far apart
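A toy SOM along these lines (my own minimal sketch with made-up parameter choices — a decaying learning rate and a shrinking Gaussian neighborhood — not Kohonen's reference implementation):

```python
import math
import random

def train_som(data, grid_w, grid_h, iters=2000, lr0=0.5, seed=0):
    """Minimal SOM: learn one weight vector per node of a grid_w x grid_h grid."""
    rng = random.Random(seed)
    dim = len(data[0])
    radius0 = max(grid_w, grid_h) / 2
    nodes = {(i, j): [rng.random() for _ in range(dim)]
             for i in range(grid_w) for j in range(grid_h)}
    for t in range(iters):
        frac = t / iters
        lr = lr0 * (1 - frac)                 # learning rate decays to 0
        radius = radius0 * (1 - frac) + 0.5   # neighborhood shrinks over time
        x = rng.choice(data)
        bmu = best_node(nodes, x)             # best-matching unit for x
        # pull the BMU and its grid neighbors toward the input
        for n, w in nodes.items():
            g = math.dist(n, bmu)             # distance on the grid, not in data space
            if g <= radius:
                h = math.exp(-g * g / (2 * radius * radius))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
    return nodes

def best_node(nodes, x):
    """Node whose weight vector is closest to input x."""
    return min(nodes, key=lambda n: sum((w - v) ** 2
                                        for w, v in zip(nodes[n], x)))

nodes = train_som([(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.0)], 2, 1)
# close inputs land on the same node; distant inputs on different nodes
```

Because neighboring grid nodes are updated together early on and compete individually later, the grid ends up preserving the topology of the input space, which is what makes the 2-D layout interpretable.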

SLIDE 9

SOM

  • Grid (geometry of nodes)
  • Input vectors that are close to each other map to the same or neighboring nodes

Properties of SOM

  • Partial structure
  • Easy visualization
  • Tons of parameters to tune
  • Sensitive to parameters

Summary

  • Definition of clustering
  • Pairwise similarity:

– Correlation
– Euclidean distance

  • Clustering algorithms:

– Hierarchical (single-link, complete-link, average-link)
– K-means
– SOM

  • Different clustering algorithms ⇒ different clusters

Which clustering algorithm should I use?

  • Good question
  • No definite answer: on-going research
  • Feel free to read my thesis:

http://staff.washington.edu/kayee/research

SLIDE 10

General Suggestions

  • Avoid single-link
  • Try:

– K-means
– Average-link / complete-link

  • If you are interested in capturing “patterns” of expression, use correlation instead of Euclidean distance

  • Visualization of data

– Eisen-gram
– Dendrogram
– PCA, MDS, etc.

Misc Notes

  • Greedy algorithms: can get trapped in local minima; can be sensitive to the addition of new points, order of points, …

+ simple, intuitive algorithms; reasonably fast; OK on simple data; no obvious preconceptions about structure
− no model of structure; biases unclear