Clustering: Hierarchical Clustering and K-Means Clustering

Clustering: Hierarchical Clustering and K-Means Clustering
Machine Learning 10-601B
Seyoung Kim
Many of these slides are derived from William Cohen, Ziv Bar-Joseph, Eric Xing. Thanks!

Two Classes of Learning Problems
• Supervised learning = learning from labeled data, where class labels (in classification) or output values (in regression) are given
  – Training data: (X, Y) for inputs X and labels Y
• Unsupervised learning = learning from unlabeled, unannotated data
  – Training data: X only, with no labels
  – We do not have a teacher that provides examples with their labels

• Organizing data into clusters such that there is
  – high intra-cluster similarity
  – low inter-cluster similarity
• Informally, finding natural groupings among objects.
• Why do we want to do that?
• Any REAL application?

Examples
• People
• Images
• Language
• Species

Unsupervised learning
• Clustering methods
  – Non-probabilistic methods
    • Hierarchical clustering
    • K-means algorithm
  – Probabilistic methods
    • Mixture model
• We will also discuss dimensionality reduction, another unsupervised learning method, later in the course

"The quality or state of being similar; likeness; resemblance; as, a similarity of features." (Webster's Dictionary)
Similarity is hard to define, but… "We know it when we see it."
The real meaning of similarity is a philosophical question. We will take a more pragmatic approach.

Definition: Let O1 and O2 be two objects from the universe of possible objects. The distance (dissimilarity) between O1 and O2 is a real number denoted by D(O1, O2).
(Figure: two objects, gene1 and gene2, with example distance values 0.23, 3, and 342.7.)

What properties should a distance measure have?
• D(A,B) = D(B,A)   Symmetry
• D(A,A) = 0   Constancy of Self-Similarity
• D(A,B) = 0 iff A = B   Positivity (Separation)
• D(A,B) ≤ D(A,C) + D(B,C)   Triangular Inequality

Intuitions behind desirable distance measure properties
• D(A,B) = D(B,A)   Symmetry
  – Otherwise you could claim "Alex looks like Bob, but Bob looks nothing like Alex"
• D(A,A) = 0   Constancy of Self-Similarity
  – Otherwise you could claim "Alex looks more like Bob than Bob does"
• D(A,B) = 0 iff A = B   Positivity (Separation)
  – Otherwise there are objects in your world that are different, but that you cannot tell apart.
• D(A,B) ≤ D(A,C) + D(B,C)   Triangular Inequality
  – Otherwise you could claim "Alex is very like Bob, and Alex is very like Carl, but Bob is very unlike Carl"
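The four properties above can be checked mechanically on a small distance matrix. The following is a minimal sketch, not part of the original slides; the function name `check_distance_properties`, the tolerance `tol`, and both example matrices are assumptions made for illustration.

```python
def check_distance_properties(D, tol=1e-12):
    """Report which of the four distance properties hold for a square matrix D.

    D[a][b] is the distance between objects a and b. The objects are assumed
    to be pairwise distinct, so positivity means every off-diagonal entry
    must be strictly greater than zero.
    """
    idx = range(len(D))
    return {
        # D(A,B) = D(B,A)
        "symmetry": all(abs(D[a][b] - D[b][a]) <= tol for a in idx for b in idx),
        # D(A,A) = 0
        "self_similarity": all(abs(D[a][a]) <= tol for a in idx),
        # D(A,B) = 0 only when A = B
        "positivity": all(D[a][b] > tol for a in idx for b in idx if a != b),
        # D(A,B) <= D(A,C) + D(B,C)
        "triangle_inequality": all(
            D[a][b] <= D[a][c] + D[b][c] + tol
            for a in idx for b in idx for c in idx
        ),
    }


# Hypothetical example: three points on a line at positions 0, 1, 2.
metric_like = [[0, 1, 2],
               [1, 0, 1],
               [2, 1, 0]]

# Hypothetical example: objects 0 and 2 are both close to object 1,
# yet very far from each other, which breaks the triangle inequality.
violates_triangle = [[0, 1, 5],
                     [1, 0, 1],
                     [5, 1, 0]]

print(check_distance_properties(metric_like))
print(check_distance_properties(violates_triangle))
```

The second matrix is the "Alex, Bob, Carl" situation from the last bullet: it is symmetric and positive, and only the triangle inequality fails.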

Distance Measures
• Suppose two objects x and y both have p features
• Euclidean distance: $d(x, y) = \sqrt{\sum_{i=1}^{p} |x_i - y_i|^2}$
• Correlation coefficient
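As a small illustration of the Euclidean distance between two objects with p features, here is a sketch in plain Python. The function name `euclidean_distance` and the example vectors are assumptions, not taken from the slides.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance between two feature vectors of equal length p."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of features")
    # Square root of the sum of squared feature-wise differences.
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


# Hypothetical objects with p = 2 features (a 3-4-5 right triangle).
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```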

Similarity Measures: Correlation Coefficient
(Figure: three panels plotting expression level against time for Gene A and Gene B.)
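The slides do not say which correlation coefficient is meant; the sketch below assumes the Pearson correlation, which compares the shape of two profiles rather than their magnitude. The function name `pearson_correlation` and the expression values are made up for illustration.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation between two equal-length profiles."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression levels over four time points.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [10.0, 20.0, 30.0, 40.0]   # same trend as gene_a, ten times larger
gene_c = [4.0, 3.0, 2.0, 1.0]       # opposite trend

print(pearson_correlation(gene_a, gene_b))  # close to +1
print(pearson_correlation(gene_a, gene_c))  # close to -1
```

Under this assumption, `gene_a` and `gene_b` count as highly similar even though their raw values are far apart, which is the usual reason for preferring correlation over Euclidean distance when only the pattern over time matters.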

• Hierarchical algorithms (bottom up or top down): Create a hierarchical decomposition of the set of objects using some criterion
• Partitional algorithms (top down): Construct various partitions and then evaluate them by some criterion

(How-to) Hierarchical Clustering
Bottom-Up (agglomerative): Starting with each item in its own cluster, find the best pair to merge into a new cluster. Repeat until all clusters are fused together.

We begin with a distance matrix which contains the distances between every pair of objects in our database.

    0   8   8   7   7
        0   2   4   4
            0   3   3
                0   1
                    0

D( , ) = 8
D( , ) = 1
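The first agglomerative step is to find the closest pair in this matrix. Below is a minimal sketch that uses the five-object matrix from the slide; since the objects were shown as pictures, the indices 0 to 4 and the helper name `closest_pair` are assumptions.

```python
# Full symmetric version of the upper-triangular matrix on the slide.
# Objects are indexed 0..4 in row order (an assumed labeling).
D = [
    [0, 8, 8, 7, 7],
    [8, 0, 2, 4, 4],
    [8, 2, 0, 3, 3],
    [7, 4, 3, 0, 1],
    [7, 4, 3, 1, 0],
]


def closest_pair(D):
    """Return (i, j, distance) for the closest pair of distinct objects, i < j."""
    best = None
    for i in range(len(D)):
        for j in range(i + 1, len(D)):
            if best is None or D[i][j] < best[2]:
                best = (i, j, D[i][j])
    return best


print(closest_pair(D))  # (3, 4, 1): the last two objects are merged first
```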

Consider all possible merges… Choose the best…


Consider all possible merges… Choose the best…
But how do we compute distances between clusters rather than objects?

Computing distance between clusters: Single Link
• cluster distance = distance between the two closest members, one from each cluster
  – Potentially long and skinny clusters

Computing distance between clusters: Complete Link
• cluster distance = distance between the two farthest members, one from each cluster
  + Tight clusters

Computing distance between clusters: Average Link
• cluster distance = average distance over all pairs of members, one from each cluster
  + The most widely used measure
  + Robust against noise
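The three linkage rules differ only in how they summarize the pairwise distances between two clusters. The sketch below makes that explicit; the function names and the example matrix (the five-object matrix used earlier, with assumed indices 0 to 4) are illustrative choices.

```python
# Assumed example: pairwise distances between five objects indexed 0..4.
D = [
    [0, 8, 8, 7, 7],
    [8, 0, 2, 4, 4],
    [8, 2, 0, 3, 3],
    [7, 4, 3, 0, 1],
    [7, 4, 3, 1, 0],
]


def cross_distances(D, cluster_a, cluster_b):
    """All distances between a member of cluster_a and a member of cluster_b."""
    return [D[i][j] for i in cluster_a for j in cluster_b]


def single_link(D, cluster_a, cluster_b):
    """Distance between the two closest members."""
    return min(cross_distances(D, cluster_a, cluster_b))


def complete_link(D, cluster_a, cluster_b):
    """Distance between the two farthest members."""
    return max(cross_distances(D, cluster_a, cluster_b))


def average_link(D, cluster_a, cluster_b):
    """Average distance over all cross-cluster pairs."""
    values = cross_distances(D, cluster_a, cluster_b)
    return sum(values) / len(values)


a, b = [1, 2], [3, 4]
print(single_link(D, a, b))    # 3
print(complete_link(D, a, b))  # 4
print(average_link(D, a, b))   # 3.5
```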

Example: single link
(Figure: objects labeled 1 to 5.)

(Figure: dendrograms for single linkage and average linkage.)
Height represents distance between objects / clusters.
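To see where dendrogram heights come from, the sketch below runs a naive bottom-up clustering and records the distance at which each merge happens. It is an illustrative implementation under stated assumptions, not the one used to produce the figures: the names `agglomerative` and `average`, the tie-breaking rule (first pair found wins), and the five-object matrix with indices 0 to 4 are all assumptions.

```python
# Assumed example: pairwise distances between five objects indexed 0..4.
D = [
    [0, 8, 8, 7, 7],
    [8, 0, 2, 4, 4],
    [8, 2, 0, 3, 3],
    [7, 4, 3, 0, 1],
    [7, 4, 3, 1, 0],
]


def average(values):
    """Mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def agglomerative(D, linkage=min):
    """Naive bottom-up clustering on a distance matrix.

    `linkage` reduces the cross-cluster distances to one number:
    min gives single link, max gives complete link, `average` gives average link.
    Returns a list of (cluster_a, cluster_b, height) in merge order.
    """
    clusters = [(i,) for i in range(len(D))]  # every object starts alone
    merges = []
    while len(clusters) > 1:
        best = None
        # Consider all possible merges and keep the closest pair.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist = linkage(D[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        dist, a, b = best
        merges.append((clusters[a], clusters[b], dist))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


single_heights = [h for _, _, h in agglomerative(D, min)]
average_heights = [h for _, _, h in agglomerative(D, average)]
print(single_heights)   # [1, 2, 3, 7]
print(average_heights)  # [1, 2, 3.5, 7.5]
```

Each recorded height is the level at which the corresponding branch would be drawn in the dendrogram, so the two linkages yield the same merge order on this matrix but different heights for the later merges.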

Summary of Hierarchical Clustering Methods
• No need to specify the number of clusters in advance.
• Hierarchical structure maps nicely onto human intuition for some domains.
• They do not scale well: time complexity of at least O(n^2), where n is the total number of objects.
• Like any heuristic search algorithm, local optima are a problem.
• Interpretation of results is (very) subjective.

But what are the clusters?
In some cases we can determine the "correct" number of clusters. However, things are rarely this clear cut, unfortunately.
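One common way to turn a dendrogram into flat clusters is to cut it at a chosen height: every merge at or below the cut is kept, and every merge above it is undone. The sketch below counts the resulting clusters from a list of merge heights. It assumes the heights are non-decreasing in merge order (which holds for single, complete, and average link); the function name `clusters_at_cut` and the example heights are hypothetical.

```python
def clusters_at_cut(merge_heights, n_objects, cut):
    """Number of clusters left when a dendrogram is cut at height `cut`.

    Each merge at or below the cut joins two clusters into one, so it
    reduces the cluster count by exactly one.
    """
    merges_kept = sum(1 for h in merge_heights if h <= cut)
    return n_objects - merges_kept


# Hypothetical merge heights for five objects.
heights = [1, 2, 3, 7]

print(clusters_at_cut(heights, 5, cut=0.5))  # 5: nothing merged yet
print(clusters_at_cut(heights, 5, cut=5))    # 2: the large jump from 3 to 7 is cut
print(clusters_at_cut(heights, 5, cut=10))   # 1: everything fused
```

A large gap between consecutive heights, such as the jump from 3 to 7 here, is the kind of clear-cut case in which the number of clusters is easy to read off; when the heights increase gradually, the choice of cut is a judgment call.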

Outlier
The single isolated branch is suggestive of a data point that is very different from all others.

Example: clustering genes
• Microarrays measure the activities of all genes in different conditions
• Group genes that perform the same function
• Clustering genes can help determine new functions for unknown genes
• An early "killer application" in this area
  – The most cited (>12,000) paper in PNAS!
