Machine Learning 2 DS 4420 - Spring 2018 From clustering to EM - PowerPoint PPT Presentation

Machine Learning 2, DS 4420, Spring 2018. From Clustering to EM. Byron C. Wallace

Clustering

Four Types of Clustering
1. Centroid-based (K-means, K-medoids). Notion of clusters: Voronoi tessellation.
2. Density-based (DBSCAN, OPTICS). Notion of clusters: connected regions of high density.
3. Connectivity-based (hierarchical). Notion of clusters: cut off the dendrogram at some depth.
4. Distribution-based (mixture models). Notion of clusters: distributions on features.
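The Voronoi view of centroid-based clustering amounts to assigning every point to its nearest centroid. A minimal sketch, assuming Euclidean distance and hypothetical 2-D centroids; this is only the assignment step, not a full K-means loop.

```python
import math

def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point` under Euclidean distance.

    All points that map to the same index lie in that centroid's Voronoi cell.
    """
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))

# Hypothetical centroids and points, for illustration only.
centroids = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
points = [(0.5, 1.0), (4.0, 4.5), (1.0, 4.0), (3.0, 0.5)]
labels = [nearest_centroid(p, centroids) for p in points]
print(labels)
```

K-means alternates this assignment step with recomputing each centroid as the mean of its assigned points.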

Hierarchical Clustering

Dendrogram (a.k.a. a similarity tree)
A dendrogram has a root, internal nodes joined by internal branches, and leaves attached by terminal branches. The similarity of A and B is represented by the height of their lowest shared internal node, written D(A,B): the lower that node, the more similar A and B are.
This is natural when measuring genetic similarity, where the height reflects the distance to a common ancestor. Example tree (Newick format, with branch lengths):
(Bovine:0.69395,(Spider Monkey:0.390,(Gibbon:0.36079,(Orang:0.33636,(Gorilla:0.17147,(Chimp:0.19268,Human:0.11927):0.08386):0.06124):0.15057):0.54939);
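The lowest-shared-node rule can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical nested-tuple encoding (a leaf is a string, an internal node is `(height, left, right)`) with made-up heights; it does not parse the Newick string.

```python
def leaves(tree):
    """Set of leaf labels under a node."""
    if isinstance(tree, str):
        return {tree}
    _, left, right = tree
    return leaves(left) | leaves(right)

def dendrogram_distance(tree, a, b):
    """Height of the lowest internal node whose subtree contains both a and b."""
    if a == b:
        return 0.0
    if isinstance(tree, str) or not {a, b} <= leaves(tree):
        raise ValueError("a and b must both be leaves of the tree")
    height, left, right = tree
    for child in (left, right):
        # Descend while some child subtree still contains both leaves.
        if not isinstance(child, str) and {a, b} <= leaves(child):
            return dendrogram_distance(child, a, b)
    return height

# Hypothetical dendrogram over leaves A..E (heights are made up).
tree = (4.0, (1.0, "A", "B"), (3.0, "C", (2.0, "D", "E")))
print(dendrogram_distance(tree, "A", "B"), dendrogram_distance(tree, "C", "E"))
```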

Example: Iris data. Three species: Iris setosa, Iris versicolor, Iris virginica. https://en.wikipedia.org/wiki/Iris_flower_data_set

Hierarchical Clustering (Euclidean Distance) of the Iris data. https://en.wikipedia.org/wiki/Iris_flower_data_set

Edit Distance
Distance between Patty and Selma: change dress color (1 point), change earring shape (1 point), change hair part (1 point). D(Patty, Selma) = 3.
Distance between Marge and Selma: change dress color (1 point), add earrings (1 point), decrease height (1 point), take up smoking (1 point), lose weight (1 point). D(Marge, Selma) = 5.
Edit distance can be defined for any set of discrete features.
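Counting unit-cost edits over discrete features can be sketched as below. The feature names and values are hypothetical, chosen only so that three features differ as in the Patty/Selma example; a feature present in one item and missing from the other counts as one edit, which models edits such as "add earrings".

```python
def feature_edit_distance(x, y):
    """Number of unit-cost edits (change, add, or remove one feature) turning x into y."""
    # A feature missing from one item counts as a single add or remove.
    return sum(1 for key in set(x) | set(y) if x.get(key) != y.get(key))

# Hypothetical feature values, for illustration only.
patty = {"dress_color": "color_1", "earring_shape": "shape_1", "hair_part": "left"}
selma = {"dress_color": "color_2", "earring_shape": "shape_2", "hair_part": "right"}
print(feature_edit_distance(patty, selma))
```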

Edit Distance for Strings
• Transform string Q into string C using only substitution, insertion, and deletion.
• Assume that each of these operators has a cost associated with it: substitution 1 unit, insertion 1 unit, deletion 1 unit.
• The distance between two strings can then be defined as the cost of the cheapest transformation from Q to C.
How far apart are "Peter" and "Piotr"? D(Peter, Piotr) = 3:
Peter → Piter (substitute i for e) → Pioter (insert o) → Piotr (delete e)
Related names: Pedro, Piotr, Peter, Piero, Pyotr, Petros, Pietro, Pierre
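The cheapest-transformation definition is usually computed with dynamic programming (the Levenshtein recurrence). A minimal sketch, assuming the unit costs above as defaults; the function name and cost parameters are illustrative.

```python
def edit_distance(q, c, sub_cost=1, ins_cost=1, del_cost=1):
    """Cheapest cost of turning q into c with substitutions, insertions, deletions."""
    # prev[j] is the cost of turning the current prefix of q into c[:j].
    prev = [j * ins_cost for j in range(len(c) + 1)]
    for i in range(1, len(q) + 1):
        curr = [i * del_cost] + [0] * len(c)
        for j in range(1, len(c) + 1):
            same = q[i - 1] == c[j - 1]
            curr[j] = min(
                prev[j] + del_cost,                        # delete q[i-1]
                curr[j - 1] + ins_cost,                    # insert c[j-1]
                prev[j - 1] + (0 if same else sub_cost),   # match or substitute
            )
        prev = curr
    return prev[-1]

print(edit_distance("Peter", "Piotr"))
```

Only two rows of the table are kept at a time, so memory grows with the length of `c` rather than with the product of both lengths.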

Hierarchical Clustering (Edit Distance)
Pedro (Portuguese): Petros (Greek), Peter (English), Piotr (Polish), Peadar (Irish), Pierre (French), Peder (Danish), Peka (Hawaiian), Pietro (Italian), Piero (Italian alternative), Petr (Czech), Pyotr (Russian)
Cristovao (Portuguese): Christoph (German), Christophe (French), Cristobal (Spanish), Cristoforo (Italian), Kristoffer (Scandinavian), Krystof (Czech), Christopher (English)
Miguel (Portuguese): Michalis (Greek), Michael (English), Mick (Irish)
[Figure: dendrogram of these names clustered by edit distance]

Meaningful Patterns
Edit distance yields a clustering that follows geography (slide from Eamonn Keogh): Pedro (Portuguese/Spanish), Petros (Greek), Peter (English), Piotr (Polish), Peadar (Irish), Pierre (French), Peder (Danish), Peka (Hawaiian), Pietro (Italian), Piero (Italian alternative), Petr (Czech), Pyotr (Russian)

Spurious Patterns
In general, clusterings will only be as meaningful as your distance metric.
[Figure: dendrogram of national flags: Australia, St. Helena & Dependencies, Anguilla, South Georgia & South Sandwich Islands, U.K., Serbia & Montenegro (Yugoslavia), France, Niger, India, Ireland, Brazil]
One group (former UK colonies) reflects a real relationship; another pairing is spurious: there is no connection between the two.

“Correct” Number of Clusters
We can use the dendrogram to determine the “correct” number of clusters: look at the distances (merge heights) at which clusters join.
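One simple way to turn "look at the distances" into a rule, offered here as an assumed heuristic rather than the method from these slides: sort the merge heights and cut the dendrogram inside the largest jump between consecutive heights. The function name is hypothetical.

```python
def clusters_from_largest_gap(merge_heights):
    """Suggest a cluster count by cutting the dendrogram inside its largest gap.

    merge_heights: the n - 1 merge heights of a dendrogram over n items.
    """
    h = sorted(merge_heights)
    if len(h) < 2:
        raise ValueError("need at least two merges to compare gaps")
    n = len(h) + 1
    gaps = [h[i + 1] - h[i] for i in range(len(h) - 1)]
    k = max(range(len(gaps)), key=lambda i: gaps[i])
    # Cutting between h[k] and h[k + 1] keeps the first k + 1 merges,
    # and every merge reduces the number of clusters by one.
    return n - (k + 1)

print(clusters_from_largest_gap([1, 2, 3, 7]))
```

With heights 1, 2, 3, 7 over five items, the jump from 3 to 7 dominates, so the cut leaves two clusters.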

Detecting Outliers
A single isolated branch suggests a data point that is very different from all others: an outlier.
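Under single link with Euclidean distance, a point's terminal branch ends at the height where it first merges, which equals the distance to its nearest neighbor. So one way to quantify an isolated branch, assuming those two choices, is to rank points by nearest-neighbor distance. The function names and data are hypothetical.

```python
import math

def nearest_neighbor_distances(points):
    """Euclidean distance from each point to its closest other point."""
    return [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]

def most_isolated(points):
    """Index of the point with the longest terminal branch under single link."""
    nn = nearest_neighbor_distances(points)
    return max(range(len(points)), key=lambda i: nn[i])

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(most_isolated(points))
```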

Bottom-up vs. Top-down
Bottom-up (agglomerative): each item starts as its own cluster; greedily merge.
Top-down (divisive): start with one big cluster (all data); recursively split.
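Divisive clustering needs a rule for splitting a cluster. The sketch below assumes one simple choice: seed the split with the cluster's two farthest-apart points, assign every point to the nearer seed, and always split the widest remaining cluster. Other splitting rules, such as running 2-means, are common; the function names here are hypothetical.

```python
import math
from itertools import combinations

def diameter(cluster):
    """Largest pairwise Euclidean distance in the cluster (0 for a single point)."""
    return max((math.dist(p, q) for p, q in combinations(cluster, 2)), default=0.0)

def split(cluster):
    """Split a cluster in two, seeded by its farthest-apart pair of points."""
    s1, s2 = max(combinations(cluster, 2), key=lambda pair: math.dist(*pair))
    left = [p for p in cluster if math.dist(p, s1) <= math.dist(p, s2)]
    right = [p for p in cluster if math.dist(p, s1) > math.dist(p, s2)]
    return left, right

def divisive(points, n_clusters):
    """Top-down: start with all points in one cluster and split until n_clusters."""
    clusters = [list(points)]
    while len(clusters) < n_clusters:
        widest = max(clusters, key=diameter)
        if diameter(widest) == 0:
            break  # every remaining cluster is a single location
        clusters.remove(widest)
        clusters.extend(split(widest))
    return clusters

print(divisive([(0, 0), (1, 0), (10, 0), (11, 0)], 2))
```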

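Distance Matrix
We begin with a distance matrix, which contains the distances between every pair of objects in our database. For five objects (pictured on the original slide; labeled o1 to o5 here), the upper triangle is:

      o1  o2  o3  o4  o5
o1     0   8   8   7   7
o2         0   2   4   4
o3             0   3   3
o4                 0   1
o5                     0

For example, D(o1, o2) = 8 and D(o4, o5) = 1.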
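A distance matrix can be built from any metric. This sketch assumes points are numeric tuples and uses Euclidean distance (`math.dist`) by default; any other metric function, such as a string edit distance, could be passed in instead. The function name is illustrative.

```python
import math

def distance_matrix(items, metric=math.dist):
    """Full symmetric matrix of pairwise distances under `metric`."""
    n = len(items)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # A metric is symmetric, so each pair is computed once.
            D[i][j] = D[j][i] = metric(items[i], items[j])
    return D

points = [(0, 0), (3, 4), (6, 8)]
for row in distance_matrix(points):
    print(row)
```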

Bottom-up (Agglomerative Clustering)
Start with every item in its own cluster. At each step, consider all possible merges and choose the best one; repeat until a single cluster remains.

Bottom-up (Agglomerative Clustering): can you now implement this?
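A naive implementation of the consider-all-merges, choose-the-best loop is sketched below. It assumes a full symmetric distance matrix and a `linkage` function that reduces item-to-item distances to a cluster distance (`min` for single link, `max` for complete link); the function name is illustrative. The example matrix is the five-object matrix from the Distance Matrix slide, written out in full with objects indexed 0 to 4.

```python
def agglomerative(D, linkage=min):
    """Naive bottom-up clustering over a full, symmetric distance matrix D.

    Returns the merges in order as (cluster_a, cluster_b, height) tuples.
    """
    clusters = [frozenset([i]) for i in range(len(D))]
    merges = []
    while len(clusters) > 1:
        # Consider all possible merges...
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(D[i][j] for i in clusters[x] for j in clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        # ...and choose the best (closest) pair.
        d, x, y = best
        merged = clusters[x] | clusters[y]
        merges.append((sorted(clusters[x]), sorted(clusters[y]), d))
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return merges

D = [
    [0, 8, 8, 7, 7],
    [8, 0, 2, 4, 4],
    [8, 2, 0, 3, 3],
    [7, 4, 3, 0, 1],
    [7, 4, 3, 1, 0],
]
for a, b, height in agglomerative(D):
    print(a, b, height)
```

This version rescans every cluster pair at each step, so it is meant for illustration on small inputs; passing `linkage=max` switches it to complete link.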

Bottom-up (Agglomerative Clustering): distances between examples can be calculated using a metric.

Bottom-up (Agglomerative Clustering): how do we calculate the distance to a cluster?

Clustering Criteria
Single link: $d(A,B) = \min_{a \in A,\, b \in B} d(a,b)$ (closest point)
Complete link: $d(A,B) = \max_{a \in A,\, b \in B} d(a,b)$ (furthest point)
Group average: $d(A,B) = \frac{1}{|A||B|} \sum_{a \in A,\, b \in B} d(a,b)$ (average distance)
Centroid: $d(A,B) = d(\mu_A, \mu_B)$, where $\mu_X = \frac{1}{|X|} \sum_{x \in X} x$ (distance of averages)
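The four criteria translate almost directly into code. A minimal sketch, assuming points are coordinate tuples and d is Euclidean distance (`math.dist`); the function names are illustrative.

```python
import math
from statistics import fmean

def single_link(A, B):
    """Closest pair of points across the two clusters."""
    return min(math.dist(a, b) for a in A for b in B)

def complete_link(A, B):
    """Furthest pair of points across the two clusters."""
    return max(math.dist(a, b) for a in A for b in B)

def group_average(A, B):
    """Mean distance over all |A||B| cross-cluster pairs."""
    return sum(math.dist(a, b) for a in A for b in B) / (len(A) * len(B))

def centroid(X):
    """Coordinate-wise mean of the points in X (mu_X)."""
    return tuple(fmean(coords) for coords in zip(*X))

def centroid_link(A, B):
    """Distance between the two cluster means."""
    return math.dist(centroid(A), centroid(B))

A = [(0, 0), (0, 4)]
B = [(3, 0), (3, 4)]
print(single_link(A, B), complete_link(A, B), group_average(A, B), centroid_link(A, B))
```

On these two clusters the criteria give 3, 5, 4, and 3 respectively, so the choice of criterion can change which clusters look closest.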

Hierarchical Clustering Summary
+ No need to specify the number of clusters
+ Hierarchical structure maps nicely onto human intuition in some domains
- Scaling: time complexity is at least O(n^2) in the number of examples
- Heuristic search method: local optima are a problem
- Interpretation of results is (very) subjective

Evaluation?
[Figure: four scatter plots on the unit square (x and y from 0 to 1), titled Random Points, DBSCAN, K-means, and Complete Link]
