
Machine Learning: Algorithms and Applications

Floriano Zini
Free University of Bozen-Bolzano, Faculty of Computer Science
Academic Year 2011-2012
Lecture 10: 14 May 2012

Unsupervised Learning (cont…)

Slides courtesy of Bing Liu: www.cs.uic.edu/~liub/WebMiningBook.html


Road map

n Basic concepts n K-means algorithm n Representation of clusters n Hierarchical clustering n Distance functions n Data standardization n Handling mixed attributes n Which clustering algorithm to use? n Cluster evaluation n Summary

Hierarchical Clustering

Produces a nested sequence of clusters, a tree, also called a dendrogram
- Singleton clusters are at the bottom of the tree
- One root cluster covers all the data points
- Sibling clusters partition the data points of their common parent


Types of hierarchical clustering

n Agglomerative (bottom up) clustering: it builds the

dendrogram (tree) from the bottom level, and

q merges the most similar (or nearest) pair of clusters q stops when all the data points are merged into a single

cluster (i.e., the root cluster)

n Divisive (top down) clustering: it starts with all data

points in one cluster, the root

q splits the root into a set of child clusters q each child cluster is recursively divided further q stops when only singleton clusters of individual data points

remain

Agglomerative clustering

It is more popular than divisive methods
- At the beginning, each data point forms a cluster (also called a node)
- Merge the nodes/clusters that have the least distance
- Go on merging
- Eventually all nodes belong to one cluster


Agglomerative clustering algorithm

[Slide figures: the algorithm's pseudocode and an example of the working of the algorithm]
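The pseudocode itself is not preserved in this extraction. As a substitute, here is a minimal Python sketch of the agglomerative procedure, assuming numeric vectors and the single-link distance defined later in this lecture; the function names are illustrative:

```python
import math

def euclidean(p, q):
    """Euclidean distance between two numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def single_link(c1, c2):
    """Single-link distance: distance of the two closest points across clusters."""
    return min(euclidean(p, q) for p in c1 for q in c2)

def agglomerative(points, target_k=1, cluster_dist=single_link):
    """Repeatedly merge the two nearest clusters until target_k clusters remain."""
    clusters = [[p] for p in points]  # each data point starts as a singleton cluster
    while len(clusters) > target_k:
        # find the pair of clusters with the least distance
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: cluster_dist(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[i].extend(clusters[j])  # merge cluster j into cluster i
        del clusters[j]
    return clusters

# Two natural groups emerge from four 2-D points
print(agglomerative([(0, 0), (0, 1), (5, 5), (5, 6)], target_k=2))
```

Stopping at `target_k` clusters is a convenience for the example; running with `target_k=1` reproduces the full merge sequence that builds the dendrogram.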


Measuring the distance of two clusters

n A few ways to measure distances of two

clusters

q k-means uses only the distances between

centroids

n Different variations of the algorithm

q Single link q Complete link q Average link q Centroids q …

Single link method

n The distance between

two clusters is the distance between two closest data points in the two clusters

q one data point from each

cluster

n It can find arbitrarily

shaped clusters, but

q It may cause the

undesirable “chain effect” by noisy points (in black)

The two natural clusters (in red) are not found


Complete link method

n The distance between two clusters is the distance of two

furthest data points in the two clusters

n It is sensitive to outliers (in black) because they are far away n It usually produces better clusters than the single-link method

Average link and centroid methods

Average link method
- A compromise between
  - the sensitivity of complete-link clustering to outliers and
  - the tendency of single-link clustering to form long chains that do not correspond to the intuitive notion of clusters as compact, spherical objects
- The distance between two clusters is the average distance of all pair-wise distances between the data points in the two clusters

Centroid method
- The distance between two clusters is the distance between their centroids
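As a sketch, the four measures can be written directly from their definitions (Euclidean base distance assumed; the helper names are illustrative):

```python
import numpy as np

def euclidean(p, q):
    return float(np.linalg.norm(np.asarray(p, dtype=float) - np.asarray(q, dtype=float)))

def single_link(c1, c2):
    # distance of the two closest points, one from each cluster
    return min(euclidean(p, q) for p in c1 for q in c2)

def complete_link(c1, c2):
    # distance of the two furthest points, one from each cluster
    return max(euclidean(p, q) for p in c1 for q in c2)

def average_link(c1, c2):
    # average of all pair-wise distances between points of the two clusters
    return sum(euclidean(p, q) for p in c1 for q in c2) / (len(c1) * len(c2))

def centroid_link(c1, c2):
    # distance between the centroids of the two clusters
    return euclidean(np.mean(c1, axis=0), np.mean(c2, axis=0))
```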


The complexity

n All the hierarchical algorithms are at least O(n2)

q n is the number of data points

n Single link can be done in O(n2) n Complete and average links can be done in O(n2log n) n Due the complexity, hierarchical algorithms are hard to use for

large data sets

q Perform hierarchical clustering on a sample of data points

and then assign the others by distance or by supervised learning (see lecture 9)

q Use scale-up methods (e.g., BIRCH) that

n

find many small clusters using an efficient algorithm

n

use these clusters as the starting nodes for the hierarchical clustering

Road map

n Basic concepts n K-means algorithm n Representation of clusters n Hierarchical clustering n Distance functions n Data standardization n Handling mixed attributes n Which clustering algorithm to use? n Cluster evaluation n Summary


Distance functions

n Key to clustering

q “similarity” and “dissimilarity” are other commonly

used terms

n There are numerous distance functions for

q Different types of data

n Numeric data n Nominal data n …

q Different specific applications

Distance functions for numeric attributes

n We denote distance with dist(xi, xj), where xi and

xj are data points (vectors)

n Most commonly used functions are

q Euclidean distance and q Manhattan (city block) distance

n They are special cases of Minkowski distance

h is positive integer, r is the number of attributes

dist(xi,x j ) = xi1 ! x j1

h + xi2 ! x j2 h +...+ xir ! x jr h

( )

1 h
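A minimal sketch of the Minkowski distance, with its two special cases (the function name is illustrative):

```python
def minkowski(xi, xj, h):
    """Minkowski distance between two numeric vectors, for a positive integer h."""
    return sum(abs(a - b) ** h for a, b in zip(xi, xj)) ** (1 / h)

xi, xj = (1.0, 2.0), (4.0, 6.0)
print(minkowski(xi, xj, 2))  # h = 2, Euclidean distance: 5.0
print(minkowski(xi, xj, 1))  # h = 1, Manhattan distance: 7.0
```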


Euclidean distance and Manhattan distance

n If h = 2, it is the Euclidean distance n If h = 1, it is the Manhattan distance n Weighted Euclidean distance

2 2 2 2 2 1 1

) ( ... ) ( ) ( ) , (

jr ir j i j i j i

x x x x x x dist − + + − + − = x x

| | ... | | | | ) , (

2 2 1 1 jr ir j i j i j i

x x x x x x dist − + + − + − = x x

2 2 2 2 2 2 1 1 1

) ( ... ) ( ) ( ) , (

jr ir r j i j i j i

x x w x x w x x w dist − + + − + − = x x

Squared distance and Chebychev distance

n Squared Euclidean distance: to place

progressively greater weight on data points that are further apart

n Chebychev distance: one wants to define two

data points as “different” if they are different

  • n any one of the attributes

2 2 2 2 2 1 1

) ( ... ) ( ) ( ) , (

jr ir j i j i j i

x x x x x x dist − + + − + − = x x

dist(xi,x j ) = max xi1 ! x j1 , xi2 ! x j2 ,…, xir ! x jr

( )
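Both variants, sketched directly from the formulas above:

```python
def squared_euclidean(xi, xj):
    """Squared Euclidean distance: larger gaps receive progressively more weight."""
    return sum((a - b) ** 2 for a, b in zip(xi, xj))

def chebychev(xi, xj):
    """Chebychev distance: the largest difference over any single attribute."""
    return max(abs(a - b) for a, b in zip(xi, xj))

print(squared_euclidean((1, 2), (4, 6)))  # 25
print(chebychev((1, 2), (4, 6)))          # 4
```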


Distance functions for binary and nominal attributes

n Binary attribute: has two values or states but

no ordering relationships,

q E.g., Gender: female and male q The 2 values are conventionally represented by 1

and 0

n We use a confusion matrix to introduce the

distance functions/measures

n Let the ith and jth data points be xi and xj

(vectors)

Confusion matrix

Counting over the r binary attributes of xi and xj:

              xj = 1    xj = 0
    xi = 1      a         b
    xi = 0      c         d

- a: number of attributes with value 1 in both xi and xj
- b: number of attributes with value 1 in xi and 0 in xj
- c: number of attributes with value 0 in xi and 1 in xj
- d: number of attributes with value 0 in both xi and xj


Symmetric binary attributes

n A binary attribute is symmetric if both of its states

(0 and 1) have equal importance, e.g., female and male of the attribute Gender

n Distance function: Simple Matching Distance, proportion of

mismatches of their values

n There are variations, adding weights

d c b a c b dist

j i

+ + + + = ) , ( x x

dist(xi,x j ) = 2(b+c) a + d + 2(b+c)

dist(xi,x j ) = b+c 2(a + d)+ b+c

To mismatches To matches

(1)

n x1 and x2 are two data points n Each of the 7 attributes is symmetric binary n The simple matching distance is n If there is a weight on mismatches

Symmetric binary attributes: example

dist(x1,x2) = b+c a + b+c+ d = 2 +1 2 + 2 +1+ 2 = 3 7 = 0.429 dist(x1,x2) = 2(b+c) a + 2(b+c)+ d = 2(2 +1) 2 + 2(2 +1)+ 2 = 6 10 = 0.6
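The same numbers can be checked with a short sketch; the two vectors below are illustrative values chosen to reproduce the counts a = 2, b = 2, c = 1, d = 2:

```python
def match_counts(xi, xj):
    """Confusion-matrix counts (a, b, c, d) for two binary vectors."""
    a = sum(1 for u, v in zip(xi, xj) if u == 1 and v == 1)
    b = sum(1 for u, v in zip(xi, xj) if u == 1 and v == 0)
    c = sum(1 for u, v in zip(xi, xj) if u == 0 and v == 1)
    d = sum(1 for u, v in zip(xi, xj) if u == 0 and v == 0)
    return a, b, c, d

def simple_matching(xi, xj):
    """Simple matching distance: proportion of mismatching attributes."""
    a, b, c, d = match_counts(xi, xj)
    return (b + c) / (a + b + c + d)

x1 = [1, 1, 1, 1, 0, 0, 0]
x2 = [1, 1, 0, 0, 1, 0, 0]
print(round(simple_matching(x1, x2), 3))  # 0.429
```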


Asymmetric binary attributes

n Asymmetric: if one of the states is more

important or valuable than the other

q By convention, state 1 represents the more important

state, which is typically the rare or infrequent state

q Jaccard distance is a popular measure q There are variations, adding weights

c b a c b dist

j i

+ + + = ) , ( x x

dist(xi,x j ) = 2(b+c) a + 2(b+c)

dist(xi,x j ) = b+c 2a + b+c

To mismatches To matches of the important state

(2)

n x1 and x2 are two data points n Each of the 7 attributes is asymmetric binary n The Jaccard distance is n If there is a weight on matches of the important state

Asymmetric binary attributes: example

dist(x1,x2) = b+c a + b+c = 2 +1 2 + 2 +1 = 3 5 = 0.6 dist(x1,x2) = b+c 2a + b+c = 2 +1 2*2 + 2 +1 = 3 7 = 0.429
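Continuing the previous sketch, the Jaccard distance simply drops d from the denominator:

```python
def jaccard_distance(xi, xj):
    """Jaccard distance for asymmetric binary attributes: 0-0 matches (d) are ignored."""
    a, b, c, _ = match_counts(xi, xj)  # match_counts as defined in the earlier sketch
    return (b + c) / (a + b + c)

print(round(jaccard_distance(x1, x2), 1))  # 0.6 for the same x1, x2 as before
```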


Nominal attributes

n Nominal attributes: with more than two states

  • r values

q the commonly used distance measure is also

based on the simple matching method

q Given two data points xi and xj, let the number of

attributes be r, and the number of values that match in xi and xj be q

r q r dist

j i

− = ) , ( x x

(3)
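A minimal sketch of equation (3):

```python
def nominal_distance(xi, xj):
    """Simple matching distance for nominal attributes: fraction of mismatching values."""
    r = len(xi)                                   # number of attributes
    q = sum(1 for u, v in zip(xi, xj) if u == v)  # number of matching values
    return (r - q) / r

print(nominal_distance(["Apple", "red", "small"], ["Apple", "green", "small"]))  # 1/3
```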

Road map

n Basic concepts n K-means algorithm n Representation of clusters n Hierarchical clustering n Distance functions n Data standardization n Handling mixed attributes n Which clustering algorithm to use? n Cluster evaluation n Summary


Data standardization

n In the Euclidean space, standardization of attributes

is recommended so that all attributes can have equal impact on the computation of distances

n Consider the following pair of data points

q xi: (0.1, 20) and xj: (0.9, 720)

n The distance is almost completely dominated by

(720-20) = 700

n Standardize attributes: to force the attributes to have

a common value range

dist(xi,x j ) = (0.9 ! 0.1)2 +(720 ! 20)2 = 700.000457

Interval-scaled attributes

n Their values are real numbers following a

linear scale

q E.g., the difference in Age between 10 and 20 is

the same as that between 40 and 50

q The key idea is that intervals keep the same

importance through out the scale

n Two main approaches to standardize interval

scaled attributes, range and z-score


Interval-scaled attributes (cont …)

n Range: transform the values of an attribute f so that they are

between 0 and 1

n Z-score: transform the values of an attribute f based on the mean

and standard deviation of the attribute

q

Indicates how far and in what direction the value deviates from the mean

q

The deviation is expressed in units of the standard deviation of the attribute

! f = (xif !µ f )2

i=1 n

"

n !1

µ f = 1 n xif

i=1 n

!

z(xif ) = xif !µ f ! f

Z-score:

rg(xif ) = xif ! min( f ) max( f )! min( f )
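A sketch of both transforms over a single attribute column (numpy assumed; `ddof=1` matches the n − 1 in the formula above):

```python
import numpy as np

def range_standardize(col):
    """Map an attribute column onto [0, 1] via the range transform."""
    col = np.asarray(col, dtype=float)
    return (col - col.min()) / (col.max() - col.min())

def z_score(col):
    """Express each value as its deviation from the mean, in standard-deviation units."""
    col = np.asarray(col, dtype=float)
    return (col - col.mean()) / col.std(ddof=1)  # ddof=1: divide by n - 1

ages = [10, 20, 40, 50]
print(range_standardize(ages))  # [0.   0.25 0.75 1.  ]
print(z_score(ages))
```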

Ratio-scaled attributes

n Numeric attributes, but unlike interval-scaled

attributes, their scales are exponential

n For example, the total amount of

microorganisms that evolve in a time t is approximately given by

q where A and B are positive constants

n Approach

1.

Do log transform

2.

Then treat xif’ as an interval-scaled attribute xif

' = log(xif )

AeBt
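A brief sketch of the approach, with illustrative values for A and B:

```python
import numpy as np

A, B = 2.0, 0.5                       # illustrative positive constants
t = np.array([1.0, 2.0, 3.0, 4.0])
x = A * np.exp(B * t)                 # ratio-scaled values ~ A * e^(B*t)

x_log = np.log(x)                     # step 1: log transform (now linear in t)
z = (x_log - x_log.mean()) / x_log.std(ddof=1)  # step 2: treat as interval-scaled
print(z)
```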


Nominal (unordered categorical) attributes

n Sometime, we need to transform nominal

attributes to numeric attributes

n Transform nominal attributes to binary attributes

q The number of values of a nominal attribute is v q Create v binary attributes to represent the values q If a data instance for the nominal attribute takes a

particular value, the value of its binary attribute is set to 1, otherwise it is set to 0

n The resulting binary attributes can be used as

numeric attributes, with two values, 0 and 1

Nominal attributes: an example

n Nominal attribute fruit: has three values

q Apple, Orange, and Pear

n We create three binary attributes called,

Apple, Orange, and Pear in the new data

n If a particular data instance in the original

data has Apple as the value for fruit

q then in the transformed data, we set the value of

the attribute Apple to 1, and

q the values of attributes Orange and Pear to 0


Ordinal (ordered categorical) attributes

n Ordinal attribute: it is like a nominal attribute,

but its values have a numerical ordering

n E.g.,

q Age attribute with ordered values: Young,

MiddleAge, and Old

q Common approach to standardization: treat is as

an interval-scaled attribute

n E.g., Young à 0, MiddleAge à 1, Old à 2