Machine Learning: Joy of Data
Sarath Chandar, University of Montreal
Regression: predict my house's price!

Training set of housing prices (Portland, OR):
Size in feet² (x)   Price ($) in 1000's (y)
2104                460
1416                232
1534                315
 852                178
 ...                ...

Notation:
  m   = number of training examples
  x's = "input" variable / features
  y's = "output" variable / "target" variable
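For the later examples, the toy training set can be held in NumPy arrays (a minimal sketch; the variable names are our own):

```python
import numpy as np

# Toy training set from the table above (Portland, OR housing prices).
# x: size in square feet, y: price in $1000's.
x = np.array([2104.0, 1416.0, 1534.0, 852.0])
y = np.array([460.0, 232.0, 315.0, 178.0])

m = len(x)  # m = number of training examples
print(f"m = {m} training examples")
```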
Training Set → Learning Algorithm → h. The learned hypothesis h maps the size of a house to an estimated price.
[Scatter plot of the training set: price y (in $1000's) vs. size x (in feet²)]
Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
Parameters: $\theta_0, \theta_1$
Cost Function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
Goal: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$
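As a sketch, the hypothesis and cost function can be written directly in NumPy (the theta values in the demo call are arbitrary, and the x, y arrays repeat the toy table above):

```python
import numpy as np

def h(theta0, theta1, x):
    """Hypothesis: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

def cost(theta0, theta1, x, y):
    """Cost J(theta0, theta1) = (1 / 2m) * sum of squared errors."""
    m = len(x)
    return np.sum((h(theta0, theta1, x) - y) ** 2) / (2 * m)

# Evaluate an arbitrary parameter setting on the toy data above.
x = np.array([2104.0, 1416.0, 1534.0, 852.0])
y = np.array([460.0, 232.0, 315.0, 178.0])
print(cost(0.0, 0.2, x, y))
```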
[Surface and contour plots of the cost function J(θ₀, θ₁) over the parameters θ₀ and θ₁]
If α is too small, gradient descent can be slow. If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.
Gradient descent: repeat until convergence
$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$ for $j = 0, 1$,
updating $\theta_0$ and $\theta_1$ simultaneously.
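A minimal batch-gradient-descent sketch for this model, assuming the toy data above; the learning rate, iteration count, and feature scaling are illustrative choices, not part of the original slides:

```python
import numpy as np

x = np.array([2104.0, 1416.0, 1534.0, 852.0])
y = np.array([460.0, 232.0, 315.0, 178.0])
x_scaled = (x - x.mean()) / x.std()  # scaling keeps one alpha workable

theta0, theta1 = 0.0, 0.0
alpha = 0.1   # learning rate: too small is slow, too large may diverge
m = len(x)

for _ in range(1000):
    pred = theta0 + theta1 * x_scaled
    # Compute both partial derivatives first ...
    grad0 = np.sum(pred - y) / m
    grad1 = np.sum((pred - y) * x_scaled) / m
    # ... then update theta0 and theta1 simultaneously.
    theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1

print(theta0, theta1)
```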
[Sequence of plots: $h_\theta(x)$ (for fixed $\theta_0, \theta_1$, this is a function of x) alongside $J(\theta_0, \theta_1)$ (a function of the parameters $\theta_0, \theta_1$), shown at successive steps of gradient descent]
Classification: given a collection of records (the training set):
– Each record contains a set of attributes; one of the attributes is the class.
– Goal: find a model for the class attribute as a function of the values of the other attributes.
– A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
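In practice the split is often done with a library helper; a small sketch using scikit-learn's train_test_split (the 80/20 ratio and placeholder data are our own choices):

```python
from sklearn.model_selection import train_test_split

# Placeholder records: X holds the attributes, y the class labels.
X = [[125, 1], [100, 0], [70, 0], [120, 1], [95, 0],
     [60, 0], [220, 1], [85, 0], [75, 0], [90, 0]]
y = ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"]

# Hold out 20% of the records as the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```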
[Diagram: Training Set → Learning algorithm → Learn Model (induction) → Model → Apply Model (deduction) → Test Set]
Training set:
Tid  Attrib1  Attrib2  Attrib3  Class
 1   Yes      Large    125K     No
 2   No       Medium   100K     No
 3   No       Small     70K     No
 4   Yes      Medium   120K     No
 5   No       Large     95K     Yes
 6   No       Medium    60K     No
 7   Yes      Large    220K     No
 8   No       Small     85K     Yes
 9   No       Medium    75K     No
10   No       Small     90K     Yes
Test set:
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small     55K     ?
12   Yes      Medium    80K     ?
13   Yes      Large    110K     ?
14   No       Small     95K     ?
15   No       Large     67K     ?
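A sketch of this induction/deduction loop on the tables above, using a decision tree as one possible model; the numeric encodings of the attributes are our own assumption:

```python
from sklearn.tree import DecisionTreeClassifier

# Training set (Tids 1-10), encoded as
# [Attrib1: Yes=1/No=0, Attrib2: Small=0/Medium=1/Large=2, Attrib3 in K].
X_train = [[1, 2, 125], [0, 1, 100], [0, 0, 70], [1, 1, 120], [0, 2, 95],
           [0, 1, 60], [1, 2, 220], [0, 0, 85], [0, 1, 75], [0, 0, 90]]
y_train = ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"]

# Test set (Tids 11-15), whose class is unknown ("?").
X_test = [[0, 0, 55], [1, 1, 80], [1, 2, 110], [0, 0, 95], [0, 2, 67]]

model = DecisionTreeClassifier().fit(X_train, y_train)  # induction: learn model
print(model.predict(X_test))                            # deduction: apply model
```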
Clustering: finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups.
– Intra-cluster distances are minimized; inter-cluster distances are maximized.
Understanding
– Group related documents for browsing, group genes and proteins that have similar functionality, or group stocks with similar price fluctuations
Summarization
– Reduce the size of large data sets
Discovered Clusters and the corresponding Industry Group:

Cluster 1 (Industry Group: Technology1-DOWN):
  Applied-Matl-DOWN, Bay-Network-DOWN, 3-COM-DOWN, Cabletron-Sys-DOWN, CISCO-DOWN, HP-DOWN, DSC-Comm-DOWN, INTEL-DOWN, LSI-Logic-DOWN, Micron-Tech-DOWN, Texas-Inst-DOWN, Tellabs-Inc-DOWN, Natl-Semiconduct-DOWN, Oracl-DOWN, SGI-DOWN, Sun-DOWN

Cluster 2 (Industry Group: Technology2-DOWN):
  Apple-Comp-DOWN, Autodesk-DOWN, DEC-DOWN, ADV-Micro-Device-DOWN, Andrew-Corp-DOWN, Computer-Assoc-DOWN, Circuit-City-DOWN, Compaq-DOWN, EMC-Corp-DOWN, Gen-Inst-DOWN, Motorola-DOWN, Microsoft-DOWN, Scientific-Atl-DOWN

Cluster 3 (Industry Group: Financial-DOWN):
  Fannie-Mae-DOWN, Fed-Home-Loan-DOWN, MBNA-Corp-DOWN, Morgan-Stanley-DOWN

Cluster 4 (Industry Group: Oil-UP):
  Baker-Hughes-UP, Dresser-Inds-UP, Halliburton-HLD-UP, Louisiana-Land-UP, Phillips-Petro-UP, Unocal-UP, Schlumberger-UP
How many clusters? The notion of a cluster can be ambiguous: the same set of points could reasonably be grouped into two, four, or six clusters.
Partitional Clustering:
– A division of data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset.
Hierarchical Clustering:
– A set of nested clusters organized as a hierarchical tree.
[Figure: the Original Points and a Partitional Clustering of them]
[Figure: traditional vs. non-traditional hierarchical clusterings of points p1-p4, with the corresponding traditional and non-traditional dendrograms]
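A minimal hierarchical-clustering sketch with SciPy, standing in for the p1-p4 example (the point coordinates are invented for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Four invented 2-D points standing in for p1..p4.
points = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [1.2, 0.9]])

# linkage() builds the nested merge tree; dendrogram() draws it.
Z = linkage(points, method="single")
dendrogram(Z, labels=["p1", "p2", "p3", "p4"])
plt.show()
```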
Well-Separated Clusters:
– A cluster is a set of points such that any point in a cluster is closer (or more similar) to every other point in the cluster than to any point not in the cluster.
[Figure: 3 well-separated clusters]
Center-Based Clusters:
– A cluster is a set of objects such that an object in a cluster is closer (more similar) to the "center" of its cluster than to the center of any other cluster.
– The center of a cluster is often a centroid, the average of all the points in the cluster, or a medoid, the most "representative" point of the cluster.
[Figure: 4 center-based clusters]
Contiguous Clusters (nearest neighbor or transitive):
– A cluster is a set of points such that a point in a cluster is closer (or more similar) to one or more other points in the cluster than to any point not in the cluster.
[Figure: 8 contiguous clusters]
Density-Based Clusters:
– A cluster is a dense region of points, separated from other regions of high density by low-density regions.
– Used when the clusters are irregular or intertwined, and when noise and outliers are present.
[Figure: 6 density-based clusters]
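Density-based clusters of this kind are what algorithms such as DBSCAN aim to find; a small scikit-learn sketch (the eps and min_samples values are illustrative and data-dependent):

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two intertwined, non-globular clusters with a little noise.
X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)
print(np.unique(labels))  # cluster ids; -1 marks noise/outlier points
```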
Conceptual Clusters (shared property):
– Finds clusters that share some common property or represent a particular concept.
[Figure: 2 overlapping circles]
K-means Clustering:
– A partitional clustering approach: each cluster is associated with a centroid, each point is assigned to the cluster with the closest centroid, and the number of clusters, K, must be specified.
– Initial centroids are often chosen randomly; clusters produced vary from one run to another.
– The centroid is (typically) the mean of the points in the cluster, and "closeness" is measured by Euclidean distance, cosine similarity, correlation, etc.
– K-means will converge for the common similarity measures mentioned above; most of the convergence happens in the first few iterations.
– Often the stopping condition is changed to "Until relatively few points change clusters". A from-scratch sketch follows this list.
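A from-scratch sketch of the basic K-means loop just described: random initial centroids, nearest-centroid assignment, mean-of-points centroid update, and stopping when the centroids stop moving. It assumes no cluster goes empty, which a production implementation would have to handle:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Basic K-means; assumes no cluster ever goes empty (fine for a sketch)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random init
    for _ in range(n_iter):
        # Assign each point to its closest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned points.
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):  # centroids stopped moving
            break
        centroids = new_centroids
    return labels, centroids

labels, centroids = kmeans(np.random.default_rng(1).normal(size=(100, 2)), k=3)
```

Because the initial centroids are random, different seeds can produce different clusterings; libraries typically rerun the algorithm several times and keep the result with the lowest error.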
[Figure: two different K-means clusterings of the same data set: the Original Points, an Optimal Clustering, and a Sub-optimal Clustering]
$$\mathrm{SSE} = \sum_{i=1}^{K} \sum_{x \in C_i} \mathrm{dist}^2(m_i, x)$$
– For each point, the error is the distance to the nearest cluster.
– To get SSE, we square these errors and sum them.
– x is a data point in cluster C_i and m_i is the representative point for cluster C_i.
– Given two clusterings, we can choose the one with the smallest error.
– One easy way to reduce SSE is to increase K, the number of clusters; even so, a good clustering with a smaller K can have a lower SSE than a poor clustering with a higher K.
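SSE is easy to compute directly (scikit-learn exposes it as inertia_), and sweeping K illustrates the trade-off above: SSE always decreases as K grows, so one looks for an "elbow" rather than a minimum. A sketch with synthetic data:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, km.inertia_)  # inertia_ is exactly the SSE defined above
```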
[Figure: the importance of choosing initial centroids: K-means cluster assignments and centroids on a sample data set at Iterations 1 through 5]
[Figure: limitations of K-means with differing cluster sizes: Original Points vs. K-means (3 Clusters)]
[Figure: limitations of K-means with differing densities: Original Points vs. K-means (3 Clusters)]
[Figure: limitations of K-means with non-globular shapes: Original Points vs. K-means (2 Clusters)]
[Figure: overcoming K-means limitations: Original Points vs. many small K-means clusters]
One solution is to use many clusters: find parts of clusters that then need to be put back together (a sketch of this idea follows the figures below).
[Figures: Original Points vs. K-means Clusters, repeated for the other problem cases above]
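One way to realize the "many clusters, then put parts together" idea is to over-cluster with K-means and merge the resulting centroids hierarchically. A hedged sketch: the cluster counts, single-linkage choice, and two-moons data are our own illustration, not the slides' prescribed method:

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# Step 1: over-cluster with many small K-means clusters.
km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X)

# Step 2: merge the 10 centroids into 2 groups, then relabel the points.
merge = AgglomerativeClustering(n_clusters=2, linkage="single")
centroid_groups = merge.fit_predict(km.cluster_centers_)
labels = centroid_groups[km.labels_]
print(np.bincount(labels))  # points per merged cluster
```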