

slide-1
SLIDE 1

Unsupervised learning

Clustering and Dimensionality Reduction

Marta Arias marias@cs.upc.edu

Dept. CS, UPC

Fall 2018

slide-2
SLIDE 2

Clustering

Partition input examples into similar subsets

slide-3
SLIDE 3

Clustering

Partition input examples into similar subsets

slide-4
SLIDE 4

Clustering

Main challenges

◮ How to measure similarity?
◮ How many clusters?
◮ How do we evaluate the clusters?

Algorithms we will cover

◮ K-means
◮ Hierarchical clustering

slide-5
SLIDE 5

K-means clustering

Intuition

◮ Input data are:
  ◮ m examples x_1, .., x_m, and
  ◮ K, the number of desired clusters
◮ Clusters represented by cluster centers µ_1, .., µ_K
◮ Given centers µ_1, .., µ_K, each center defines a cluster: the subset of inputs x_i that are closer to it than to other centers

slide-6
SLIDE 6

K-means clustering

Intuition

The aim is to find

◮ cluster centers µ_1, .., µ_K, and
◮ a cluster assignment z = (z_1, .., z_m) where z_i ∈ {1, .., K}
  ◮ z_i is the cluster assigned to example x_i

such that µ_1, .., µ_K and z minimize the cost function

J(µ_1, .., µ_K, z) = Σ_i ‖x_i − µ_{z_i}‖²

slide-7
SLIDE 7

K-means clustering

Cost function

J(µ_1, .., µ_K, z) = Σ_i ‖x_i − µ_{z_i}‖²

Pseudocode

◮ Pick initial centers µ_1, .., µ_K at random
◮ Repeat until convergence:
  ◮ Optimize z in J(µ_1, .., µ_K, z) keeping µ_1, .., µ_K fixed:
    set each z_i to the closest center, z_i = arg min_k ‖x_i − µ_k‖²
  ◮ Optimize µ_1, .., µ_K in J(µ_1, .., µ_K, z) keeping z fixed:
    for each k = 1, .., K, set µ_k = (1 / |{i : z_i = k}|) Σ_{i : z_i = k} x_i
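As a concrete illustration, this alternation fits in a few lines of NumPy. This is a minimal sketch under the slide's notation, not the course's reference code; the function name, initialization, and stopping test are illustrative choices.

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Plain K-means (Lloyd's algorithm): alternate the two steps of the pseudocode."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)]   # pick initial centers at random
    for _ in range(n_iters):
        # Optimize z with centers fixed: assign each x_i to its closest center
        dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)   # (m, K) squared distances
        z = dists.argmin(axis=1)
        # Optimize centers with z fixed: mean of the points assigned to each cluster
        new_mu = np.array([X[z == k].mean(axis=0) if np.any(z == k) else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):   # converged: centers no longer move
            break
        mu = new_mu
    return mu, z
```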

slide-8
SLIDE 8

K-Means illustrated

slide-9
SLIDE 9

Limitations of k-Means

K-Means works well if..

◮ Clusters are spherical
◮ Clusters are well separated
◮ Clusters are of similar volumes
◮ Clusters have similar numbers of points

.. so improve it with a more general model

◮ Mixture of Gaussians
  ◮ Learn it using Expectation Maximization (EM)
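As an illustrative sketch (not from the slides), scikit-learn's GaussianMixture fits such a model with EM; the dataset and parameter choices below are arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture

X = load_iris().data
# Mixture of 3 Gaussians fit by EM; full covariances allow elliptical clusters of
# different volumes, relaxing K-means' spherical, equal-size assumptions.
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0).fit(X)
hard_labels = gmm.predict(X)        # hard cluster assignments
soft_labels = gmm.predict_proba(X)  # per-cluster membership probabilities
```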

slide-10
SLIDE 10

Hierarchical clustering

Output is a dendrogram

slide-11
SLIDE 11

Agglomerative hierarchical clustering

Bottom-up

Pseudocode

1. Start with one cluster per example
2. Repeat until all examples are in one cluster:
   ◮ merge the two closest clusters
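A deliberately naive Python sketch of this loop (my own illustration, not the course's code): it searches all cluster pairs at every step and supports single or complete linkage only; real implementations such as scipy.cluster.hierarchy are far more efficient.

```python
import numpy as np

def agglomerative(X, linkage="single"):
    """Bottom-up clustering: start from singletons and repeatedly merge the two
    closest clusters, recording every merge (the dendrogram structure)."""
    clusters = [[i] for i in range(len(X))]          # one cluster per example
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):               # search every pair of clusters
            for b in range(a + 1, len(clusters)):
                D = np.linalg.norm(X[clusters[a]][:, None, :]
                                   - X[clusters[b]][None, :, :], axis=2)
                d = D.min() if linkage == "single" else D.max()   # single vs complete
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((list(clusters[a]), list(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]      # merge the two closest clusters
        del clusters[b]
    return merges
```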

(Next example from D. Blei’s course at Princeton)

slide-12
SLIDE 12

Example

[Figure: scatter plot of the input data, axes V1 vs V2; from D. Blei's course at Princeton]

slides 13–36
SLIDES 13–36

Example

[Figures: agglomerative clustering of the same data (V1 vs V2), iterations 001–024, merging the two closest clusters at each step; from D. Blei's course at Princeton]

slide-37
SLIDE 37

Agglomerative hierarchical clustering

Bottom-up

Pseudocode

1. Start with one cluster per example
2. Repeat until all examples are in one cluster:
   ◮ merge the two closest clusters

Defining distance between clusters (i.e. sets of points)

◮ Single Linkage: d(X, Y) = min_{x∈X, y∈Y} d(x, y)
◮ Complete Linkage: d(X, Y) = max_{x∈X, y∈Y} d(x, y)
◮ Group Average: d(X, Y) = (1 / (|X| · |Y|)) Σ_{x∈X, y∈Y} d(x, y)
◮ Centroid Distance: d(X, Y) = d( (1/|X|) Σ_{x∈X} x , (1/|Y|) Σ_{y∈Y} y )
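A small NumPy sketch of the four definitions (an illustration with my own function name; X and Y are arrays of points, one row per point):

```python
import numpy as np

def cluster_distances(X, Y):
    """Distance between clusters X and Y under the four linkage criteria."""
    D = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)   # all pairwise d(x, y)
    return {
        "single":   D.min(),    # closest pair
        "complete": D.max(),    # farthest pair
        "average":  D.mean(),   # mean over the |X| * |Y| pairs
        "centroid": np.linalg.norm(X.mean(axis=0) - Y.mean(axis=0)),  # distance of the means
    }
```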

slide-38
SLIDE 38

Many, many, many other algorithms available ..

slide-39
SLIDE 39

Clustering with scikit-learn I

K-means: an example with the Iris dataset
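The code shown on this slide is not reproduced in the transcript; a minimal scikit-learn sketch along the same lines (parameter choices are illustrative):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data
# K = 3, since Iris contains three species
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)   # the centers µ_1, .., µ_K
print(km.labels_[:10])       # cluster assignments z_i for the first examples
print(km.inertia_)           # value of the cost J at the solution
```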

slide-40
SLIDE 40

Clustering with scikit-learn II

K-means: an example with the Iris dataset
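Again, the slide's own code is not reproduced here; one plausible continuation is to evaluate the clusters, e.g. with an internal score and, since Iris has known species labels, an external one:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score, silhouette_score

data = load_iris()
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(data.data)
print(silhouette_score(data.data, labels))        # internal: no ground truth needed
print(adjusted_rand_score(data.target, labels))   # external: agreement with the species
```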

slide-41
SLIDE 41

Clustering with scikit-learn I

Hierarchical clustering: an example with the Iris dataset
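The slide's code is likewise not reproduced; a minimal sketch using SciPy's hierarchical-clustering routines on the same dataset:

```python
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from sklearn.datasets import load_iris

X = load_iris().data
Z = linkage(X, method="average")                  # group-average agglomerative clustering
dendrogram(Z)                                     # the output is a dendrogram
plt.show()
labels = fcluster(Z, t=3, criterion="maxclust")   # cut the tree into 3 flat clusters
```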

slide-42
SLIDE 42

Dimensionality reduction I

The curse of dimensionality

◮ When dimensionality increases, data becomes increasingly sparse in the space that it occupies
◮ Definitions of density and distance between points (critical for many tasks!) become less meaningful
◮ Visualization and qualitative analysis become impossible

slide-43
SLIDE 43

Dimensionality reduction II

The curse of dimensionality

And so dimensionality reduction methods..

◮ avoid or at least mitigate the curse of dimensionality
◮ reduce time and memory required
◮ allow data to be more easily visualized
◮ may help eliminate irrelevant features
◮ may reduce noise

slide-44
SLIDE 44

Principal Components Analysis

Find linear projections of original coordinates that maximize variance
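A minimal NumPy sketch of that idea (center the data, then project onto the top eigenvectors of the covariance matrix); scikit-learn's PCA computes the same thing more robustly via the SVD:

```python
import numpy as np

def pca(X, n_components=2):
    """Project X onto the directions of maximal variance (principal components)."""
    Xc = X - X.mean(axis=0)                    # center the data
    cov = np.cov(Xc, rowvar=False)             # covariance matrix of the features
    eigvals, eigvecs = np.linalg.eigh(cov)     # symmetric matrix: eigh, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # largest variance first
    W = eigvecs[:, order[:n_components]]       # top principal directions
    return Xc @ W                              # low-dimensional coordinates
```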

slide-45
SLIDE 45

t-SNE: t-distributed stochastic neighbor embedding

A non-linear distance-preserving method

(From https://lvdmaaten.github.io/tsne/)

slide-46
SLIDE 46

Dimensionality reduction with scikit-learn
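The code shown on this slide is not reproduced in the transcript; a minimal scikit-learn sketch covering both methods on Iris (parameter choices are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X = load_iris().data
X_pca = PCA(n_components=2).fit_transform(X)   # linear: top-variance directions
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)   # non-linear embedding
print(X_pca.shape, X_tsne.shape)   # both (150, 2)
```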

slide-47
SLIDE 47

So, we are done for this course

Lots of important things we have left out!

◮ Online and incremental learning; data mining for streams
◮ Important models: Support Vector Machines, Neural Nets (and Deep Learning)
◮ Kernel methods and learning from structured objects
◮ Ensemble methods: random forests, boosting, bagging, etc.
◮ Spatial and temporal learning
◮ Feature selection methods
◮ many, many more...

slide-48
SLIDE 48

To wrap up

Reading assignment

Article by Pedro Domingos: "A few useful things to know about machine learning"

Exam: December 17, 2018

It will be a multiple-choice test consisting of quick conceptual questions. No calculator is needed, and no notes are allowed. Any formula that is needed will be provided in the exam statement.