Data-driven Clustering via Parameterized Lloyd's Families

Travis Dick
Joint work with Maria-Florina Balcan and Colin White
Carnegie Mellon University
NeurIPS 2018

Data-driven Clustering

  • Clustering aims to divide a dataset into self-similar clusters.
  • Goal: find some unknown natural clustering.
  • However, most clustering algorithms minimize a clustering cost function.
  • Hope that low-cost clusterings recover the natural clusters.
  • There are many algorithms and many objectives.

How do we choose the best algorithm for a specific application? Can we automate this process?

Learning Model

  • An unknown distribution 𝒟 over clustering instances.
  • Given a sample V₁, …, Vₘ ∼ 𝒟 annotated by their target clusterings.
  • Find an algorithm A that produces clusterings similar to the target clusterings.
  • Want A to also work well for new instances from 𝒟! (One formalization is sketched after this list.)
  • In this work:
  1. Introduce a large parametric family of clustering algorithms, (α, β)-Lloyds.
  2. Give efficient procedures for finding the best parameters on a sample.
  3. Generalization: optimal parameters on the sample are nearly optimal on 𝒟.

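One way to state this goal as an objective. This is a hedged formalization consistent with the bullets above, not notation from the paper: A_{α,β} denotes the algorithm with parameters (α, β), Y(V) the target clustering of instance V, and dist a distance between clusterings.

```latex
% Pick the parameters whose algorithm minimizes the expected disagreement
% with the target clustering over the instance distribution \mathcal{D}.
% (A_{\alpha,\beta}, Y, and dist are assumed notation, not the paper's.)
\[
  \min_{\alpha,\beta} \;
  \mathbb{E}_{V \sim \mathcal{D}}
  \Big[ \operatorname{dist}\big( A_{\alpha,\beta}(V),\, Y(V) \big) \Big]
\]
```
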
Lloyd’s Method

  • Maintains k centers c₁, …, cₖ that define the clusters.
  • Performs local search to improve the k-means cost of the centers (a code sketch follows below):
  1. Assign each point to its nearest center.
  2. Update each center to be the mean of its assigned points.
  3. Repeat until convergence.

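A minimal NumPy sketch of the three steps above. The function name lloyds_method and its signature are my own; it assumes X is an (n, d) array of points and centers an initial (k, d) array, and it is not the implementation from the paper.

```python
import numpy as np

def lloyds_method(X, centers, max_iters=100):
    """Sketch of Lloyd's method: local search on the k-means cost."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    for _ in range(max_iters):
        # 1. Assign each point to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assignment = dists.argmin(axis=1)
        # 2. Update each center to be the mean of its assigned points.
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = X[assignment == j]
            if len(members) > 0:  # keep an empty cluster's center in place
                new_centers[j] = members.mean(axis=0)
        # 3. Repeat until convergence (centers stop moving).
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assignment
```
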
Initial Centers are Important!

  • Lloyd’s method can get stuck if the initial centers are chosen poorly.
  • Initialization is a well-studied problem with many proposed procedures (e.g., k-means++).
  • The best method depends on properties of the clustering instances.

The (", $)-Lloyds Family

Initialization: Parameter "

  • Use &'-sampling (generalizing &(-sampling of )-means++)

" = 0: random initialization " = 2: )-means++ " = ∞: farthest first Local search: Second parameter $ tweaks the local search. Details in paper. Question: For a distribution . over tasks, what parameters give best performance?

  • Choose initial centers from dataset / randomly.
  • Probability that point 0 ∈ / is center 23 is proportional to & 0, 24, … , 2364

'.
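The slides give no code for the initialization, so the following is a minimal NumPy sketch under my own name d_alpha_sampling, for finite α only (α = ∞, farthest-first, would replace the random draw with an argmax). X is assumed to be an (n, d) array of points.

```python
import numpy as np

def d_alpha_sampling(X, k, alpha, rng=None):
    """Sketch of d^alpha-sampling: alpha = 0 gives uniform random
    initialization and alpha = 2 recovers k-means++ seeding."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng() if rng is None else rng
    n = len(X)
    centers = [X[rng.integers(n)]]  # first center uniformly at random
    for _ in range(k - 1):
        # Distance from each point to its nearest already-chosen center.
        diffs = X[:, None, :] - np.asarray(centers)[None, :, :]
        dists = np.linalg.norm(diffs, axis=2).min(axis=1)
        # Draw the next center with probability proportional to d(...)^alpha.
        weights = dists ** alpha
        total = weights.sum()
        probs = weights / total if total > 0 else np.full(n, 1.0 / n)
        centers.append(X[rng.choice(n, p=probs)])
    return np.asarray(centers)
```
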

Results

Efficient Tuning on Sample:

  • Efficient algorithm for finding the parameters on a sample with the best agreement to the targets (an illustrative sketch follows at the end of this section).
  • “Algorithmically feasible to tune parameters on a sample.”

Generalization Guarantee:

  • Analyze the intrinsic complexity of the (α, β)-Lloyds family.
  • Show that only roughly O((k log n) / ε²) clustering instances are needed to ensure that the empirical cost for all parameters is within ε of the expected cost.
  • “Parameters tuned on the sample will work well for new instances!”

Experiments: Evaluate the (α, β)-Lloyds family on real and synthetic data: CIFAR-10, MNIST, Mixture of Gaussians, and CNAE-9.

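The paper's efficient tuning procedure is not reproduced here. Purely as an illustration of tuning on a sample, here is a brute-force grid search over α that reuses the d_alpha_sampling and lloyds_method sketches above; the pair-counting agreement score is my stand-in for the paper's cost measure, not its actual definition.

```python
import numpy as np

def agreement(assignment, target):
    """Rand-index-style score: fraction of point pairs that the two
    clusterings treat the same way (together vs. apart)."""
    a, t = np.asarray(assignment), np.asarray(target)
    same_a = a[:, None] == a[None, :]
    same_t = t[:, None] == t[None, :]
    return float((same_a == same_t).mean())

def tune_alpha(instances, k, alphas, rng=None):
    """Brute-force sketch: return the alpha from `alphas` whose seeded
    Lloyd's method best agrees with the targets on the sample.

    instances: list of (X, target) pairs, i.e. V_1, ..., V_m annotated
    with their target clusterings."""
    rng = np.random.default_rng() if rng is None else rng
    best_alpha, best_score = None, -np.inf
    for alpha in alphas:
        score = 0.0
        for X, target in instances:
            centers = d_alpha_sampling(X, k, alpha, rng=rng)
            _, assignment = lloyds_method(X, centers)
            score += agreement(assignment, target)
        if score > best_score:
            best_alpha, best_score = alpha, score
    return best_alpha

# Example (hypothetical data): pick alpha from a small grid.
# best = tune_alpha([(X1, y1), (X2, y2)], k=10, alphas=[0, 1, 2, 4])
```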