Data-driven Clustering via Parameterized Lloyds Families
Travis Dick
Joint work with Maria-Florina Balcan and Colin White
Carnegie Mellon University
NeurIPS 2018
How do we choose the best algorithm for a specific application? Can we automate this process?
Initialization: Parameter α
α = 0: random initialization
α = 2: k-means++
α = ∞: farthest first
Local search: Second parameter β tweaks the local search. Details in paper.
Question: For a distribution 𝒟 over tasks, what parameters give best performance?
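The three special cases of α listed above are consistent with a seeding rule in which each new center is drawn with probability proportional to its distance to the nearest already-chosen center, raised to the power α. The following is a minimal sketch under that assumption, not the authors' implementation. The names `d_alpha_seeding` and `sq_dist` are hypothetical, and the sketch assumes the input contains at least k distinct points.

```python
import math
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def d_alpha_seeding(points, k, alpha, rng):
    """Pick k initial centers (hypothetical helper, assumed seeding rule).

    Each new center is drawn with probability proportional to
    (distance to nearest chosen center) ** alpha.
    """
    centers = [rng.choice(points)]  # first center: uniform at random
    while len(centers) < k:
        # Distance from every point to its nearest already-chosen center.
        dists = [math.sqrt(min(sq_dist(p, c) for c in centers)) for p in points]
        if math.isinf(alpha):
            # alpha = infinity: farthest-first traversal (deterministic argmax).
            idx = max(range(len(points)), key=lambda i: dists[i])
        else:
            # alpha = 0 gives uniform weights (0.0 ** 0 == 1.0 in Python, so
            # repeats are possible); alpha = 2 gives k-means++ weights.
            weights = [d ** alpha for d in dists]
            idx = rng.choices(range(len(points)), weights=weights, k=1)[0]
        centers.append(points[idx])
    return centers


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    for a in (0, 2, math.inf):
        print(a, d_alpha_seeding(data, 2, a, random.Random(0)))
```

Intermediate values of α interpolate between these behaviors: larger α puts more weight on points far from the current centers.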
Efficient Tuning on Sample:
Generalization Guarantee: Enough sample clustering instances ensure that the empirical cost for all parameters is within ε of the expected cost.
Experiments: Evaluate (α, β)-Lloyds family on real and synthetic data.
Datasets: CIFAR-10, MNIST, Mixture of Gaussians, CNAE-9
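The generalization guarantee stated above can be written out as a uniform convergence statement. The sketch below uses notation assumed here for illustration and not defined in the slides: V_1, ..., V_m are the sampled clustering instances drawn from the task distribution, cost_{α,β}(V) is the clustering cost obtained on instance V with parameters (α, β), and ε is the accuracy level.

```latex
\left| \frac{1}{m} \sum_{i=1}^{m} \mathrm{cost}_{\alpha,\beta}(V_i)
  - \mathbb{E}_{V \sim \mathcal{D}}\!\left[ \mathrm{cost}_{\alpha,\beta}(V) \right] \right|
  \le \epsilon
\quad \text{for all parameters } (\alpha, \beta) \text{ simultaneously.}
```

Because the bound holds for every parameter setting at once, the parameters that minimize the empirical cost on the sample have expected cost within 2ε of the best achievable expected cost.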