www.data61.csiro.au
k-variates++: more pluses in the k-means++
Richard Nock, Raphaël Canyasse, Roksana Boreli, Frank Nielsen
DATA61 | ANU | TECHNION | ECOLE POLYTECHNIQUE | UNSW | SONY CS LABS, INC.
(formerly NICTA)
Poster #29, Mon. 3-7pm
ICML 2016
❖ A generalization of the popular k-means++ seeding
❖ Two theorems on k-variates++:
  ❖ guarantees on the approximation of the global optimum
  ❖ a likelihood ratio bound between neighbouring instances
❖ Applications: "reductions" between clustering algorithms + approximation bounds for new clustering algorithms, privacy
❖ k-means++ seeding = a gold standard in clustering:
  ❖ utterly simple to implement (iteratively pick centers with probability proportional to the squared distance to the previous centers)
  ❖ assumption-free (expected) approximation guarantee wrt the k-means global optimum: E[φ] ≤ 8(ln k + 2) · φ_opt (Arthur & Vassilvitskii, SODA 2007)
  ❖ inspired many variants (tensor clustering; distributed, data-stream, on-line and parallel clustering; clustering without centroids in closed form; etc.)
[Figure: k-means++ at the hub of its variants: distributed, streamed, no closed-form centroid, tensors, more potentials]
❖ Existing approaches are spawns of k-means++:
  ❖ modify the algorithm
  ❖ use it as a building block
❖ Our objective: put all of them in the same "bag": a generalisation of k-means++ from which such approaches would just be "instantiations" (reductions)
❖ Because a general algorithm brings new applications
ICML 2016
distributed
streamed no closed form centroid more potentials
k-variates
more applications
k-means++
k-variates++: more pluses in the k-means++ | Richard Nock, Raphael Canyasse, Roksana Boreli & Frank Nielsen
7
ICML 2016
Input: data A ⊂ R^d with |A| = m, k ∈ N*;
Step 1: Initialise centers C ← ∅;
Step 2: for t = 1, 2, ..., k:
  2.1: randomly sample a ∼_{q_t} A, with q_1 ≐ u_m and, for t > 1,
       q_t(a) ≐ D_t(a) · (Σ_{a′∈A} D_t(a′))⁻¹, where D_t(a) ≐ min_{x∈C} ‖a − x‖²₂;
  2.2: x ← a;
  2.3: C ← C ∪ {x};
Output: C;
Arthur & Vassilvitskii, SODA’07
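The seeding above can be sketched in a few NumPy lines (the function name and interface are mine, not from the paper):

```python
import numpy as np

def kmeanspp_seed(A, k, seed=None):
    """k-means++ seeding: the first center is uniform over A; each later
    center is a data point picked with probability proportional to its
    squared distance D_t(a) to the closest previously chosen center."""
    rng = np.random.default_rng(seed)
    m = A.shape[0]
    C = [A[rng.integers(m)]]                       # q_1 = u_m (uniform)
    for _ in range(1, k):
        diff = A[:, None, :] - np.asarray(C)[None, :, :]
        D = (diff ** 2).sum(-1).min(axis=1)        # D_t(a) = min_{x in C} ||a - x||^2
        C.append(A[rng.choice(m, p=D / D.sum())])  # q_t(a) proportional to D_t(a)
    return np.asarray(C)
```

Note that seeding only ever returns actual data points; running Lloyd iterations afterwards is optional for the approximation guarantee.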
Input: data A ⊂ R^d with |A| = m, k ∈ N*, random variables {X_a, a ∈ A}, probe functions ℘_t : A → R^d (t ≥ 1);
Step 1: Initialise centers C ← ∅;
Step 2: for t = 1, 2, ..., k:
  2.1: randomly sample a ∼_{q_t} A, with q_1 ≐ u_m and, for t > 1,
       q_t(a) ≐ D_t(a) · (Σ_{a′∈A} D_t(a′))⁻¹, where D_t(a) ≐ min_{x∈C} ‖℘_t(a) − x‖²₂;
  2.2: randomly sample x ∼ X_a;
  2.3: C ← C ∪ {x};
Output: C;
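A minimal sketch of these steps, under an interface I am assuming (`probe(t, A)` applies ℘_t to the whole dataset, `draw(a, rng)` samples from X_a): distances are computed on probed points, but the stored center is drawn from X_a rather than being the point itself.

```python
import numpy as np

def kvariatespp(A, k, probe, draw, seed=None):
    """Sketch of k-variates++: D^2-style sampling on probed points,
    centers drawn from the per-point random variables X_a."""
    rng = np.random.default_rng(seed)
    m = A.shape[0]
    a = A[rng.integers(m)]                    # q_1 = u_m (uniform over A)
    C = [draw(a, rng)]                        # center ~ X_a, not a itself
    for t in range(2, k + 1):
        P = probe(t, A)                       # probed points, one per a in A
        D = ((P[:, None, :] - np.asarray(C)[None, :, :]) ** 2).sum(-1).min(axis=1)
        a = A[rng.choice(m, p=D / D.sum())]   # q_t(a) proportional to D_t(a)
        C.append(draw(a, rng))
    return np.asarray(C)

# With identity probes and Dirac densities this collapses to k-means++ seeding.
identity = lambda t, A: A
dirac = lambda a, rng: a
```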
❖ k-means potential for C: φ(A; C) ≐ Σ_{a∈A} ‖a − c(a)‖²₂, with c(a) ≐ arg min_{c∈C} ‖a − c‖²₂
❖ Suppose ℘ is η-stretching: for any optimal cluster A with size > 1, any a₀ and any t,
  φ(A; C) − φ(A; {a₀}) ≤ (1 + η) · (φ(℘_t(A); C) − φ(℘_t(A); {℘_t(a₀)}))
❖ Then E_{C∼k-variates++}[φ(A; C)] ≤ (2 + log k) · Φ, where Φ (≥ 0, approximating the global optimum) combines:
  φ_opt ≐ Σ_{a∈A} ‖a − c_opt(a)‖²₂,
  φ_var ≐ Σ_{a∈A} tr(cov[X_a]),
  φ_bias ≐ Σ_{a∈A} ‖E[X_a] − c_opt(a)‖²₂
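For concreteness, the potential and the three terms entering the bound can be computed directly. The isotropic-Gaussian choice X_a ~ N(mu[a], sigma² I) below is purely illustrative, not prescribed by the theorem:

```python
import numpy as np

def potential(A, C):
    """k-means potential: phi(A; C) = sum_a min_{c in C} ||a - c||^2."""
    return ((A[:, None, :] - C[None, :, :]) ** 2).sum(-1).min(axis=1).sum()

def bound_terms(A, mu, sigma, Copt):
    """phi_opt, phi_var, phi_bias against optimal centers Copt, assuming
    (illustratively) X_a ~ N(mu[a], sigma^2 I) for every point a."""
    assign = ((A[:, None, :] - Copt[None, :, :]) ** 2).sum(-1).argmin(axis=1)
    copt = Copt[assign]                             # c_opt(a) for every a
    phi_opt = ((A - copt) ** 2).sum()               # sum ||a - c_opt(a)||^2
    phi_var = A.shape[0] * A.shape[1] * sigma ** 2  # sum tr(cov[X_a]) = m*d*sigma^2
    phi_bias = ((mu - copt) ** 2).sum()             # sum ||E[X_a] - c_opt(a)||^2
    return phi_opt, phi_var, phi_bias
```

With Dirac densities (mu = A, sigma = 0), variance and bias coincide with the k-means++ situation: phi_var = 0 and phi_bias = phi_opt.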
❖ The guarantee approaches the statistical lower bound
❖ It can be better than the Arthur-Vassilvitskii bound in some regimes
❖ Reductions from k-variates++ yield approximability ratios:
  ❖ pick a clustering algorithm
  ❖ show that its expected output equals that of an instantiation of k-variates++
  ❖ get an approximability ratio for that algorithm!
k-variates++: more pluses in the k-means++ | Richard Nock, Raphael Canyasse, Roksana Boreli & Frank Nielsen
15
ICML 2016
k-variates++: more pluses in the k-means++ | Richard Nock, Raphael Canyasse, Roksana Boreli & Frank Nielsen
Setting      Algorithm       Probe functions                                      Densities
Batch        k-means++       Identity                                             Diracs
Distributed  d-k-means++     Identity                                             Uniform, support = subsets
Distributed  p+d-k-means++   Identity                                             Non-uniform, compact support
Streaming    s-k-means++     Synopses                                             Diracs
On-line                      Point (batch not hit) / closest center (batch hit)   Diracs
❖ Setting: {data nodes = "Forgy nodes"} & a special "sampling node" (or Forgy node), e.g. hybrid, server-assisted P2P networks
[Figure: Forgy nodes (F_1, A_1), ..., (F_5, A_5) holding the data (∪_i A_i = A), uniform sampling on Forgy nodes; a sampling node holding no data, non-uniform sampling; only the k sampled data points are communicated, no other data is communicated]
❖ Algorithm: iterate for t = 1, 2, ..., k:
  ❖ the sampling node chooses (non-uniformly) a Forgy node, say i
  ❖ node i samples (uniformly) a point a ∈ A_i and sends it to the sampling node
  ❖ the sampling node computes the update & sends it back, and the center set C is updated
❖ Theorem: E_{C∼d-k-means++}[φ(A; C)] ≤ (2 + log k) · Φ, with
  Φ ≐ 10·φ_opt + 6·φ^F_s,  φ^F_s ≐ Σ_{i∈[n]} Σ_{a∈A_i} ‖c(A_i) − a‖²₂
❖ Remarks: φ_opt is the global optimum on the total data; the bound gets better as the per-node spreads φ^F_s shrink
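A one-machine simulation of the round structure above (function and variable names are assumed, not from the paper): the sampling node only ever sees per-node D² masses and the k sampled points.

```python
import numpy as np

def d_kmeanspp(parts, k, seed=None):
    """Simulated d-k-means++ rounds: the sampling node picks a Forgy node
    non-uniformly by its total squared-distance mass to the current centers,
    and the chosen node then samples one of its own points uniformly."""
    rng = np.random.default_rng(seed)
    sizes = np.array([len(P) for P in parts])
    i = rng.choice(len(parts), p=sizes / sizes.sum())  # round 1: uniform over all of A
    C = [parts[i][rng.integers(sizes[i])]]
    for _ in range(1, k):
        # per-node D^2 masses: the only statistics the sampling node needs
        mass = np.array([((P[:, None, :] - np.asarray(C)[None, :, :]) ** 2)
                         .sum(-1).min(axis=1).sum() for P in parts])
        i = rng.choice(len(parts), p=mass / mass.sum())  # non-uniform node choice
        C.append(parts[i][rng.integers(sizes[i])])       # uniform point within node i
    return np.asarray(C)
```

Composing a non-uniform node choice with a uniform within-node draw is what replaces the exact D²-weighting of k-means++, and it is this mismatch that the extra φ^F_s term in Φ accounts for.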
Likelihood ratio bound for neighbour samples
❖ Assumption: all X_a satisfy a condition (stated in the paper)
❖ Fix ℘_t = Id(.). For any neighbour A′ of A (they differ on one element),
  P_{C∼k-variates++}[C|A′] / P_{C∼k-variates++}[C|A] ≤ (1 + δ_w)^{k−1} + f(k) · δ_w · (1 + δ_s)^{k−1} · ϱ(R)
  (formal definitions in poster / paper)
❖ δ_w, δ_s are spread and monotonicity parameters
❖ They can be estimated / computed from data
❖ In general, they decrease with the sample size m
❖ Conditions for small δ_w & δ_s?
❖ If the densities of all X_a belong to a suitable class, then with high probability δ_w, δ_s are small, as long as a sample-size condition holds
  [Formula residue: bound on δ_w · δ_s involving 1/4 + 1/(d+1) terms and 4εM · √m; exact expression in the paper]
❖ No such terms in the bound (the proof exhibits small values whp; experiments display such values). Application in differential privacy (sublinear noise!)
❖ k-variates++ vs k-means++ & k-means
[Figure: Σ_i |A_i| ≈ 20000, E[|A_i|] = 500; φ^F_s(p) and φ^F_s(0) plotted for k = 5..25]
❖ k-variates++ vs k-means++ & k-means
[Figure: heatmaps of ρ_φ(k-means++) and ρ_φ(k-means_k) over k = 4..10 and p = 10..50, where ρ_φ(H) ≐ (φ(d-k-means++) − φ(H)) / φ(H) · 100]
❖ k-variates++ beats k-means++
❖ We provide a generalisation of k-means++ with formal approximation and likelihood-ratio guarantees
❖ k-variates++ can be used as is (e.g. privacy, k-means++) or via reductions to derive new algorithms
❖ Come see the poster for more examples
❖ Future: use the Theorems to address stability and generalisation