k -variates++: Poster #29, Mon. 3-7pm more pluses in the k -means++ - PowerPoint PPT Presentation

In this talk k -variates ❖ A generalization of the popular k -means++ seeding � ❖ Two theorems on k -variates++ � ❖ guarantees on approximation of the global optimum � ❖ likelihood ratio bound between neighbouring instances � ❖ Applications: “ reductions” between clustering algorithms + approximation bounds of new clustering algorithms, privacy 2 k -variates++: more pluses in the k -means++ | Richard Nock , Raphael Canyasse, Roksana Boreli & Frank Nielsen ICML 2016

In this talk k -variates ❖ A generalization of the popular k -means++ seeding � ❖ Two theorems on k -variates++ � ! e r o m d n A ❖ guarantees on approximation of the global optimum � ) r e t s o p e e s ( ❖ likelihood ratio bound between neighbouring instances � ❖ Applications: “ reductions” between clustering algorithms + approximation bounds of new clustering algorithms, privacy 3 k -variates++: more pluses in the k -means++ | Richard Nock , Raphael Canyasse, Roksana Boreli & Frank Nielsen ICML 2016

In this talk k -variates ❖ A generalization of the popular k -means++ seeding � ❖ Two theorems on k -variates++ � ! e r o m d n A ❖ guarantees on approximation of the global optimum � � ) r e t s o p e e s ( ❖ likelihood ratio bound between neighbouring instances � ) ! r e p a p e e s ( ❖ Applications: “ reductions” between clustering algorithms + approximation bounds of new clustering algorithms, privacy 4 k -variates++: more pluses in the k -means++ | Richard Nock , Raphael Canyasse, Roksana Boreli & Frank Nielsen ICML 2016

  Motivation k -means++ seeding = a gold standard in ❖ clustering: � utterly simple to implement (iteratively ❖ pick centers squ. distance to previous ∼ centers) � assumption-free (expected) approximation ❖ guarantee wrt the k -means global optimum :   k -means++ E C [potential] ≤ (2 + log k ) · 8 φ opt distributed on-line (Arthur & Vassilvitskii, SODA 2007) � streamed ❖ Inspired many variants (tensor clustering, distributed, data stream, on-line, parallel no closed form centroid clustering, clustering without centroids in tensors closed form, etc.) more potentials 5 k -variates++: more pluses in the k -means++ | Richard Nock , Raphael Canyasse, Roksana Boreli & Frank Nielsen ICML 2016

Motivation Approaches are spawns of k -means++: � ❖ modify the algorithm (e.g. ) � ❖ ∼ k -variates use it as building block � ❖ Our objective: � ❖ all in the same “bag”: a generalisation of ❖ k -means++ k -means++ from which such approaches distributed would be just “instanciations”   more applications reductions � ⇒ on-line Because general new applications ❖ ⇒ streamed no closed form centroid more potentials 6 k -variates++: more pluses in the k -means++ | Richard Nock , Raphael Canyasse, Roksana Boreli & Frank Nielsen ICML 2016

k -means++ Arthur & Vassilvitskii, SODA’07 Input : data A ⇢ R d with | A | = m , k 2 N ⇤ ; Step 1: Initialise centers C ; ; Step 2: for t = 1 , 2 , ..., k . 2.1: randomly sample a ⇠ q t A , with q 1 = u m and, for t > 1, ! � 1 X . . x 2 C k a � x k 2 D t ( a 0 ) q t ( a ) = D t ( a ) , where D t ( a ) = min 2 ; a 0 2 A 2.2: x a ; 2.3: C C [ { x } ; Output : C ; 7 k -variates++: more pluses in the k -means++ | Richard Nock , Raphael Canyasse, Roksana Boreli & Frank Nielsen ICML 2016

k -variates Input : data A ⇢ R d with | A | = m , k 2 N ⇤ , random variables { X a , a 2 A } , probe functions ℘ t : A ! R d ( t � 1); Step 1: Initialise centers C ; ; Step 2: for t = 1 , 2 , ..., k . 2.1: randomly sample a ⇠ q t A , with q 1 = u m and, for t > 1, ! � 1 X . . x 2 C k ℘ t ( a ) � x k 2 D t ( a 0 ) q t ( a ) = D t ( a ) , where D t ( a ) = min 2 ; a 0 2 A 2.2: randomly sample x ⇠ X a ; 2.3: C C [ { x } ; Output : C ; 8 k -variates++: more pluses in the k -means++ | Richard Nock , Raphael Canyasse, Roksana Boreli & Frank Nielsen ICML 2016

Two theorems & applications 9 k -variates++: more pluses in the k -means++ | Richard Nock , Raphael Canyasse, Roksana Boreli & Frank Nielsen ICML 2016

Theorem 1 l a b o l g f o n o i t a m i x o r p p a m u m t i p o ❖ k -means potential for : , with � c ∈ C k a � c k 2 a ∈ A k a � c ( a ) k 2 C φ ( A ; C ) = P c ( a ) = arg min . . 2 2 ( ≥ 0) ❖ Suppose is -stretching: for any optimal cluster with size > 1 A ℘ t η and any ,   a 0 ∈ A φ ( A ; C ) φ ( ℘ t ( A ); C ) φ ( A ; { a 0 } ) ≤ (1 + η ) · φ ( ℘ t ( A ); { ℘ t ( a 0 ) } ) , ∀ t ❖ Then , with E C ∼ k − variates++ [ φ ( A ; C )] ≤ (2 + log k ) · Φ = (6 + 4 η ) φ opt + 2 φ bias + 2 φ var Φ . X k a � c opt ( a ) k 2 = φ opt . 2 a ∈ A X k E [ X a ] � c opt ( a ) k 2 = φ bias . 2 a ∈ A X . = tr (cov[ X a ]) φ var a ∈ A 10 k -variates++: more pluses in the k -means++ | Richard Nock , Raphael Canyasse, Roksana Boreli & Frank Nielsen ICML 2016

Theorem 1 l a b o l g f o n o i t a m i x o r p p a m u m t i p o ❖ k -means potential for : , with � c ∈ C k a � c k 2 a ∈ A k a � c ( a ) k 2 C φ ( A ; C ) = P c ( a ) = arg min . . 2 2 ( ≥ 0) ❖ Suppose is -stretching: for any optimal cluster with size > 1 A ℘ t η and any ,   a 0 ∈ A φ ( A ; C ) φ ( ℘ t ( A ); C ) φ ( A ; { a 0 } ) ≤ (1 + η ) · φ ( ℘ t ( A ); { ℘ t ( a 0 ) } ) , ∀ t ❖ Then , with E C ∼ k − variates++ [ φ ( A ; C )] ≤ (2 + log k ) · Φ = (6 + 4 η ) φ opt + 2 φ bias + 2 φ var Φ . k- means++: � X k a � c opt ( a ) k 2 = φ opt . 2 • probe = Id � a ∈ A • = Diracs X k E [ X a ] � c opt ( a ) k 2 = φ bias . X . 2 a ∈ A X . = tr (cov[ X a ]) φ var a ∈ A 11 k -variates++: more pluses in the k -means++ | Richard Nock , Raphael Canyasse, Roksana Boreli & Frank Nielsen ICML 2016

Theorem 1 l a b o l g f o n o i t a m i x o r p p a m u m t i p o ❖ k -means potential for : , with � c ∈ C k a � c k 2 a ∈ A k a � c ( a ) k 2 C φ ( A ; C ) = P c ( a ) = arg min . . 2 2 ( ≥ 0) ❖ Suppose is -stretching: for any optimal cluster with size > 1 A ℘ t η and any ,   a 0 ∈ A φ ( A ; C ) φ ( ℘ t ( A ); C ) φ ( A ; { a 0 } ) ≤ (1 + η ) · φ ( ℘ t ( A ); { ℘ t ( a 0 ) } ) , ∀ t ❖ Then , with E C ∼ k − variates++ [ φ ( A ; C )] ≤ (2 + log k ) · Φ = (6 + 4 η ) φ opt + 2 φ bias + 2 φ var Φ . k- means++: X k a � c opt ( a ) k 2 = φ opt . 2 φ bias = φ opt a ∈ A φ var = 0 X k E [ X a ] � c opt ( a ) k 2 = φ bias . 2 0 = a ∈ A η X . φ opt = tr (cov[ X a ]) φ var 8 = Φ ⇒ a ∈ A 12 k -variates++: more pluses in the k -means++ | Richard Nock , Raphael Canyasse, Roksana Boreli & Frank Nielsen ICML 2016

      Remarks ❖ Guarantee approaches statistical lowerbound   (Fréchet-Cramér-Rao-Darmois) � ❖ Can be better than Arthur-Vassilvitskii bound, in particular if   φ bias < φ opt φ bias = knob from which background / domain knowledge may improve the general bound   13 k -variates++: more pluses in the k -means++ | Richard Nock , Raphael Canyasse, Roksana Boreli & Frank Nielsen ICML 2016

Applications ❖ Reductions from k -variates++ approximability ratios � ⇒ ❖ pick clustering algorithm , � L ❖ show that expected output of = that of k -variates++ L for particular choices of and   X . ℘ t (note: no computational constraint, just need existence) � ❖ Get approximability ratio for !   L 14 k -variates++: more pluses in the k -means++ | Richard Nock , Raphael Canyasse, Roksana Boreli & Frank Nielsen ICML 2016

Summary (poster, paper) X . Setting Algorithm L Probe functions Densities ℘ t Batch k -means++ Identity Diracs Distributed d - k -means++ Identity Uniform, support = subsets p + d - k -means++ Distributed Identity Non uniform, compact support Streaming s - k -means++ synopses Diracs On-line ol - k -means++ point (batch not hit) Diracs / closest center (batch hit) 15 k -variates++: more pluses in the k -means++ | Richard Nock , Raphael Canyasse, Roksana Boreli & Frank Nielsen ICML 2016

Summary (poster, paper) X . Setting Algorithm L Probe functions Densities ℘ t Batch k -means++ Identity Diracs Distributed d - k -means++ Identity Uniform, support = subsets p + d - k -means++ Distributed Identity Non uniform, compact support Streaming s - k -means++ synopses Diracs On-line ol - k -means++ point (batch not hit) Diracs / closest center (batch hit) 16 k -variates++: more pluses in the k -means++ | Richard Nock , Raphael Canyasse, Roksana Boreli & Frank Nielsen ICML 2016

k -variates++: Poster #29, Mon. 3-7pm more pluses in the k -means++ - PowerPoint PPT Presentation

(formerly NICTA) k -variates++: Poster #29, Mon. 3-7pm more pluses in the k -means++ Richard Nock , Raphal Canyasse, Roksana Boreli, Frank Nielsen DATA61 | ANU | TECHNION | ECOLE POLYTECHNIQUE | UNSW | SONY CS LABS, INC. www.data61.csiro.au In

Efficiency-Improvement Techniques Overview Reading: Ch. 11 in Law & Ch. 10 in Handbook of

Hypotheses with two variates Two sample hypotheses R.W. Oldford Common hypotheses Recall some

Hypotheses with two variates Paired data R.W. Oldford Common hypotheses Recall some common

Why Bayesian methods in Simulation? Simulation Simulation Model Inputs BAYESIAN IDEAS

Random Variate Generation R.B. Lenin (rblenin@daiict.ac.in) Autumn 2007 R.B. Lenin

Proceedings of the 2004 Winter Simulation Conference R. G. Ingalls, M. D. Rossetti, J. S. Smith,

Advance Stochastic Gradient with Variance Reduction Jingchang Liu December 7, 2017 University

Categorical data Modelling and Independence R.W. Oldford Eikosograms - Dependence/independence

Hastings ratio = P ( proposing ) P ( proposing ) = g ( u ) g ( u )

Managing random generator seeds with SeedService Gianluca Petrillo University of

T1M1: Management Plane Security Standard (T1.276) Presentation Contributors and Liaison

policies Security in Organizations 2011 Eric Verheul 1 Literature Main literature for this

Security Policies Security Policies for Large Who is allowed to do what when Systems

EX/7-2: Impurity Seeding on JET to Achieve Power Plant like Divertor Conditions M. Wischmeier

Search Based Test Data Generation for Server-side Web Application Testing Nadia Alshahwan and

Multi-objective evolutionary optimization of computation-intensive simulations The case of

The FERMI@Elettra Project The FERMI@Elettra Project John Adams Institute for Accelerator Science

ECE590-03 Enterprise Storage Architecture Fall 2017 Hard disks, SSDs, and the I/O subsystem

CPSC 410/611: Disk Management Disk Structure Disk Scheduling RAID

Data Systems on Modern Hardware: Multi-cores, Solid-State Drives, and Non-Volatile Memories Prof.

Concurrency CS 442: Mobile App Development Michael Saelee <lee@iit.edu> Computer Science

Set 6: Web Development Toolkits Why Use a Toolkit? Choices jQuery www.jQuery.com

Digital Design Disc: RTL Combinatorial Components 2-to-4 Decoder 4-to-16 Decoder 8-bit Shifter

Dr. Strange- Todd L. Montgomery @toddlmontgomery Haskell Erlang Haskell Clojure

k -variates++: Poster #29, Mon. 3-7pm more pluses in the k -means++ - PowerPoint PPT Presentation

(formerly NICTA) k -variates++: Poster #29, Mon. 3-7pm more pluses in the k -means++ Richard Nock , Raphal Canyasse, Roksana Boreli, Frank Nielsen DATA61 | ANU | TECHNION | ECOLE POLYTECHNIQUE | UNSW | SONY CS LABS, INC. www.data61.csiro.au In

Efficiency-Improvement Techniques Overview Reading: Ch. 11 in Law &amp; Ch. 10 in Handbook of

Hypotheses with two variates Two sample hypotheses R.W. Oldford Common hypotheses Recall some

Hypotheses with two variates Paired data R.W. Oldford Common hypotheses Recall some common

Why Bayesian methods in Simulation? Simulation Simulation Model Inputs BAYESIAN IDEAS

Random Variate Generation R.B. Lenin (rblenin@daiict.ac.in) Autumn 2007 R.B. Lenin

Proceedings of the 2004 Winter Simulation Conference R. G. Ingalls, M. D. Rossetti, J. S. Smith,

Advance Stochastic Gradient with Variance Reduction Jingchang Liu December 7, 2017 University

Categorical data Modelling and Independence R.W. Oldford Eikosograms - Dependence/independence

Hastings ratio = P ( proposing ) P ( proposing ) = g ( u ) g ( u )

Managing random generator seeds with SeedService Gianluca Petrillo University of

T1M1: Management Plane Security Standard (T1.276) Presentation Contributors and Liaison

policies Security in Organizations 2011 Eric Verheul 1 Literature Main literature for this

Security Policies Security Policies for Large Who is allowed to do what when Systems

EX/7-2: Impurity Seeding on JET to Achieve Power Plant like Divertor Conditions M. Wischmeier

Search Based Test Data Generation for Server-side Web Application Testing Nadia Alshahwan and

Multi-objective evolutionary optimization of computation-intensive simulations The case of

The FERMI@Elettra Project The FERMI@Elettra Project John Adams Institute for Accelerator Science

ECE590-03 Enterprise Storage Architecture Fall 2017 Hard disks, SSDs, and the I/O subsystem

CPSC 410/611: Disk Management Disk Structure Disk Scheduling RAID

Data Systems on Modern Hardware: Multi-cores, Solid-State Drives, and Non-Volatile Memories Prof.

Concurrency CS 442: Mobile App Development Michael Saelee &lt;lee@iit.edu&gt; Computer Science

Set 6: Web Development Toolkits Why Use a Toolkit? Choices jQuery www.jQuery.com

Digital Design Disc: RTL Combinatorial Components 2-to-4 Decoder 4-to-16 Decoder 8-bit Shifter

Dr. Strange- Todd L. Montgomery @toddlmontgomery Haskell Erlang Haskell Clojure

Efficiency-Improvement Techniques Overview Reading: Ch. 11 in Law & Ch. 10 in Handbook of

Concurrency CS 442: Mobile App Development Michael Saelee <lee@iit.edu> Computer Science