Optimal interval clustering: Application to Bregman clustering and - PowerPoint PPT Presentation

Optimal interval clustering: Application to Bregman clustering and statistical mixture learning Frank Nielsen 1 Richard Nock 2 1 Sony Computer Science Laboratories/Ecole Polytechnique 2 UAG-CEREGMIA/NICTA May 2014 c � 2014 Frank Nielsen, Sony Computer Science Laboratories, Inc. 1/26

Hard clustering: Partitioning the data set ◮ Partition X = { x 1 , ..., x n } ⊂ X into k clusters C 1 ⊂ X , ..., C k ⊂ X : k � X = C i i =1 ◮ Center-based (prototype) hard clustering: k -means [2], k -medians, k -center, ℓ r -center [10], etc. ◮ Model-based hard clustering: statistical mixtures maximizing the complete likelihood (prototype=model paramter). ◮ k -means: NP-hard when d > 1 and k > 1 [11, 7, 1]. ◮ k -medians and k -centers: NP-hard [12] (1984) ◮ In 1D, k -means is polynomial [3, 15]: O ( n 2 k ). c � 2014 Frank Nielsen, Sony Computer Science Laboratories, Inc. 2/26

Euclidean 1D k -means ◮ 1D k -means [8] has contiguous partition. � n − 1 � ◮ Solved by enumerating all partitions in 1D (1958). k − 1 Better than Stirling numbers of the second kind S ( n , k ) that count all partitions. ◮ Polynomial in time O ( n 2 k ) using Dynamic Programming (DP) [3] (sketched in 1973 in two pages). ◮ R package Ckmeans.1d.dp [15] (2011). c � 2014 Frank Nielsen, Sony Computer Science Laboratories, Inc. 3/26

Interval clustering: Structure Sort X ∈ X with respect to total order < on X in O ( n log n ). Output represented by: ◮ k intervals I i = [ x l i , x r i ] such that C i = I i ∩ X . ◮ or better k − 1 delimiters l i ( i ∈ { 2 , ..., k } ) since r i = l i +1 − 1 ( i < k and r k = n ) and l 1 = 1. [ x 1 ... x l 2 − 1 ] [ x l 2 ... x l 3 − 1 ] ... [ x l k ... x n ] � �� C 1 C 2 C k c � 2014 Frank Nielsen, Sony Computer Science Laboratories, Inc. 4/26

Objective function for interval clustering Scalars x 1 < ... < x n are partitioned contiguously into k clusters: C 1 < ... < C k . Clustering objective function: k � min e k ( X ) = e 1 ( C j ) j =1 c 1 ( · ): intra-cluster cost/energy ⊕ : inter-cluster cost/energy (commutative, associative) n = kp + 1 1D points equally distributed → k different optimal clustering partitions c � 2014 Frank Nielsen, Sony Computer Science Laboratories, Inc. 5/26

Examples of objective functions In arbitrary dimension X = R d : ◮ ℓ r -clustering ( r ≥ 1): � = � ⎛ ⎞ ⎝� d ( x , p ) r e 1 ( C j ) = min ⎠ p ∈ X x ∈C j (argmin=prototype p j is the same whether we take power of 1 r of sum or not) Euclidean ℓ r -clustering: r = 1 median, r = 2 means. ◮ k -center (lim r →∞ ): � = max e 1 ( C i ) = min p ∈ X max x ∈C j d ( x , p ) ◮ Discrete clustering: Search space in min is C j instead of X . Note that in 1D, ℓ s -norm distance is always d ( p , q ) = | p − q | , independent of s ≥ 1. c � 2014 Frank Nielsen, Sony Computer Science Laboratories, Inc. 6/26

Optimal interval clustering by Dynamic Programming X j , i = { x j , ..., x i } ( j ≤ i ) X i = X 1 , i = { x 1 , ..., x i } E = [ e i , j ]: n × k cost matrix, O ( n × k ) memory e i , m = e m ( X i ) Optimality equation: e i , m = min m ≤ j ≤ i { e j − 1 , m − 1 ⊕ e 1 ( X j , i ) } Associative/commutative operator ⊕ (+ or max). Initialize with c i , 1 = c 1 ( X i ) E : compute from left to right column, from bottom to top. Best clustering solution cost is at e n , k . Time: n × k × O ( n ) × T 1 ( n )= O ( n 2 kT 1 ( n )), O(nk) memory c � 2014 Frank Nielsen, Sony Computer Science Laboratories, Inc. 7/26

Retrieving the solution: Backtracking Use an auxiliary matrix S = [ s i . j ] for storing the argmin. Backtrack in O ( k ) time. ◮ Left index l k of C k stored at s n , k : l k = s n , k . ◮ Iteratively retrieve the previous left interval indexes at entries l j − 1 = s l j − 1 , i for j = k − 1 , ..., j = 1. Note that l j − 1 = n − � k l = j n l and l j − 1 = � j − 1 l =1 n l . c � 2014 Frank Nielsen, Sony Computer Science Laboratories, Inc. 8/26

Optimizing time with a Look Up Table (LUT) Save time when computing e 1 ( X j , i ) since we perform n × k × O ( n ) such computations. Look Up Table (LUT): Add extra n × n matrix E 1 with E 1 [ j ][ i ] = e 1 ( X j , i ). Build in O ( n 2 T 1 ( n ))... Then DP in O ( n 2 k )= O ( n 2 T 1 ( n ))). → quadratic amount of memory ( n > 10000...) c � 2014 Frank Nielsen, Sony Computer Science Laboratories, Inc. 9/26

DP solver with cluster size constraints n − and n + i : lower/upper bound constraints on n i = |C i | i � k i ≤ n and � k l =1 = n − l =1 = n + i ≥ n . When no constraints: add dummy constraints n − i = 1 and n + i = n − k + 1. n m = |C m | = i − j + 1 such that n − m ≤ n m ≤ n + m . → j ≤ i + 1 − n − m and j ≥ i + 1 − n + m . e i , m = min { e j − 1 , m − 1 ⊕ e 1 ( X j , i ) } , max { 1+ � m − 1 l =1 n − l , i +1 − n + m }≤ j j ≤ i +1 − n − m c � 2014 Frank Nielsen, Sony Computer Science Laboratories, Inc. 10/26

Model selection from the DP table m ( k ) = e k ( X ) e 1 ( X ) decreases with k and reaches minimum when k = n . Model selection: trade-off choose best model among all the models with k ∈ [1 , n ]. Regularized objective function: e ′ k ( X ) = e k ( X ) + f ( k ), f ( k ) related to model complexity. Compute the DP table for k = n , ..., 1 and avoids redundant computations. Then compute the criterion for the last line (indexed by n ) and choose the argmin of e ′ k . c � 2014 Frank Nielsen, Sony Computer Science Laboratories, Inc. 11/26

A Voronoi cell condition for DP optimality elements → interval clusters → prototypes interval clusters ← prototypes Voronoi diagram: Partition X wrt. P = { p 1 , ..., p k } . Voronoi cell: V ( p j ) = { x ∈ X : d r ( x , p j ) ≤ d r ( x , p l ) ∀ l ∈ { 1 , ..., k }} . x r is a monotonically increasing function on R + , equivalent to V ′ ( p j ) = { x ∈ X : d ( x : p j ) < d ( x : p l ) } DP guarantees optimal clustering when ∀P , V ′ ( p j ) is an interval 2-clustering exhibits the Voronoi bisector. c � 2014 Frank Nielsen, Sony Computer Science Laboratories, Inc. 12/26

1-mean (centroid): O ( n ) time n � ( x i − p ) 2 min p i =1 D ( x , p ) = ( x − p ) 2 , D ′ ( x , p ) = 2( x − p ) , D ′′ ( x , p ) = 2 Convex optimization (existence and unique solution) n n � � D ′ ( x , p ) = 0 ⇒ x i − np = 0 i =1 i =1 � n Center of mass p = 1 i =1 x i − (barycenter) n Extends to Bregman divergence: D F ( x , p ) = F ( x ) − F ( p ) − ( x − p ) F ′ ( p ) c � 2014 Frank Nielsen, Sony Computer Science Laboratories, Inc. 13/26

2-means: O ( n log n ) time Find x l 2 ( n − 1 potential locations for x l : from x 2 to x n ): min { e 1 ( C 1 ) + e 1 ( C 2 ) } x l 2 Browse from left to right l 2 = x 2 , ...., x n . Update cost in constant time E 2 ( l + 1) from E 2 ( l ) (SATs also O (1)): E 2 ( l ) = e 2 ( x 1 ... x l − 1 | x l ... x n ) µ 1 ( l + 1) = ( l − 1) µ 1 ( l ) + x l µ 2 ( l + 1) = ( n − l + 1) µ 2 ( l ) − x l , l n − l l l � � ( x i − µ 1 ( l + 1)) 2 = x 2 i − l µ 2 v 1 ( l + 1) = 1 ( l + 1) i =1 i =1 ∆ E 2 ( l ) = l − 1 � µ 1 ( l ) − x l � 2 + n − l + 1 � µ 2 ( l ) − x l � 2 l n − l c � 2014 Frank Nielsen, Sony Computer Science Laboratories, Inc. 14/26

2-means: Experiments Intel Win7 i7-4800 n Brute force SAT Incremental 300000 155 . 022 0 . 010 0 . 0091 1000000 1814 . 44 0 . 018 0 . 015 Do we need sorting and Ω( n log n ) time? ( k = 1 is linear time) Note that MaxGap does not yield the separator (because centroid is sum of squared distance minimizer) c � 2014 Frank Nielsen, Sony Computer Science Laboratories, Inc. 15/26

Optimal 1D Bregman k -means Bregman information [2] e 1 (generalizes cluster variance): e 1 ( C j ) = min x l ∈C j w l B F ( x l : p j ) . (1) Expressed as [14]: �� ( p j F ′ ( p j ) − F ( p j )) + e 1 ( C j ) = x l ∈C j w l F ( x l ) − x l ∈C j w l �� F ′ ( p j ) x ∈C j w l x process using Summed Area Tables [6] (SATs) S 1 ( j ) = � j l =1 w l , S 2 ( j ) = � j l =1 w l x l , and S 3 ( j ) = � j l =1 w l F ( x l ) in O ( n ) time at preprocessing stage. Evaluate the Bregman information e 1 ( X j , i ) in constant time O (1). For example, � i l = j w l F ( x l ) = S 3 ( i ) − S 3 ( j − 1) with S 3 (0) = 0. Bregman Voronoi diagrams have connected cells [4] thus DP yields optimal interval clustering. c � 2014 Frank Nielsen, Sony Computer Science Laboratories, Inc. 16/26

Exponential families in statistics Family of probability distributions: F = { p F ( x ; θ ) : θ ∈ Θ } Exponential families [13]: p F ( x | θ ) = exp ( t ( x ) θ − F ( θ ) + k ( x )) , For example: univariate Rayleigh R ( σ ), t ( x ) = x 2 , k ( x ) = log x , θ = − 1 2 σ 2 , η = − 1 θ , F ( θ ) = log − 1 2 θ and F ∗ ( η ) = − 1 + log 2 η . c � 2014 Frank Nielsen, Sony Computer Science Laboratories, Inc. 17/26

Uniorder exponential families: MLE Maximum Likelihood Estimator (MLE) [13]: i 1 � e 1 ( X j , i ) = ˆ l ( x j , ..., x i ) = F ∗ (ˆ η j , i ) + k ( x l ) . i − j + 1 l = j � i 1 with ˆ η j , i = l = j t ( x l ). i − j +1 By making a change of variable y l = t ( x l ), and not accounting the � k ( x l ) terms that are constant for any clustering, we get ⎛ ⎞ i 1 � e 1 ( X j , i ) ≡ F ∗ y l ⎝ ⎠ i − j + 1 l = j c � 2014 Frank Nielsen, Sony Computer Science Laboratories, Inc. 18/26

Optimal interval clustering: Application to Bregman clustering and - PowerPoint PPT Presentation

Optimal interval clustering: Application to Bregman clustering and statistical mixture learning Frank Nielsen 1 Richard Nock 2 1 Sony Computer Science Laboratories/Ecole Polytechnique 2 UAG-CEREGMIA/NICTA May 2014 c 2014 Frank Nielsen, Sony

MELODI M achin E L earning, O ptimization, & D ata I nterpretation @ UW Iyer & Bilmes,

Massive Data Algorithmics Lecture 6: Interval Trees Massive Data Algorithmics Lecture 6:

Graph Clustering Graph Clustering What is clustering? What is clustering? Finding patterns

Subspace Clustering Ensemble Clustering Subspace Clustering, Ensemble Clustering, Alternative

Bregman and Wasserstein, with Applications to Generative Adversarial Networks (GANs) and beyond

A Bregman near neighbor lower bound via directed isoperimetry Amirali Abdullah Suresh

Evolutionary Clustering Presenter: Lei Tang Evolutionary Clustering Evolutionary Clustering

Clustering A Categorization of Major Clustering Methods Partitioning Methods

Towards More Realistic How Interval Data Is . . . Discussion Interval Models in How to Actually

Interval Computations Interval . . . Linearization and their Possible Use Interval Arithmetic:

Dynamic Programming: Interval Scheduling and Knapsack 6.1 Weighted Interval Scheduling Weighted

Trust based Clustering for Group Trust based Clustering for Group Trust based Clustering for

Finding Clusters Types of Clustering Approaches: Linkage Based, e.g. Hierarchical Clustering

Clustering Hierarchical clustering and k-mean clustering Genome 373 Genomic Informatics

Cl Clustering t i A Categorization of Major Clustering Methods Partitioning Methods

Clustering Hierarchical clustering, k-mean clustering Genome 559: Introduction to Statistical and

Understanding Machine Learning with Language and Tensors Jon Rawski Linguistics Department

Module 7 ~ General Outline Purpose of Sathya Sai Centres & Groups The Universal Nature

Interactive proof and zero knowledge protocols Zero-knowledge: definition Probabilistic

Select Index Components & Import Data Manipulating Time Series Data in Python Market

rs stt rtr

Finite-state Strategies in Delay Games Martin Zimmermann Saarland University September 21st,

Observations of IPv6 Addresses David Malone <David.Malone@nuim.ie> Hamilton Institute, NUI

A First Investigation of Sturmian Trees Jean Berstel 2 , Luc Boasson 1 Olivier Carton 1 , Isabelle

Optimal interval clustering: Application to Bregman clustering and - PowerPoint PPT Presentation

Optimal interval clustering: Application to Bregman clustering and statistical mixture learning Frank Nielsen 1 Richard Nock 2 1 Sony Computer Science Laboratories/Ecole Polytechnique 2 UAG-CEREGMIA/NICTA May 2014 c 2014 Frank Nielsen, Sony

MELODI M achin E L earning, O ptimization, &amp; D ata I nterpretation @ UW Iyer &amp; Bilmes,

Massive Data Algorithmics Lecture 6: Interval Trees Massive Data Algorithmics Lecture 6:

Graph Clustering Graph Clustering What is clustering? What is clustering? Finding patterns

Subspace Clustering Ensemble Clustering Subspace Clustering, Ensemble Clustering, Alternative

Bregman and Wasserstein, with Applications to Generative Adversarial Networks (GANs) and beyond

A Bregman near neighbor lower bound via directed isoperimetry Amirali Abdullah Suresh

Evolutionary Clustering Presenter: Lei Tang Evolutionary Clustering Evolutionary Clustering

Clustering A Categorization of Major Clustering Methods Partitioning Methods

Towards More Realistic How Interval Data Is . . . Discussion Interval Models in How to Actually

Interval Computations Interval . . . Linearization and their Possible Use Interval Arithmetic:

Dynamic Programming: Interval Scheduling and Knapsack 6.1 Weighted Interval Scheduling Weighted

Trust based Clustering for Group Trust based Clustering for Group Trust based Clustering for

Finding Clusters Types of Clustering Approaches: Linkage Based, e.g. Hierarchical Clustering

Clustering Hierarchical clustering and k-mean clustering Genome 373 Genomic Informatics

Cl Clustering t i A Categorization of Major Clustering Methods Partitioning Methods

Clustering Hierarchical clustering, k-mean clustering Genome 559: Introduction to Statistical and

Understanding Machine Learning with Language and Tensors Jon Rawski Linguistics Department

Module 7 ~ General Outline Purpose of Sathya Sai Centres &amp; Groups The Universal Nature

Interactive proof and zero knowledge protocols Zero-knowledge: definition Probabilistic

Select Index Components &amp; Import Data Manipulating Time Series Data in Python Market

rs stt rtr

Finite-state Strategies in Delay Games Martin Zimmermann Saarland University September 21st,

Observations of IPv6 Addresses David Malone &lt;David.Malone@nuim.ie&gt; Hamilton Institute, NUI

A First Investigation of Sturmian Trees Jean Berstel 2 , Luc Boasson 1 Olivier Carton 1 , Isabelle

MELODI M achin E L earning, O ptimization, & D ata I nterpretation @ UW Iyer & Bilmes,

Module 7 ~ General Outline Purpose of Sathya Sai Centres & Groups The Universal Nature

Select Index Components & Import Data Manipulating Time Series Data in Python Market

Observations of IPv6 Addresses David Malone <David.Malone@nuim.ie> Hamilton Institute, NUI