  1. Contributions to Large Scale Data Clustering and Streaming with Affinity Propagation. Application to Autonomic Grids. Xiangliang Zhang. Thesis supervised by Michèle Sebag and Cécile Germain-Renaud. TAO (LRI, INRIA, CNRS), Université Paris-Sud. July 28, 2010.

  2. Motivations: Autonomic Computing. The major part of the cost is management.

  3. Goals of Autonomic Computing. AUTONOMIC VISION & MANIFESTO, http://www.research.ibm.com/autonomic/manifesto/ A self-managing system with the ability of:
     ◮ Self-healing: detect, diagnose and repair problems
     ◮ Self-configuring: automatically incorporate and configure components
     ◮ Self-optimizing: ensure optimal functioning w.r.t. defined requirements
     ◮ Self-protecting: anticipate and defend against security breaches
     How:
     ◮ the pre-requisite is to have a model of the system behavior
     ◮ there is no model based on first principles
     Machine Learning and Data Mining for Autonomic Computing [Rish et al., 2005]

  4. Autonomic Grid Computing System. EGEE: Enabling Grids for E-sciencE, http://www.eu-egee.org. Infrastructure projects: DataGrid (2002-2004), EGEE-I (2004-2006), EGEE-II (2006-2008), EGEE-III (2008-2010) and EGI (2010-2013).

  5. Summarizing a dataset.
     ◮ Clustering: grouping similar points in the same group (cluster)
     ◮ Extracting exemplars: real objects from the dataset, better suited to complex application domains (e.g., molecules, structured items)
     (Figure: 2-D point cloud; ∗ is the averaged center, o is the exemplar.)
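To make the distinction concrete, here is a minimal sketch (in Python with NumPy, not part of the slides) that computes both summaries on a toy 2-D sample: the averaged center is the mean, while the exemplar is an actual data point, here the medoid.

```python
import numpy as np

# Minimal sketch (not from the slides): averaged center vs. exemplar on toy data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                     # toy 2-D dataset

# Averaged center ("*" in the figure): the mean, generally not a real data point.
mean_center = X.mean(axis=0)

# Exemplar ("o" in the figure): an actual data point, here the medoid, i.e. the
# point minimizing the sum of squared distances to all other points.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
exemplar = X[sq_dists.sum(axis=1).argmin()]

print(mean_center, exemplar)
```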

  6. Position of the problem.
     Given: data E = { x_1, x_2, ..., x_N } and a distance d(x_i, x_j).
     Define: exemplars { e_i }, a subset of E; distortion D({ e_i }) = Σ_{i=1..N} min_{e ∈ {e_i}} d²(x_i, e).
     Goal: find a mapping σ, x_i → σ(x_i) ∈ { e_i }, minimizing the distortion.
     NB: a combinatorial optimization problem (NP-hard).
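Given a candidate exemplar set, the mapping σ simply assigns each point to its nearest exemplar, and the distortion is the resulting sum of squared distances. A minimal sketch with Euclidean distance (helper name and layout are illustrative, not from the thesis):

```python
import numpy as np

def assign_and_distortion(X, exemplar_idx):
    """Map each point to its nearest exemplar (sigma) and return the distortion
    D({e_i}) = sum_i min_e d^2(x_i, e). Helper names are illustrative only."""
    exemplar_idx = np.asarray(exemplar_idx)
    d2 = ((X[:, None, :] - X[exemplar_idx][None, :, :]) ** 2).sum(-1)
    sigma = exemplar_idx[d2.argmin(axis=1)]   # index in X of the exemplar of x_i
    D = d2.min(axis=1).sum()                  # distortion of this exemplar set
    return sigma, D
```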

  7. Streaming: extracting exemplars in real time.
     Job stream: jobs submitted by the grid users 24/7, more than 200 jobs/min. How to make a summary of the job stream?
     Features → Requirements:
     ◮ streaming of jobs → actual jobs as exemplars, for traceability
     ◮ arriving fast → real-time processing
     ◮ user-visible → model available at any time
     ◮ non-stationary distribution → change detection

  8. Contents.
     ◮ Motivations
     ◮ Clustering: The State of the Art; Large-scale Data Clustering
     ◮ Streaming: Data Stream Clustering
     ◮ Application to Autonomic Computing: A Multi-scale Real-time Grid Monitoring System
     ◮ Conclusions and Perspectives

  9. Clustering: The State of the Art.
     (Figure: 2-D point cloud with selected centers.)
     ◮ Averaged centers [Bradley et al., 1997]:
       k-means, minimizing the sum of squared distances from a point to its center;
       k-medians, minimizing the sum of distances from a point to its center;
       k-centers, minimizing the maximum distance from a point to its center.
     ◮ Exemplars [Kaufman and Rousseeuw, 1987], minimizing the sum of squared distances from a point to its exemplar:
       k-medoids [Kaufman and Rousseeuw, 1990, Ng and Han, 1994];
       Affinity Propagation [Frey and Dueck, 2007].

  10. List of the main clustering algorithms.
     ◮ Partitioning methods: k-means, k-medians, k-centers, k-medoids
     ◮ Hierarchical methods: linkage-based clustering (AHC); BIRCH: Balanced Iterative Reducing and Clustering using Hierarchies [Zhang et al., 1996]; CURE: Clustering Using REpresentatives [Guha et al., 1998]; ROCK: RObust Clustering using linKs [Guha et al., 1999]; CHAMELEON: a dynamic model to measure the similarity of clusters [Karypis et al., 1999]
     ◮ Arbitrarily shaped clusters: DBSCAN: density-based clustering [Ester et al., 1996]; OPTICS: Ordering Points To Identify the Clustering Structure [Ankerst et al., 1999]
     ◮ Model-based methods: Naive Bayes model [Meila and Heckerman, 2001]; mixture of Gaussian models [Banfield and Raftery, 1993]; neural networks (SOM, Self-Organizing Map) [Kohonen, 1981]
     ◮ Spectral clustering methods [Ng et al., 2001]: a recent family of methods based on algebraic processing of the squared distance matrix

  11. Clustering vs Classification. NIPS 2005 and 2009 workshops on Theoretical Foundations of Clustering (Shai Ben-David, Ulrike von Luxburg, John Shawe-Taylor, Naftali Tishby).
                    Classification            Clustering
     Classes        K classes (given)         clusters (unknown)
     Quality        generalization error      many cost functions
     Focus on       test set                  training set
     Goal           prediction                interpretation
     Analysis       discriminant              exploratory
     Field          mature                    new

  12.-13. Open questions of clustering.
     ◮ The number of clusters: set by the user for k-means, k-medians, k-centers and k-medoids; determined by the user for model-based methods; indirectly set by the user for Affinity Propagation.
     ◮ Optimality w.r.t. the distortion.
     ◮ Generalization property: stability w.r.t. the data sample/distribution.
     Affinity Propagation (AP) [Frey and Dueck, 2007].

  14.-21. Iterations of Message passing in AP (the same slide shown over eight animation steps; figure only).

  22. The AP framework.
     Input: data x_1, x_2, ..., x_N and a distance d(x_i, x_j).
     Find: σ : x_i → σ(x_i), the exemplar representing x_i, maximizing Σ_{i=1..N} S(x_i, σ(x_i)),
     where S(x_i, x_j) = −d²(x_i, x_j) if i ≠ j, and S(x_i, x_i) = −s*, with s* ≥ 0 a user-defined parameter:
     ◮ s* = ∞: only one exemplar (one cluster)
     ◮ s* = 0: every point is an exemplar (N clusters)
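As a concrete reading of this setup, the sketch below (my own naming, Euclidean distance assumed, not taken from the thesis) builds the similarity matrix with S(x_i, x_j) = −d²(x_i, x_j) off the diagonal and the common preference −s* on the diagonal; raising s* penalizes self-selection and therefore lowers the number of exemplars.

```python
import numpy as np

def build_similarity(X, s_star):
    """S[i, j] = -d^2(x_i, x_j) for i != j, and S[i, i] = -s_star (preference).
    Euclidean distance is assumed here; the thesis setting may differ."""
    S = -((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(S, -s_star)   # s_star = 0 -> N clusters; s_star -> inf -> 1 cluster
    return S
```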

  23. AP: a message passing algorithm.

  24. Messages passed.
     r(i, k) = S(x_i, x_k) − max_{k' ≠ k} { a(i, k') + S(x_i, x_{k'}) }
     r(k, k) = S(x_k, x_k) − max_{k' ≠ k} { S(x_k, x_{k'}) }
     a(i, k) = min{ 0, r(k, k) + Σ_{i' ∉ {i, k}} max{ 0, r(i', k) } }
     a(k, k) = Σ_{i' ≠ k} max{ 0, r(i', k) }
     The index of the exemplar σ(x_i) associated to x_i is finally defined as:
     σ(x_i) = argmax_k { r(i, k) + a(i, k) }, k = 1 ... N
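The update rules above translate almost directly into code. The sketch below is a generic vectorized rendering of the messages, not the thesis implementation: the slide's r(k, k) rule has no availability term, while the code applies the same update to every pair (the two coincide when the availabilities are zero, as at initialization), and a damping factor, not written on the slide, is added because undamped message passing tends to oscillate.

```python
import numpy as np

def affinity_propagation(S, max_iter=200, damping=0.5):
    """Generic AP message passing on a full similarity matrix S (N x N).
    The diagonal S[k, k] carries the preferences (-s* in the slides)."""
    N = S.shape[0]
    R = np.zeros((N, N))   # responsibilities r(i, k)
    A = np.zeros((N, N))   # availabilities  a(i, k)
    rows = np.arange(N)

    for _ in range(max_iter):
        # r(i, k) = S(i, k) - max_{k' != k} { a(i, k') + S(i, k') }
        AS = A + S
        k_max = AS.argmax(axis=1)
        first = AS[rows, k_max]
        AS[rows, k_max] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, k_max] = S[rows, k_max] - second
        R = damping * R + (1 - damping) * R_new

        # a(i, k) = min{0, r(k, k) + sum_{i' not in {i, k}} max{0, r(i', k)}}
        # a(k, k) =                  sum_{i' != k}          max{0, r(i', k)}
        Rp = np.maximum(R, 0)
        Rp[rows, rows] = R[rows, rows]          # keep r(k, k) itself in the column sums
        A_new = Rp.sum(axis=0)[None, :] - Rp    # column sum minus the i-th term
        diag = A_new[rows, rows].copy()
        A_new = np.minimum(A_new, 0)            # clip off-diagonal entries at 0
        A_new[rows, rows] = diag
        A = damping * A + (1 - damping) * A_new

    # sigma(x_i) = argmax_k { r(i, k) + a(i, k) }
    return (R + A).argmax(axis=1)
```

With the earlier build_similarity sketch, affinity_propagation(build_similarity(X, s_star)) returns, for each point, the index of its exemplar; points sharing the same index form one cluster.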

  25. Summary of AP.
     Affinity Propagation (AP):
     ◮ an exemplar-based clustering method
     ◮ a message passing algorithm (belief propagation)
     ◮ parameterized by s* (not by K)
     Computational complexity:
     ◮ similarity computation: O(N²)
     ◮ message passing: O(N² log N)

  26. Contents.
     ◮ Motivations
     ◮ Clustering: The State of the Art; Large-scale Data Clustering
     ◮ Streaming: Data Stream Clustering
     ◮ Application to Autonomic Computing: A Multi-scale Real-time Grid Monitoring System
     ◮ Conclusions and Perspectives

  27.-28. Hierarchical AP: divide-and-conquer (inspired by [Nittel et al., 2004]); figure only.

  29. Weighted AP.
     Each item x_i of AP becomes a weighted item (x_i, n_i) in WAP, and the similarities are rescaled accordingly:
     ◮ S(x_i, x_j) → n_i × S(x_i, x_j): price for x_i to select x_j as an exemplar
     ◮ S(x_i, x_i) → S(x_i, x_i) + (n_i − 1) × ε: price to select x_i as an exemplar, where ε is the variance of the n_i points
     Theorem: AP(x_1, ..., x_1 [n_1 copies], x_2, ..., x_2 [n_2 copies], ...) == WAP((x_1, n_1), (x_2, n_2), ...)
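In code, this table amounts to a rescaling of the similarity matrix before running plain AP. A sketch with my own naming; ε is passed per item, as the variance carried by each weighted point:

```python
import numpy as np

def weighted_similarity(S, n, eps):
    """Rescale a plain AP similarity matrix into its WAP counterpart.
    S: (N, N) similarities, n: (N,) multiplicities, eps: (N,) per-item variances."""
    Sw = n[:, None] * S                                  # S(x_i, x_j) -> n_i * S(x_i, x_j)
    np.fill_diagonal(Sw, np.diag(S) + (n - 1) * eps)     # S(x_i, x_i) + (n_i - 1) * eps_i
    return Sw
```

Running affinity_propagation(weighted_similarity(S, n, eps)) then treats (x_i, n_i) as n_i coincident copies of x_i, which is what the theorem states.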

  30. Hi-AP: Hierarchical AP.
     ◮ The complexity of Hi-AP is O(N^{3/2}) [Zhang et al., 2008]

  31. Hi-AP: Hierarchical AP (figure only).

  32. Complexity of Hi-AP.
     Theorem [Zhang et al., 2009]: with h hierarchical levels, Hi-AP reduces the complexity to O(N^{(h+2)/(h+1)}).
     ◮ K: number of exemplars to be clustered, on average
     ◮ b = (N/K)^{1/(h+1)}: branching factor
     ◮ K² (N/K)^{2/(h+1)}: complexity of each branching
     ◮ Σ_{i=0..h} b^i = (b^{h+1} − 1)/(b − 1): total number of branchings
     Therefore the total computational complexity is
     C(h) = K² (N/K)^{2/(h+1)} × ((N/K) − 1)/((N/K)^{1/(h+1)} − 1) ≈ K² (N/K)^{(h+2)/(h+1)}  for N ≫ K.
     Particular cases: C(0) = N² and C(1) ∝ N^{3/2}.
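Putting the previous sketches together, a one-level Hi-AP (h = 1, the O(N^{3/2}) case) can be outlined as below. The √N subset size, the reuse of the earlier build_similarity / affinity_propagation / weighted_similarity sketches, and the use of the mean squared distance to the exemplar as ε are my assumptions, not the thesis implementation.

```python
import numpy as np

def hi_ap(X, s_star, seed=0):
    """One-level Hi-AP sketch: run AP on sqrt(N)-sized subsets, then run WAP on
    the collected exemplars, weighted by the sizes of the clusters they summarize."""
    rng = np.random.default_rng(seed)
    subsets = np.array_split(rng.permutation(len(X)), int(np.sqrt(len(X))))

    exemplars, sizes, spreads = [], [], []
    for idx in subsets:
        Xs = X[idx]
        labels = affinity_propagation(build_similarity(Xs, s_star))
        for e in np.unique(labels):
            members = Xs[labels == e]
            exemplars.append(Xs[e])
            sizes.append(len(members))                               # n_i in WAP
            spreads.append(((members - Xs[e]) ** 2).sum(-1).mean())  # proxy for eps_i

    E = np.asarray(exemplars)
    Sw = weighted_similarity(build_similarity(E, s_star),
                             np.asarray(sizes), np.asarray(spreads))
    return E[np.unique(affinity_propagation(Sw))]   # exemplars of the whole dataset
```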

  33. Study of the distortion loss.
     ◮ true center of the data distribution N(µ, σ²): µ
     ◮ empirical center of n data samples: µ̂_n
     ◮ distance distribution: x_i − µ̂_n ∼ N(0, σ² + σ²/n)
     ◮ selected center (exemplar) µ̄_n: the sample point closest to the empirical center µ̂_n
     ◮ distance distribution: |µ̄_n − µ̂_n| = min_i |x_i − µ̂_n| ∼ Weibull distribution (Type III extreme value distribution)
     (Figure: averaged center vs. selected center.)
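The two distance distributions can be checked with a small simulation (mine, 1-D Gaussian data as in the slide): over many samples of size n, the deviation of a single point from the empirical center is approximately Gaussian, while the deviation of the selected exemplar is a minimum over n such deviations, hence extreme-value (Weibull-type) distributed.

```python
import numpy as np

def distance_distributions(n=50, trials=20000, sigma=1.0, seed=0):
    """Return samples of x_i - mu_hat_n (approx. Gaussian) and of
    min_i |x_i - mu_hat_n| (Weibull / Type III extreme value)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(0.0, sigma, size=(trials, n))    # `trials` samples of size n
    mu_hat = X.mean(axis=1, keepdims=True)          # empirical centers mu_hat_n
    single_point = (X - mu_hat)[:, 0]               # one centred point per sample
    exemplar_dist = np.abs(X - mu_hat).min(axis=1)  # |mu_bar_n - mu_hat_n|
    return single_point, exemplar_dist
```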

  34. Weibull distribution (Type III extreme value distribution).
     (Figure: densities for shape parameters k = −1.5, −1.2, −0.9, −0.6, −0.3.) k is the shape parameter.
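For completeness, the standard generalized extreme value form is recalled below (a textbook definition, not reproduced from the slide); in one common convention the Type III / reversed-Weibull domain corresponds to a negative shape parameter, which would match the negative values of k plotted on the slide.

```latex
% Generalized extreme value CDF; k < 0 is the Type III (reversed Weibull) case.
F(x;\mu,\sigma,k) \;=\; \exp\!\left\{-\left[1 + k\,\frac{x-\mu}{\sigma}\right]^{-1/k}\right\},
\qquad 1 + k\,\frac{x-\mu}{\sigma} > 0 .
```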
