Sathyanarayan Anand & Debasree Banerjee Swarm Intelligence - - PowerPoint PPT Presentation

SLIDE 1

SI0506 - Data Clustering Using Flocking 1

Sathyanarayan Anand & Debasree Banerjee Swarm Intelligence 2005-06 09.02.2006

SLIDE 2

What is Data Clustering?

  • Given a set of elements and a similarity measure between pairs of elements, find an algorithm that groups the elements into clusters so that similar elements end up in the same cluster.

  • Data element = Point in some high-dimensional space.
  • Applications: geographic information systems, pattern recognition, medical imaging, marketing analysis, weather forecasting, etc.

SLIDE 3

Related Work

  • Hierarchical Algorithms: break large clusters into smaller ones until the desired granularity is reached.

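– Chameleon: model-based clustering that partitions the data into many small sub-clusters, then merges them using a dynamic model of cluster similarity.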

  • Partitioning Algorithms: move data between partitions to optimize some quality measure.

– K-means clustering
– Fuzzy c-means clustering

  • Density-Based Algorithms.

– DBSCAN

  • Swarm-Based Algorithms.

– Lumer-Faieta: Ant-colony based clustering.
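
As an illustration of the partitioning family, here is a minimal sketch of Lloyd's k-means in plain Python. The function name `kmeans`, its parameters and the random initialisation are illustrative assumptions, not taken from the slides; the quality measure being reduced is the sum of squared distances from each point to its cluster centre.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means: alternately assign points to their nearest centre and
    move each centre to the mean of its points. Neither step can increase the
    sum of squared distances (the quality measure)."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # initial centres: k data points sampled without replacement
    labels = []
    for _ in range(iters):
        # Assignment step: move every point into the partition of its nearest centre.
        labels = [min(range(k), key=lambda c: math.dist(p, centres[c])) for p in points]
        # Update step: recompute each centre as the mean of its partition.
        new_centres = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centres.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centres.append(centres[c])  # an empty partition keeps its old centre
        if new_centres == centres:  # no centre moved, so the assignments are stable
            break
        centres = new_centres
    return labels, centres
```

For two well-separated groups such as `[(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]` with `k=2`, the first three points and the last three points end up with two different labels. Unlike density-based methods, k must be chosen in advance.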

SLIDE 4

Flocking Rules

(As given by Craig Reynolds)

Separation: steer to avoid crowding local flock mates. Here, no two agents land on the same data point.

Alignment: steer towards the average heading of local flock mates.

Cohesion: steer to move toward the average position of local flock mates.
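
The three rules can be sketched as one synchronous update step for 2-D agents ("boids"), using a common simplified formulation in which each rule contributes a weighted steering term. This is illustrative Python, not the authors' implementation: the `Boid` class, the neighbourhood `radius`, the minimum distance `min_sep` and the weights `w_sep`, `w_ali`, `w_coh` are all assumed values. In the clustering algorithm the agents also sit on data points, which this sketch does not model.

```python
import math
from dataclasses import dataclass


@dataclass
class Boid:
    x: float
    y: float
    vx: float
    vy: float


def step(boids, radius=5.0, min_sep=1.0, w_sep=1.0, w_ali=0.1, w_coh=0.01, dt=1.0):
    """Advance every boid by one time step; all boids read the old state."""
    new = []
    for b in boids:
        # Local flock mates: the other boids within `radius`.
        mates = [o for o in boids
                 if o is not b and math.hypot(o.x - b.x, o.y - b.y) < radius]
        ax = ay = 0.0
        if mates:
            n = len(mates)
            # Separation: push away from mates that are closer than `min_sep`.
            for o in mates:
                if math.hypot(o.x - b.x, o.y - b.y) < min_sep:
                    ax += w_sep * (b.x - o.x)
                    ay += w_sep * (b.y - o.y)
            # Alignment: steer towards the mates' average velocity (heading).
            ax += w_ali * (sum(o.vx for o in mates) / n - b.vx)
            ay += w_ali * (sum(o.vy for o in mates) / n - b.vy)
            # Cohesion: steer towards the mates' average position.
            ax += w_coh * (sum(o.x for o in mates) / n - b.x)
            ay += w_coh * (sum(o.y for o in mates) / n - b.y)
        vx, vy = b.vx + ax * dt, b.vy + ay * dt
        new.append(Boid(b.x + vx * dt, b.y + vy * dt, vx, vy))
    return new
```

Computing every new state from the previous one (rather than updating boids in place) keeps the result independent of the order in which the agents are processed.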

SLIDE 5

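The Algorithm – Similarity Measures

  • Used to determine whether two data points, a and b, belong to the same cluster.

– Euclidean distance: $d(a, b) = \sqrt{\sum_i (a_i - b_i)^2}$
– Vector dot product: $a \cdot b = \sum_i a_i b_i$
– Penalized Difference: abs(a – b).p, i.e. $\sum_i p_i \, |a_i - b_i|$, where p is a vector that denotes the importance of each attribute.
– Pearson’s Coefficient: $r(a, b) = \dfrac{\sum_i (a_i - \bar{a})(b_i - \bar{b})}{\sqrt{\sum_i (a_i - \bar{a})^2}\,\sqrt{\sum_i (b_i - \bar{b})^2}}$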
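
The four measures can be written directly as small functions. This is a plain-Python sketch using the standard textbook definitions; the function names are illustrative, and reading `abs(a – b).p` as the dot product of the absolute attribute differences with the importance vector `p` is an interpretation of the slide.

```python
import math


def euclidean(a, b):
    """Euclidean distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dot(a, b):
    """Vector dot product: larger means more similar (for vectors of comparable length)."""
    return sum(x * y for x, y in zip(a, b))


def penalized_difference(a, b, p):
    """abs(a - b) . p: per-attribute differences weighted by the importance vector p."""
    return sum(abs(x - y) * w for x, y, w in zip(a, b, p))


def pearson(a, b):
    """Pearson's correlation coefficient in [-1, 1]; 1 means perfectly linearly related."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)
```

For example, `a = (1, 2, 3)` and `b = (2, 4, 6)` are far apart by Euclidean distance (√14) yet have a Pearson coefficient of 1, so the choice of measure decides whether they land in the same cluster.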

SLIDE 6

The Algorithm - Procedure

  • Initialize flock randomly on the dataset.
  • Repeat

– Each agent performs local density-based clustering:

  • If the density of points around a given point exceeds a given threshold, then every point in that cluster takes the label of the point with the minimum label.

  • Merge clusters belonging to different agents.
  • The flock migrates to a new location; the migration is controlled by a defined flock speed.
  • Flock memory: a location is not revisited until all other locations have been visited.
  • Local clustering leads to the emergence of a global cluster pattern.
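
The loop can be sketched in Python under simplifying assumptions that are ours, not the authors': agents hop directly to the nearest unvisited data point instead of steering with the flocking rules, flock speed is not modelled, and a single sweep over the dataset is performed. A shared union-find table gives every merged cluster the minimum label of its members, which also merges clusters found by different agents. The names `flock_cluster`, `find`, `union_min`, `eps` (neighbourhood radius) and `threshold` (density threshold) are hypothetical.

```python
import math
import random


def find(parent, i):
    """Return the root label of i, halving the path as we go."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def union_min(parent, i, j):
    """Merge the clusters of i and j; the merged cluster keeps the minimum label."""
    ri, rj = find(parent, i), find(parent, j)
    if ri != rj:
        parent[max(ri, rj)] = min(ri, rj)


def flock_cluster(points, n_agents=3, eps=1.5, threshold=3, seed=0):
    """Hypothetical single-sweep sketch of flock-based clustering."""
    rng = random.Random(seed)
    n = len(points)
    parent = list(range(n))  # shared label table, so clusters of different agents merge
    positions = rng.sample(range(n), n_agents)  # random initial flock, one point per agent
    visited = set()  # flock memory
    while len(visited) < n:
        for i in positions:
            visited.add(i)
            # Local density-based clustering around the agent's current data point.
            nbrs = [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
            if len(nbrs) > threshold:  # density exceeds the threshold
                for j in nbrs:
                    union_min(parent, i, j)
        # Migration with flock memory and separation: each agent moves to the nearest
        # point that is unvisited and not already taken by another agent.
        new_positions = []
        for i in positions:
            free = [j for j in range(n) if j not in visited and j not in new_positions]
            if free:
                new_positions.append(min(free, key=lambda j: math.dist(points[i], points[j])))
        positions = new_positions
    return [find(parent, i) for i in range(n)]
```

Because every point is visited once per sweep, the final labels in this sketch do not depend on where the flock starts; points in no dense neighbourhood keep their own label and act as noise.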

SLIDE 7

The Algorithm – Proof of Convergence

  • Markov process with state = centroid of flock.

– Centroid = data point that minimizes the cumulative distance to all other points.

– Next state (centroid) depends only on the current state.

  • Irreducibility = any point can be reached from any other point.
  • Ergodicity = the time taken to revisit a state is finite and aperiodic. Ensured through flock memory.
  • In the limit of infinite time, an irreducible and ergodic Markov process converges to a stationary distribution.

– Clustering becomes independent of the initial state.
– A similar proof is used in spectral clustering techniques.
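
The argument rests on a standard fact: for a finite, irreducible and aperiodic chain, repeatedly applying the transition matrix drives any starting distribution to the same stationary distribution. The sketch below illustrates this with a made-up three-state transition matrix `P` (it is not derived from the flocking algorithm), and also shows the slide's flock "centroid", which is the medoid of a set of points. The names `medoid` and `evolve` are illustrative.

```python
import math


def medoid(points):
    """Data point with the minimum cumulative distance to all points
    (what the slide calls the flock's centroid)."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))


def evolve(P, start, steps=200):
    """Push a probability distribution through a row-stochastic matrix P:
    pi_{t+1}[j] = sum_i pi_t[i] * P[i][j]."""
    pi = list(start)
    n = len(P)
    for _ in range(steps):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


# A made-up irreducible 3-state chain; the self-loops make it aperiodic.
P = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.50, 0.50]]

from_first = evolve(P, [1.0, 0.0, 0.0])
from_last = evolve(P, [0.0, 0.0, 1.0])
# Both starting states reach the same stationary distribution (0.25, 0.5, 0.25).
```

Starting in the first or the last state makes no difference after enough steps, which is the sense in which the clustering becomes independent of the initial state.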

SLIDE 8

The Algorithm - Limitations

  • Density-based clustering is highly sensitive to the radius and density threshold parameters.

  • The computational cost of creating an efficient data structure is exponential. It can be reduced using certain techniques.
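
The sensitivity to the radius and density threshold is easy to reproduce. The sketch below uses a simplified density-based clustering of our own (not the authors' code; `density_clusters`, `eps` and `min_pts` are hypothetical names) and counts the clusters found in one fixed toy dataset as the parameters change.

```python
import math


def density_clusters(points, eps, min_pts):
    """Number of clusters found when every 'dense' point (at least min_pts points,
    itself included, within radius eps) is linked to all of its neighbours.
    Points never reached from a dense point are treated as noise."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    members = set()  # points that belong to some cluster
    for i in range(n):
        nbrs = [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        if len(nbrs) >= min_pts:
            members.update(nbrs)
            for j in nbrs:
                parent[find(j)] = find(i)  # merge j's cluster into i's
    return len({find(i) for i in members})
```

With the nine points at x = 0, 1, 2, 4, 5, 6, 8, 9, 10 on a line and `min_pts=3`, radii of 0.5, 1 and 2 yield 0, 3 and 1 clusters respectively: too small a radius labels everything as noise, and too large a radius merges the three groups into one.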

SLIDE 9

Results – Synthetic Dataset

SLIDE 10

Results – Zoo Dataset

SLIDE 11

Results – Chameleon Dataset 1

SLIDE 12

Results – Chameleon Dataset 2

SLIDE 13

Results – Performance Comparison

Dataset coverage with respect to the number of agents in the flock.

SLIDE 14

References

1. Zaiane O.R., Lee C.H. Clustering Spatial Data in the Presence of Obstacles: A Density-based Approach. IEEE International Database Engineering and Applications Symposium, Proceedings, 17-19 July 2002, pages 214-223.

2. Lumer E., Faieta B. Diversity and adaptation in populations of clustering ants. Proceedings of the 3rd International Conference on Simulation of Adaptive Behavior: From Animals to Animats 3, pages 501-508, 1994.

3. Ester M., Kriegel H.P., Sander J., Xu X. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In 2nd International Conference on Knowledge Discovery and Data Mining (KDD’96), Portland, Oregon. AAAI Press, 1996.

4. Karypis G., Han S., Kumar V. CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling. IEEE Computer: Special Issue on Data Analysis and Mining, 32(8), pages 68-75, 1999.

5. Bradley P.S., Fayyad U., Reina C. Scaling Clustering Algorithms to Large Databases. In 4th International Conference on Knowledge Discovery and Data Mining (KDD’98), New York City. AAAI Press, 1998.

6. Höppner F. Speeding up Fuzzy c-Means: Using a Hierarchical Data Organization to Control the Precision of Membership Calculation. Fuzzy Sets and Systems, 128(3), pages 365-378, 2002.

7. Newman D.J., Hettich S., Blake C.L., Merz C.J. UCI Repository of Machine Learning Databases [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, Department of Information and Computer Science, 1998.