Sathyanarayan Anand & Debasree Banerjee Swarm Intelligence

SLIDE 1

SI0506 - Data Clustering Using Flocking 1

Sathyanarayan Anand & Debasree Banerjee Swarm Intelligence 2005-06 09.02.2006

SLIDE 2

What is Data Clustering?

Given a set of elements and a similarity measure between pairs of elements, the goal is to find an algorithm for grouping elements into clusters, so that similar elements end up in the same cluster.

Data element = Point in some high-dimensional space.
Applications: Geographic information systems, pattern recognition, medical imaging, marketing analysis, weather forecasting, etc.

SLIDE 3

Related Work

Hierarchical Algorithms: Break large clusters into smaller ones till desired granularity is reached.

– Chameleon: Model based splitting of clusters.

Partitioning Algorithms: Move data between partitions to optimize some quality measure.

– K-means clustering
– Fuzzy c-means clustering

Density-Based Algorithms.

– DBSCAN

Swarm-Based Algorithms.

– Lumer-Faieta: Ant-colony based clustering.

SLIDE 4

Flocking Rules

(As given by Craig Reynolds)

Separation: steer to avoid crowding local flock mates. No two agents land up on the same data point.

Alignment: steer towards the average heading of local flock mates.

Cohesion: steer to move toward the average position of local flock mates.
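The three rules can be sketched as steering vectors for one agent in 2-D. This is only an illustration: the function name `steer`, the dict layout of an agent, and the crowding radius `min_dist` are assumptions and do not come from the slides.

```python
import math

def steer(agent, neighbours, min_dist=1.0):
    """Return (separation, alignment, cohesion) steering vectors for one agent.

    `agent` and each neighbour are dicts with 'pos' and 'vel' as (x, y) tuples.
    `min_dist` is a hypothetical crowding radius, not a value from the slides.
    """
    if not neighbours:
        return (0.0, 0.0), (0.0, 0.0), (0.0, 0.0)
    n = len(neighbours)
    ax, ay = agent["pos"]

    # Separation: push away from every flock mate closer than min_dist.
    sep_x = sep_y = 0.0
    for other in neighbours:
        dx, dy = ax - other["pos"][0], ay - other["pos"][1]
        d = math.hypot(dx, dy)
        if 0.0 < d < min_dist:
            sep_x += dx / d
            sep_y += dy / d

    # Alignment: mean neighbour velocity minus our own velocity.
    mean_vx = sum(o["vel"][0] for o in neighbours) / n
    mean_vy = sum(o["vel"][1] for o in neighbours) / n
    align = (mean_vx - agent["vel"][0], mean_vy - agent["vel"][1])

    # Cohesion: vector from our position to the mean neighbour position.
    mean_px = sum(o["pos"][0] for o in neighbours) / n
    mean_py = sum(o["pos"][1] for o in neighbours) / n
    cohere = (mean_px - ax, mean_py - ay)

    return (sep_x, sep_y), align, cohere
```

A full flock update would combine the three vectors with weights of its own choosing; those weights are left out here because the slides do not give them.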

SLIDE 5

The Algorithm – Similarity Measures

Used to determine if two data points, a and b, belong to the same cluster or not.

– Euclidean distance
– Vector dot product
– Penalized Difference: abs(a – b) · p, where p is a vector that denotes the importance of each attribute.
– Pearson’s Coefficient
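The formulas for most of these measures did not survive in the transcript. The sketch below uses the standard textbook definitions, which may differ in normalisation from what the authors used. The function names are assumptions, and the penalized difference follows the abs(a – b) · p expression given on the slide.

```python
import math

def euclidean(a, b):
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dot(a, b):
    """Vector dot product: sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))

def penalized_difference(a, b, p):
    """abs(a - b) . p, where p weights the importance of each attribute."""
    return sum(abs(x - y) * w for x, y, w in zip(a, b, p))

def pearson(a, b):
    """Pearson correlation coefficient between the attribute vectors a and b."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    spread_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    spread_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if spread_a == 0 or spread_b == 0:
        raise ValueError("Pearson's coefficient is undefined for a constant vector")
    return cov / (spread_a * spread_b)
```

Note that Euclidean distance and penalized difference are distances (smaller means more similar), while the dot product and Pearson's coefficient are similarities (larger means more similar), so a threshold test has to point in the matching direction.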

SLIDE 6

The Algorithm - Procedure

Initialize flock randomly on the dataset.
Repeat

– Each agent performs local density-based clustering.

If the density of points around a given point exceeds a given threshold, then every point in the cluster takes the label of the point with the minimum label.

Merge clusters belonging to different agents.
Flock migrates to a new location; the size of the move is controlled by a defined flock speed.
Flock Memory: Location not revisited until all other locations have been visited.

Local clustering leads to the emergence of global cluster pattern.
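A minimal sketch of the procedure, under several assumptions that are not in the slides: the flock "migrates" by taking the next `n_agents` unvisited points from a shuffled order (which also stands in for flock memory), density is a plain neighbour count within `radius`, and merging uses a union-find whose root is always the smallest point index. All names and parameters are hypothetical.

```python
import math
import random

def flock_cluster(points, n_agents=3, radius=1.5, threshold=2, seed=0):
    """Label points by repeated local density-based clustering."""
    rng = random.Random(seed)
    parent = list(range(len(points)))      # every point starts as its own label

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)  # keep the minimum label as root

    # Flock memory: a shuffled visiting order means no location is revisited
    # before all other locations have been visited.
    order = list(range(len(points)))
    rng.shuffle(order)

    for start in range(0, len(order), n_agents):
        # One flock position: each agent clusters around its own data point.
        for centre in order[start:start + n_agents]:
            near = [j for j, q in enumerate(points)
                    if math.dist(points[centre], q) <= radius]
            if len(near) > threshold:      # density exceeds the threshold
                for j in near:
                    union(near[0], j)      # also merges other agents' clusters
    return [find(i) for i in range(len(points))]
```

Because the labels live in one shared union-find, two agents that touch overlapping neighbourhoods merge their clusters automatically; a point that never falls in a dense neighbourhood keeps its own index as its label.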

SLIDE 7

The Algorithm – Proof of Convergence

Markov process with state = centroid of flock.

– Centroid = data point that minimizes cumulative distance to all other points.
– Next state (centroid) depends only on current state.

Irreducibility = Any point can be reached from any point.
Ergodicity = Time taken to revisit a state is finite and aperiodic. Ensured through flock memory.
In the limit of infinite time, an irreducible & ergodic Markov process converges to a stationary distribution.

– Clustering becomes independent of initial state.
– Similar proof used in spectral clustering techniques.
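The convergence claim can be illustrated on a toy chain. The 3-state transition matrix `P` below is hypothetical and has nothing to do with the flock's actual state space; all its entries are positive, which is enough to make the chain irreducible and aperiodic. Starting from two different states, the distributions end up the same.

```python
def step(dist, P):
    """One step of the chain: new[j] = sum over i of dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def run(dist, P, steps=200):
    """Apply `steps` transitions to an initial distribution."""
    for _ in range(steps):
        dist = step(dist, P)
    return dist

# Hypothetical transition matrix; each row sums to 1 and every entry is positive.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.1, 0.4, 0.5]]

from_a = run([1.0, 0.0, 0.0], P)   # start with certainty in the first state
from_c = run([0.0, 0.0, 1.0], P)   # start with certainty in the last state
```

After enough steps `from_a` and `from_c` agree to within floating-point error, and applying `step` once more leaves them unchanged, which is what being a stationary distribution means.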

SLIDE 8

The Algorithm - Limitations

Density-based clustering is highly sensitive to the radius and density threshold parameters.

Computational cost for creating an efficient data structure is exponential. Can be reduced using certain techniques.
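The parameter sensitivity can be seen on a toy example. The sketch below is an assumption-laden stand-in for the authors' clustering step: a neighbourhood counts as dense when it holds more than `threshold` points (centre included), dense neighbourhoods that share points are merged, and points in no dense neighbourhood are not counted. The function name and the six sample points are hypothetical.

```python
import math

def count_clusters(points, radius, threshold):
    """Number of clusters found when overlapping dense neighbourhoods are merged."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    clustered = set()                      # points inside some dense neighbourhood
    for p in points:
        near = [j for j, q in enumerate(points) if math.dist(p, q) <= radius]
        if len(near) > threshold:
            clustered.update(near)
            for j in near:
                parent[find(j)] = find(near[0])
    return len({find(i) for i in clustered})

# Two groups of three points on a line, separated by a gap of 3.
points = [(0, 0), (1, 0), (2, 0), (5, 0), (6, 0), (7, 0)]

tight = count_clusters(points, radius=1, threshold=2)    # two separate groups
loose = count_clusters(points, radius=3, threshold=2)    # the gap is bridged
strict = count_clusters(points, radius=1, threshold=3)   # nothing is dense enough
```

The same six points yield two clusters, one cluster, or none, depending only on the two parameters.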

SLIDE 9

Results – Synthetic Dataset

SLIDE 10

Results – Zoo Dataset

SLIDE 11

Results – Chameleon Dataset 1

SLIDE 12

Results – Chameleon Dataset 2

SLIDE 13

Results – Performance Comparison

Dataset coverage w.r.t number of agents in the flock.

SLIDE 14

References

1. Zaiane O.R., Lee C.H. Clustering Spatial Data in the Presence of Obstacles: A Density-based Approach. In Proceedings of the International Database Engineering and Applications Symposium, IEEE, 17-19 July 2002, pages 214-223.
2. Lumer E., Faieta B. Diversity and adaptation in populations of clustering ants. In Proceedings of the 3rd International Conference on Simulation of Adaptive Behavior: From Animals to Animats 3, pages 501-508, 1994.
3. Ester M., Kriegel H.P., Sander J., Xu X. A Density Based Approach for Discovering Clusters in Large Spatial Databases with Noise. In 2nd International Conference on Knowledge Discovery Databases and Data Mining (KDD’96), Portland, Oregon. AAAI Press, 1996.
4. Karypis G., Han S., Kumar V. CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling. IEEE Computer: Special Issue on Data Analysis and Mining, Volume 32, Number 8, pages 68-75, 1999.
5. Bradley P.S., Fayyad U., Reina C. Scaling Clustering Algorithms for Large Databases. In 4th International Conference on Knowledge Discovery Databases and Data Mining (KDD’98), New York City. AAAI Press, 1998.
6. Höppner F. Speeding up Fuzzy c-Means: Using a Hierarchical Data Organization to Control the Precision of Membership Calculation. Fuzzy Sets and Systems, 128(3), pages 365-378, 2002.
7. Newman D.J., Hettich S., Blake C.L., Merz C.J. UCI Repository of machine learning databases [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, Department of Information and Computer Science, 1998.