SLIDE 1

A clustering-based visualization of colocation patterns

Elise Desmier 1, Frédéric Flouvat 2, Dominique Gay 3 and Nazha Selmaoui-Folcher 2

1 Université de Lyon, LIRIS, UMR5205 CNRS, Villeurbanne, France

elise.desmier@liris.cnrs.fr

2 University of New Caledonia, PPME, EA3325, Nouméa, New Caledonia

frederic.flouvat@univ-nc.nc nazha.selmaoui@univ-nc.nc

3 TECH/ASAP/PROF, Orange Labs, Lannion, France

dominique.gay@orange-ftgroup.com

IDEAS’11, Lisboa

SLIDE 2

Context Spatial pattern mining and visualization Visualization of colocations Application Conclusion

Toward a better visualization of spatial patterns

One of the major issues in data mining (Han and Kamber 06)

"the presentation and visualization of discovered knowledge expressed in high-level languages, visual representations, or other expressive forms so that the knowledge can be easily understood and directly usable by humans"

Problem with existing solutions

No existing solution displays spatial patterns (colocations) in a simple, concise and intuitive way for experts

Contribution

A new visualization of colocations

  • based on a heuristic clustering method
  • easily usable and interpretable by domain experts
  • additional spatial and thematic information w.r.t. "classical" colocations

Frédéric Flouvat A clustering-based visualization of colocations 2 / 36

SLIDE 3

Outline

1. Context
2. Spatial pattern mining and visualization
3. Visualization of colocations
4. Application
5. Conclusion

SLIDE 4

Application Context

New Caledonia

Exceptional biodiversity, and Caledonian lagoons declared a World Heritage site by UNESCO

But important mining projects (25% of world nickel resources), a tropical climate with cyclones, and bush fires

Important soil erosion

Strong impact on terrestrial and littoral ecosystems

➫ FO.S.T.ER. project (financed by the French government)

A multidisciplinary consortium composed of specialists in data mining, image processing and geology

Providing geologists with a semi-automatic and complete process for monitoring soil erosion

SLIDE 5

Data

Complex data

Heterogeneous data: DEM, vegetation, soil occupation, climate, ...

Large and spatial data

➫ Need for advanced analysis and modeling methods to assist experts

Spatial data mining

Extracting interesting, useful and unexpected knowledge from spatial data

A large number of descriptive and/or predictive methods

  • e.g. spatial decision trees, clustering, spatial pattern mining ...

Focus on colocations (spatial patterns)

SLIDE 7

What is a colocation?

First, the data

Spatial objects associated with different features

  • e.g. object 1 is characterized as "sparse vegetation" (A), object 7 as "mine" (C), and object 8 as "river erosion" (B) ➫ A1, C7 and B8

Then, the pattern

Colocation = subset of features whose objects are "often" located close to each other

  • e.g. {A, C, B}, i.e. {sparse vegetation, mine, river erosion}

Colocation instance = subset of objects having the features of the colocation and close to each other

  • set of all instances of a colocation = table instance TI
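The definitions above can be sketched in a few lines of Python. This is only an illustration: the coordinates, the extra objects A5 and B3, the helper names `are_neighbors` and `table_instance`, and the choice of a Euclidean distance threshold as neighborhood relationship are all assumptions made for the example.

```python
from itertools import product
from math import dist

# Hypothetical toy data: object id -> (feature, (x, y)); coordinates are invented.
objects = {
    "A1": ("A", (0.0, 0.0)),
    "A5": ("A", (10.0, 10.0)),
    "B8": ("B", (1.0, 0.0)),
    "B3": ("B", (10.0, 11.0)),
    "C7": ("C", (0.0, 1.0)),
}


def are_neighbors(p, q, max_dist):
    """Assumed neighborhood relationship: Euclidean distance at most max_dist."""
    return dist(p, q) <= max_dist


def table_instance(colocation, objects, max_dist):
    """Return all instances: one object per feature, all pairwise neighbors."""
    by_feature = {
        f: [o for o, (g, _) in objects.items() if g == f] for f in colocation
    }
    instances = []
    for combo in product(*(by_feature[f] for f in colocation)):
        points = [objects[o][1] for o in combo]
        if all(
            are_neighbors(p, q, max_dist)
            for i, p in enumerate(points)
            for q in points[i + 1:]
        ):
            instances.append(combo)
    return instances


print(table_instance(("A", "B", "C"), objects, max_dist=2.0))
print(table_instance(("A", "B"), objects, max_dist=2.0))
```

The brute-force enumeration is exponential in the size of the colocation; it is only meant to make the notion of table instance concrete, not to reflect how instances are generated by an actual mining algorithm.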

SLIDE 8

Mining colocations (Shekhar et al. 01)

Two important aspects

The neighborhood relationship

  • e.g. Euclidean distance, intersection, ...

The measure "often located close to each other"

  • participation index (anti-monotone)
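As a reminder of how the measure works, here is a minimal sketch of the participation index as it is usually defined in the colocation literature: for each feature of the colocation, the fraction of its objects that appear in at least one instance, then the minimum of these fractions. The function name, the toy table instance and the object counts are assumptions for the example.

```python
def participation_index(colocation, instances, feature_counts):
    """Minimum, over the features of the colocation, of the participation ratio.

    The participation ratio of a feature is the number of its distinct objects
    found in the table instance divided by its total number of objects.
    """
    ratios = []
    for position, feature in enumerate(colocation):
        participating = {inst[position] for inst in instances}
        ratios.append(len(participating) / feature_counts[feature])
    return min(ratios)


# Hypothetical table instance of {A, B}: each tuple holds one A and one B object.
instances_ab = [("A1", "B8"), ("A5", "B3"), ("A5", "B8")]
feature_counts = {"A": 2, "B": 4}  # invented totals of A and B objects

print(participation_index(("A", "B"), instances_ab, feature_counts))  # 0.5
```

Here both A objects participate (ratio 1.0) but only two of the four B objects do (ratio 0.5), so the index is 0.5. Adding a feature to a colocation can only remove participating objects, which is why the measure is anti-monotone.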

Mining

Input: a set of spatial objects, each associated with a feature, a neighborhood relationship, and a threshold for the measure

  • data stored in a GIS

Output: "frequent" colocations, i.e. those whose participation index is greater than a threshold

Algorithm: classical levelwise mining algorithm

  • such as Apriori for itemset mining
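The levelwise principle can be sketched as follows. This is a generic Apriori-style skeleton, not the authors' implementation: `generate_candidates`, `mine_levelwise` and the `is_frequent` callback are hypothetical names, and the participation indexes in the toy lookup table are invented.

```python
from itertools import combinations


def generate_candidates(frequent_k):
    """Join and prune: size-(k+1) candidates whose size-k subsets are all frequent."""
    frequent = {frozenset(c) for c in frequent_k}
    candidates = set()
    for a, b in combinations(frequent, 2):
        union = a | b
        if len(union) == len(a) + 1:
            # Anti-monotonicity: a candidate with an infrequent subset is pruned.
            if all(union - {f} in frequent for f in union):
                candidates.add(union)
    return candidates


def mine_levelwise(features, is_frequent):
    """Levelwise search; is_frequent is supplied by the caller."""
    level = [frozenset([f]) for f in features]
    result = []
    while level:
        level = [c for c in level if is_frequent(c)]
        result.extend(level)
        level = list(generate_candidates(level))
    return result


# Invented participation indexes, used only to drive the example.
pi = {
    frozenset("AB"): 0.8, frozenset("AC"): 0.6, frozenset("BC"): 0.7,
    frozenset("AD"): 0.2, frozenset("ABC"): 0.55,
}


def is_frequent(c):
    # Single features are treated as trivially frequent in this sketch.
    return len(c) == 1 or pi.get(c, 0.0) >= 0.5


found = mine_levelwise("ABCD", is_frequent)
print(sorted("".join(sorted(c)) for c in found))
```

In a real miner, `is_frequent` would build the table instance of the candidate and compare its participation index with the threshold.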

SLIDE 9

Methods unsuited to expert needs

Many works on colocations

  • Improving algorithm performance
  • Extracting local patterns
  • Reducing the number of colocations
  • ...

Problems

No visualization of colocations adapted to expert needs and practices

  • necessary to extract relevant information

SLIDE 10

Visualizing data mining results

Three main approaches to visualize data mining results:

1. Textual representation

  • basically a list of patterns with interestingness measures
  • e.g. textual visualization of colocation patterns

➫ simple but not easily understandable by domain experts

SLIDE 11

Visualizing data mining results

Three main approaches to visualize data mining results:

2. Abstract representation (e.g. plots, matrices, graphs, trees or cubes)

  • condensed and informative visual representations of the solutions with statistics
  • e.g. grid representation of association rules in MineSet (Brunk et al. 97)
  • e.g. radial hierarchical layout to represent frequent itemsets (Keim et al. 05)
  • e.g. orthogonal graphs to represent frequent itemsets (Leung et al. 08)

➫ not really adapted to spatial patterns

  • in spatial pattern mining, spatiality is not just another dimension of analysis
  • for domain experts, the spatial dimension is the basis of their interpretation

SLIDE 12

Visualizing data mining results

Three main approaches to visualize data mining results:

3. Cartographic representation

  • first solution: visualization of spatial pattern instances on a map
  • e.g. classical cartographic visualization of spatial clusters with colors
  • e.g. select an association rule and visualize its interestingness measure for each country (Andrienko 99)

➫ not possible to display all colocation instances (as is done in spatial cluster analysis)

➫ "select a pattern and display its instances" gives only a local view of one pattern

SLIDE 13

Visualizing data mining results

Three main approaches to visualize data mining results:

3. Cartographic representation

  • second solution: generating visual representations of the solutions
  • e.g. clusters of trajectories summarized by "representative trajectories" using a classifier and visual refinement (Andrienko 09)

➫ not directly usable for colocation patterns but an interesting approach

SLIDE 15

Our approach

Problem

How to visualize interesting colocations on a map?

Motivations

  • Have an easily usable and interpretable visual representation for experts
  • Give additional spatial and thematic information
  • Give a global cartographic view of the solutions

SLIDE 16

A colored and labeled clique representation of colocations

A natural visual representation of a colocation

A clique

  • node = object-type (i.e. feature)
  • edge = neighborhood relationship

Example: Colocation {mining zone, sparse vegetation, sensitive trail, river erosion}

Visual representation: [figure]

SLIDE 17

A colored and labeled clique representation of colocations

Additional information

  • Node coloration to represent thematic information
  • Edge coloration to visualize the interestingness measure, i.e. the prevalence of the colocation

Example: Colocation {mining zone, sparse vegetation, sensitive trail, river erosion} with participation index = 0.8

Visual representation: [figure]

SLIDE 18

Spatial representation of colocations

How to position the visual representations of colocations on the map? In other words, how to position the clique nodes?

➫ Using a "spatialization" function

Summarizes the spatial information of the instances of a colocation

  • only spatial objects (instances) have spatial information

Makes it possible to visualize where and how instances of an interesting colocation are generally located

SLIDE 19

A first basic spatialization function

A centroid-based spatialization function

The centroid = a basic approach to summarize a set of points

  • "average" of all points

➫ For each clique node, generate the centroid of its feature instances

  • e.g. for colocation {A, B, C}, node A is the centroid of spatial objects {A1, A5} (i.e. objects with A belonging to the table instance of {A, B, C})
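A minimal sketch of this centroid-based spatialization, assuming point objects with planar coordinates. The function names, the coordinates and the objects other than A1 and A5 are invented for the example.

```python
def centroid(points):
    """Arithmetic mean of a list of (x, y) points."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def spatialize(colocation, instances, coords):
    """Place each clique node at the centroid of the distinct objects of its feature."""
    positions = {}
    for position, feature in enumerate(colocation):
        members = sorted({inst[position] for inst in instances})
        positions[feature] = centroid([coords[o] for o in members])
    return positions


# Hypothetical coordinates and table instance of {A, B, C}.
coords = {
    "A1": (0.0, 0.0), "A5": (4.0, 2.0),
    "B8": (1.0, 0.0), "B3": (5.0, 2.0),
    "C7": (0.0, 1.0), "C2": (4.0, 3.0),
}
instances_abc = [("A1", "B8", "C7"), ("A5", "B3", "C2")]

print(spatialize(("A", "B", "C"), instances_abc, coords))
```

Node A ends up halfway between A1 and A5. When the instances are spread over distant areas, such a single position may not lie close to any actual instance.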

SLIDE 20

A first basic spatialization function

Problem with this centroid-based spatialization function

➫ Solution: using clustering to allow several representations for each colocation

SLIDE 21

A clustering-based spatial representation of colocations

Principle

For each interesting colocation,

  • Cluster its instances
  • Compute the position of each colocation feature in each cluster, using the centroid-based spatialization function
  • Draw the colored and labeled clique representation
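The three steps can be sketched as below. The clustering used here (single linkage on instance centers with a distance threshold `max_gap`) is only a stand-in chosen to keep the example short; it is not the heuristic described later in the talk. All names and coordinates are assumptions.

```python
from math import dist


def centroid(points):
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def cluster_instances(instances, coords, max_gap):
    """Stand-in clustering: link instances whose centers are within max_gap."""
    centers = [centroid([coords[o] for o in inst]) for inst in instances]
    parent = list(range(len(instances)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(instances)):
        for j in range(i + 1, len(instances)):
            if dist(centers[i], centers[j]) <= max_gap:
                parent[find(i)] = find(j)
    clusters = {}
    for i, inst in enumerate(instances):
        clusters.setdefault(find(i), []).append(inst)
    return list(clusters.values())


def cluster_representations(colocation, instances, coords, max_gap):
    """One set of node positions per cluster of instances."""
    return [
        {
            f: centroid([coords[o] for o in sorted({inst[k] for inst in cluster})])
            for k, f in enumerate(colocation)
        }
        for cluster in cluster_instances(instances, coords, max_gap)
    ]


# Hypothetical data: two nearby instances of {A, B} and one far away.
coords = {
    "A1": (0.0, 0.0), "B8": (1.0, 0.0),
    "A2": (0.0, 1.0), "B9": (1.0, 1.0),
    "A5": (10.0, 10.0), "B3": (11.0, 10.0),
}
instances_ab = [("A1", "B8"), ("A2", "B9"), ("A5", "B3")]

reps = cluster_representations(("A", "B"), instances_ab, coords, max_gap=2.0)
print(reps)
```

Each dictionary in `reps` gives the node positions of one clique to draw, so a colocation whose instances lie in two distinct areas yields two cliques on the map.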

SLIDE 22

A clustering-based spatial representation of colocations

Interest of this approach: a better representation of colocation instances

Show where an interesting colocation is generally located

  • e.g. colocation {A, B, C} is generally located in the north-west (and mainly in this area)

Show how the features in a colocation are located w.r.t. each other

  • e.g. objects in colocation {A, B, C} are relatively far from each other
  • show for example that mines and sparse vegetation have an indirect impact on erosion (colocation {mine, sparse vegetation, erosion})

➫ Difficult to obtain such information with "classical" approaches

One major problem: scalability

All clusterings (one for each colocation) may be computationally expensive

SLIDE 23

Improving performance

Optimizing memory occupation by combining the colocation mining algorithm and visualization

Clustering and visualization not done in a post-processing step

  • Avoid storage of all colocation instances in memory

Colocations are mined one by one and each one is visualized at the same time

  • Integrate visualization in the mining algorithm

Optimizing execution time using a heuristic clustering method

SLIDE 24

Heuristic clustering approach

Observation: colocation instances share lots of spatial objects

  • e.g. colocations {A, B} and {A, B, C} share spatial objects A1 and A5

➫ If clustering is done in a post-processing step, some processing will be done several times

  • e.g. computing distances between A1 and A5

Proposition

A two-step clustering approach integrated in the mining algorithm

1. a clustering of the instances of each feature, run once at the beginning of the algorithm

  • i.e. one clustering for A objects, one clustering for B objects, ...

2. a clustering of the instances of each colocation based on the previous clusters, using a merge and split approach
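The first step can be illustrated as follows. The slides do not say which clustering algorithm is run on each feature, so this sketch assumes a simple stand-in (connected components of the "within `max_gap`" graph); the function names and coordinates are also hypothetical.

```python
from math import dist


def cluster_feature_objects(coords_by_object, max_gap):
    """Stand-in clustering: connected components of the 'within max_gap' graph."""
    labels = {}
    next_label = 0
    for start in coords_by_object:
        if start in labels:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            current = stack.pop()
            for other, point in coords_by_object.items():
                if other not in labels and dist(coords_by_object[current], point) <= max_gap:
                    labels[other] = next_label
                    stack.append(other)
        next_label += 1
    return labels


def precluster(objects, max_gap):
    """Run one clustering per feature, once, so that every colocation can reuse it."""
    by_feature = {}
    for obj, (feature, point) in objects.items():
        by_feature.setdefault(feature, {})[obj] = point
    return {f: cluster_feature_objects(m, max_gap) for f, m in by_feature.items()}


# Hypothetical objects: id -> (feature, (x, y)).
objects = {
    "A1": ("A", (0.0, 0.0)), "A2": ("A", (1.0, 0.0)), "A5": ("A", (10.0, 10.0)),
    "B8": ("B", (0.0, 1.0)), "B3": ("B", (10.0, 11.0)),
}

feature_clusters = precluster(objects, max_gap=2.0)
print(feature_clusters)
```

The point of the design is that the distances between objects of one feature are computed once here, and the resulting labels are then reused for {A, B}, {A, B, C} and every other colocation containing that feature.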

SLIDE 25

Heuristic clustering approach

SLIDE 26

Focus on the Merge and Split approach

Principle:

  • Select the feature f in the current colocation C having the highest number of clusters
  • Split the instances of C w.r.t. the clusters of f

Problem: "conflictual clusters", i.e. object instances belonging to several partitions

  • e.g. Y2 is in the first instance partition and in the second one

Solution: merge the clusters leading to a conflict

  • e.g. merge the first and second clusters of Z

➫ Merge and split approach: alternate merge and split until no change
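One possible reading of this principle is sketched below. It is an interpretation of the slide, not the authors' algorithm: the function names, the cluster labels and the toy instances (built to reproduce the Y2 / Z example) are assumptions.

```python
def find_conflict(instances, pivot, labels, merged):
    """Return two partition labels sharing an object, or None if there is no conflict."""
    owner = {}
    for inst in instances:
        part = merged[labels[inst[pivot]]]
        for obj in inst:
            seen = owner.setdefault(obj, part)
            if seen != part:
                return seen, part
    return None


def merge_and_split(colocation, instances, feature_clusters):
    """Split instances by the clusters of a pivot feature, merging conflicting clusters."""

    def cluster_count(k):
        labels_k = feature_clusters[colocation[k]]
        return len({labels_k[inst[k]] for inst in instances})

    # Pivot = feature with the highest number of clusters among participating objects.
    pivot = max(range(len(colocation)), key=cluster_count)
    labels = feature_clusters[colocation[pivot]]
    # merged maps each original pivot cluster to its current (possibly merged) label.
    merged = {labels[inst[pivot]]: labels[inst[pivot]] for inst in instances}

    # Alternate split (inside find_conflict) and merge until nothing changes.
    while (conflict := find_conflict(instances, pivot, labels, merged)) is not None:
        keep, drop = conflict
        for key in merged:
            if merged[key] == drop:
                merged[key] = keep

    partitions = {}
    for inst in instances:
        partitions.setdefault(merged[labels[inst[pivot]]], []).append(inst)
    return colocation[pivot], list(partitions.values())


# Hypothetical pre-computed clusters: Z has three clusters, Y has two.
feature_clusters = {
    "Y": {"Y1": 0, "Y2": 0, "Y3": 1},
    "Z": {"Z1": 0, "Z2": 1, "Z3": 2},
}
instances_yz = [("Y1", "Z1"), ("Y2", "Z1"), ("Y2", "Z2"), ("Y3", "Z3")]

pivot_feature, partitions = merge_and_split(("Y", "Z"), instances_yz, feature_clusters)
print(pivot_feature, partitions)
```

In this toy run Z is selected as pivot, Y2 appears in the partitions of the first and second clusters of Z, and these two clusters are therefore merged, leaving two partitions. Each merge removes one partition label, so the loop terminates.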

SLIDE 28

Experiments

Data

Studied area: mountainous watershed of 9 km²

3 thematic layers:

  • erosion: "not bare ground" or different types of "bare ground" (6 features)
  • nature of the ground: lithology (13 features)
  • vegetation: types of vegetation (13 features)

➫ 32 features and more than 7000 objects

Experimental protocol

Spatial relationship: Euclidean distance between areas

Several participation index thresholds

➫ Results studied by a geologist, expert in soil erosion of the studied area

SLIDE 29

Experiments: Map readability

Number of patterns displayed to users

  • an important indicator for visualization methods
  • if too many patterns are displayed, then interpretation is difficult

Participation index threshold                       0.5           0.3           0.1

Distance 200m
  nb colocations                                     21            68           266
  avg nb instances for a colocation              16 478        11 974         8 365
  total nb instances for all colocations        346 046       814 263     2 225 118
  nb colocations displayed by our approach           31           112           510

Distance 300m
  nb colocations                                     55           163           711
  avg nb instances for a colocation              50 803        78 347        87 100
  total nb instances for all colocations      2 794 205    12 770 670    61 928 727
  nb colocations displayed by our approach           84           258         1 349

➫ No more than twice the number of colocations

  • if too many, possibility to use the zoom functionality of the GIS to filter

➫ Enables a comparison of our approach with classical visualization approaches

  • "select a pattern and display its instances" approach = average number of instances for a colocation
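The "no more than twice" observation can be checked directly from the figures of the readability table; the short script below only redoes that arithmetic (the variable names are ours).

```python
# Figures copied from the readability table, for thresholds 0.5, 0.3 and 0.1.
mined = {200: [21, 68, 266], 300: [55, 163, 711]}
displayed = {200: [31, 112, 510], 300: [84, 258, 1349]}

# Ratio between displayed clique representations and mined colocations.
ratios = {
    d: [round(shown / found, 2) for shown, found in zip(displayed[d], mined[d])]
    for d in mined
}

for d, values in ratios.items():
    print(f"{d}m:", values)

print(all(r < 2 for values in ratios.values() for r in values))
```

Every ratio stays below 2, while the "select a pattern and display its instances" approach would show thousands of instances for a single colocation.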

SLIDE 30

Experiments: Performance evaluation

Execution time of our approach versus a "basic" post-processing clustering approach

  • post-processing approach = executing a DBSCAN clustering on each table instance after colocation extraction

[Figure: two plots of total time (sec, logarithmic scale from 1 to 100000) versus minimum participation index (0.1 to 0.8), one for 5802 objects and 18 features and one for 7642 objects and 32 features; curves: "Spatial clustering-based colocation mining" and "Colocation mining then DBScan clustering"]

➫ Our approach more efficient than the basic approach

SLIDE 31

Experiments: Expert feedback

Example of result provided to our expert by our prototype: [figure]

➫ Points out known correlations about soil erosion in this area

  • e.g. highlights the environmental damage near areas with human activities

➫ Interest of our approach for experts

  • Gives a global picture of where and how colocations are generally located
  • Quickly identify new patterns, then focus on some of them and study their instances in more depth

SLIDE 33

Conclusion & Perspectives

Conclusion

Proposition of a new clustering-based visualization of colocations

  • A colored and labeled clique representation with thematic and prevalence information
  • A spatialization of colocations using a heuristic clustering method and a centroid-based positioning

➫ An easily usable and interpretable global picture of the solutions

➫ Good scalability

Main perspectives

  • Improving algorithm performance with dedicated data structures, spatial indexes, or new mining strategies
  • Improving our prototype
  • Extending our approach to other patterns, e.g. sequential spatio-temporal patterns

SLIDE 34

Questions ?

Thank you

SLIDE 35

Our approach

Problem

How to visualize interesting colocations on a map ?

Principle of our solution

Generate a clique representation of each colocation and georeference this representation on a map using clustering

SLIDE 36

Formal definition of a visual colocation representation

A colored and labeled clique representation of a colocation $C$ is a colored and labeled clique $G^{col}_C = (V_C, E_C, L_{type}, L_{pi}, L_{theme})$, where

  • $V_C$ is the set of vertices,
  • $E_C = \{(u, v) \in V_C \times V_C \mid u \neq v\}$ is the set of edges,
  • $L_{type} : V_C \to C$ is a labelling function that assigns an object-type $f \in C$ to a vertex $v \in V_C$,
  • $L_{pi} : E_C \to Col$ is a coloring function that assigns a color $k \in Col = \{1, 2, \dots, m\}$ ($m \geq 1$) to a colocation edge based on the prevalence measure $pi(C) \in [0, 1]$ (saturation factor), and
  • $L_{theme} : V_C \to Col_{theme}$ is a coloring function that assigns the thematic color $k' \in Col_{theme} = \{1, 2, \dots, m'\}$ ($m' \geq 1$) of object-type $L_{type}(v)$ to a vertex $v \in V_C$.
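The definition maps naturally onto a small data structure. The sketch below is one way to encode it, under stated assumptions: the class and function names are hypothetical, edges are stored as unordered pairs, the thematic color of each feature is invented, and the mapping from pi(C) to one of m saturation levels (rounding pi × m) is our own choice since the definition does not specify it.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class CliqueRepresentation:
    vertices: list      # V_C, one vertex per feature of the colocation
    edges: list         # E_C, every pair of distinct vertices
    type_label: dict    # L_type: vertex -> object-type (feature)
    edge_color: dict    # L_pi: edge -> color index in 1..m
    theme_color: dict   # L_theme: vertex -> thematic color index


def build_clique(colocation, pi, theme_of, m=5):
    """Build the clique of a colocation with prevalence pi in [0, 1]."""
    vertices = list(range(len(colocation)))
    edges = list(combinations(vertices, 2))
    type_label = dict(zip(vertices, colocation))
    # Assumed mapping from pi(C) to a saturation level between 1 and m.
    level = max(1, min(m, round(pi * m)))
    edge_color = {e: level for e in edges}
    theme_color = {v: theme_of[type_label[v]] for v in vertices}
    return CliqueRepresentation(vertices, edges, type_label, edge_color, theme_color)


# Invented thematic colors (e.g. one color index per thematic layer).
theme_of = {
    "mining zone": 1, "sparse vegetation": 2,
    "sensitive trail": 1, "river erosion": 3,
}
clique = build_clique(
    ("mining zone", "sparse vegetation", "sensitive trail", "river erosion"),
    pi=0.8,
    theme_of=theme_of,
)
print(len(clique.edges), set(clique.edge_color.values()))
```

For the four-feature example with participation index 0.8, the clique has six edges, all drawn with the same saturation level since the color depends only on pi(C).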
