SLIDE 1

An expressive dissimilarity measure for relational clustering using neighbourhood trees

Sebastijan Dumančić, Hendrik Blockeel
DTAI, CS Department, KU Leuven
ECML PKDD 2017, Journal track

SLIDE 2

1 – Outline

1 Overture
2 How do we do it now?
3 An expressive dissimilarity for relational data
4 Experiments and results
5 Summary

Relational clustering over neighbourhood trees – S. Dumančić, H. Blockeel

SLIDE 3

1 – Identifying groups in data

SLIDE 7

1 – Which clustering is correct?

SLIDE 9

1 – What about relational data?

SLIDE 10

1 – (Statistical) relational machine learning

Machine learning with a powerful knowledge representation language, usually based on first-order logic

Common representation for:

  • vectors
  • graphs
  • sequences
  • ...

... with a unifying reasoning and learning engine

SLIDE 11

1 – Many faces of relational data

SLIDE 15

2 – How do we do it now?

Hybrid similarities

  • incorporate link information into attribute-based similarity
  • measure the similarity of connected vertices

Graph kernels

  • structural similarities of graphs
  • random walks, propagation of information

Relational similarities

  • comparing logical constructs
  • logical formulas in common, matching terms

All of these impose a fixed bias

SLIDE 17

3 – How similar are ProfA and ProfB?

SLIDE 18

3 – Main motivations

A similarity measure for relational data should:

  • incorporate multiple views of similarity
  • be easily adaptable
  • take attributes and relationships into account
  • be insensitive to neighbourhood size
  • be efficient

SLIDE 20

3 – Neighbourhood trees

Neighbourhood trees summarize the neighbourhood of an instance/example

[Figure: data and the corresponding neighbourhood tree]

Similarity of instances = similarity of their neighbourhood trees
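As a rough illustration of the idea, a neighbourhood tree can be sketched as a bounded-depth unfolding of the relations an instance takes part in. The toy facts, the function names, and the restriction to binary relations are all assumptions made for this sketch. The slides also do not spell out what the number attached to an edge label means; here it is taken to be the argument position of the parent vertex, chosen only because that reproduces the level-1 multiset shown on the later slide.

```python
from collections import Counter

# Hypothetical toy data: binary relations stored as (label, first_arg, second_arg).
FACTS = [
    ("Advised", "StudA", "ProfA"),
    ("Advised", "StudB", "ProfA"),
    ("TaughtBy", "Course1", "ProfA"),
    ("Advised", "StudC", "ProfB"),
    ("TaughtBy", "Course2", "ProfB"),
]

def neighbours(vertex, facts):
    """Yield (label, position_of_vertex, other_vertex) for each fact touching vertex."""
    for label, first, second in facts:
        if first == vertex:
            yield label, 1, second
        if second == vertex:
            yield label, 2, first

def neighbourhood_tree(vertex, facts, depth):
    """Return a nested (vertex, [(edge_label, subtree), ...]) structure of the given depth."""
    if depth == 0:
        return (vertex, [])
    children = [
        ((label, pos), neighbourhood_tree(other, facts, depth - 1))
        for label, pos, other in neighbours(vertex, facts)
    ]
    return (vertex, children)

def edge_labels_at_level(tree, level):
    """Multiset of labels on the edges that lead into the given level (1 = children of the root)."""
    _, children = tree
    if level == 1:
        return Counter(edge for edge, _ in children)
    total = Counter()
    for _, subtree in children:
        total += edge_labels_at_level(subtree, level - 1)
    return total

tree_a = neighbourhood_tree("ProfA", FACTS, depth=1)
print(edge_labels_at_level(tree_a, 1))
```

Two instances can then be compared by comparing the multisets extracted from their trees, level by level.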

SLIDE 22

3 – Comparing neighbourhood trees

Decompose NTs into semantic parts

similarity = linear combination of similarities of individual semantic parts, with weights (w1, w2, w3, w4, w5)
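The linear combination can be sketched in a few lines. The function name and the five example component scores are made up for illustration; only the idea of weighting five per-part scores comes from the slide.

```python
def combine(component_scores, weights):
    """Weighted sum of per-part scores; one weight per semantic part."""
    if len(component_scores) != len(weights):
        raise ValueError("need exactly one weight per component")
    return sum(w * s for w, s in zip(weights, component_scores))

# Hypothetical scores for the five semantic parts of two neighbourhood trees.
components = [0.5, 0.0, 1.0, 0.25, 0.25]

uniform = [0.2] * 5               # every part counts equally
first_part_only = [1, 0, 0, 0, 0]  # a bias that looks at a single part

print(combine(components, uniform))
print(combine(components, first_part_only))
```

Changing the weight vector changes the bias of the measure without touching the per-part computations, which is what makes the measure adaptable.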

SLIDE 23

3 – Comparing semantic parts

Decompose NT into multisets of:

  • attributes
  • edge labels
  • vertex identities

per level and vertex type

Multiset of edge labels (level 1): { (Advised,2), (Advised,2), (TaughtBy,2) }

Compare two multisets, A and B, with the χ² distance:

$$\chi^2(A, B) = \sum_{x \in A \cup B} \frac{(f_A(x) - f_B(x))^2}{f_A(x) + f_B(x)}$$
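A minimal sketch of the χ² distance between two multisets follows. It assumes that fA(x) is the relative frequency of element x in multiset A, which the slide does not state explicitly; the second multiset is invented for the example.

```python
from collections import Counter

def relative_frequencies(multiset):
    """Map each distinct element to its share of the multiset (assumed meaning of f)."""
    counts = Counter(multiset)
    total = sum(counts.values())
    return {x: c / total for x, c in counts.items()}

def chi2_distance(a, b):
    """Sum over distinct elements of (fA - fB)^2 / (fA + fB)."""
    fa, fb = relative_frequencies(a), relative_frequencies(b)
    distance = 0.0
    for x in set(fa) | set(fb):
        p, q = fa.get(x, 0.0), fb.get(x, 0.0)
        distance += (p - q) ** 2 / (p + q)  # p + q > 0 for every x in the union
    return distance

# Level-1 edge labels from the slide, and a hypothetical second instance.
labels_a = [("Advised", 2), ("Advised", 2), ("TaughtBy", 2)]
labels_b = [("Advised", 2), ("TaughtBy", 2)]

print(chi2_distance(labels_a, labels_b))
```

The loop visits each distinct element once, so the cost grows with the number of unique elements rather than with the raw size of the neighbourhood.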

SLIDE 25

3 – Generality of the approach

Many of the existing similarities are a special case:

  • hybrid similarities
  • relational similarities

... or they can be defined over neighbourhood trees (graph kernels)

with different biases: makes it easier to compare the imposed biases

Additionally: efficient, linear in the number of unique elements in a multiset

SLIDE 27

4 – Experimental setup

Datasets:

  • IMDB
  • UWCSE
  • Mutagenesis
  • WebKB
  • TerroristAttacks

Questions:

  • Quality of the obtained clustering?
  • Are different views really necessary?
  • Can we learn the bias from data?
  • Can we learn the bias from labels?

  • combined with spectral and hierarchical clustering
  • a wide range of existing similarity measures
  • performance measure: ARI/Accuracy
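For readers unfamiliar with the ARI (adjusted Rand index), the sketch below computes it from two label lists using the standard pair-counting formula. It is a plain-Python illustration, not the evaluation code used in the experiments, and the example labelings are invented.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting ARI: 1.0 for identical partitions, about 0 for random ones."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to disagree on
    cells = Counter(zip(labels_true, labels_pred))  # contingency table
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_cells - expected) / (max_index - expected)

# The same grouping under different cluster names still scores 1.0.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
# A grouping that splits every true cluster scores below zero.
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

Because the index is corrected for chance, it stays comparable across datasets with different numbers of clusters.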

SLIDE 28

4 – Quality of the obtained clusterings

Takeaway message: incorporating multiple biases consistently performs well

SLIDE 29

4 – Are different views needed?

Takeaway message: relational data requires multiple views of similarity in order to find informative clusters

SLIDE 30

4 – Learning the weights from data

ReCeNT with uniform weights (wi = 0.2) vs. AASC + ReCeNT

AASC: given multiple similarity matrices, find an optimal combination for clustering

Result: barely any benefit from learning the weights

Huang, Chuang, Chen: Affinity Aggregation for Spectral Clustering

SLIDE 31

4 – Learning weights from labels

Similarity measure in combination with a kNN classifier (parameters optimised with cross-validation)

Takeaway message: when labels are provided, ReCeNT outperforms the competing similarities
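A kNN classifier only needs pairwise dissimilarities, so any of the compared measures can be plugged in. The sketch below assumes the dissimilarities from a query instance to all labelled instances are already computed; the function name, the numbers, and the labels are hypothetical, and the cross-validated choice of k and of the weights is left out.

```python
from collections import Counter

def knn_predict(dissimilarity_row, train_labels, k):
    """Majority label among the k labelled instances closest to the query."""
    nearest = sorted(range(len(train_labels)), key=lambda i: dissimilarity_row[i])[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical dissimilarities from one query to four labelled instances.
row = [0.1, 0.9, 0.2, 0.8]
labels = ["prof", "student", "prof", "student"]

print(knn_predict(row, labels, k=3))
```

In a supervised setting, the weight vector of the measure would be tuned together with k on held-out folds.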

SLIDE 34

5 – Summary

A similarity measure for relational data that:

  • is versatile (meta-similarity)
  • is easily adaptable
  • is efficient
  • generalizes many existing structured/relational similarities
  • works well across many different tasks

Code: https://dtai.cs.kuleuven.be/software/recent

  • S. Dumančić, H. Blockeel: Clustering-Based Unsupervised Relational Representation Learning with an Explicit Distributed Representation, IJCAI ’17
  • S. Dumančić, H. Blockeel: Demystifying Relational Latent Representations, ILP ’17
