An expressive dissimilarity measure for relational clustering using neighbourhood trees. Sebastijan Dumančić, Hendrik Blockeel. DTAI, CS Department, KU Leuven. ECML PKDD 2017, Journal track.

1 – Outline 2/28 1 Overture 2 How do we do it now? 3 An expressive dissimilarity for relational data 4 Experiments and results 5 Summary Relational clustering over neighbourhood trees – S. Dumančić, H. Blockeel

1 – Identifying groups in data 3/28

1 – Which clustering is correct? 7/28

1 – Which clustering is correct? 8/28

1 – What about relational data? 9/28

1 – (Statistical) relational machine learning 10/28 Machine learning with a powerful knowledge representation language, usually based on first-order logic. A common representation for vectors, graphs, sequences, ... with a unifying reasoning and learning engine.

1 – Many faces of relational data 11/28

2 – How do we do it now? 13/28
Hybrid similarities: incorporate link information into attribute-based similarity; measure the similarity of connected vertices.
Graph kernels: structural similarities of graphs; random walks, propagation of information.
Relational similarities: comparing logical constructs; logical formulas in common, matching terms.
All of these impose a fixed bias.

3 – How similar are ProfA and ProfB? 15/28

3 – Main motivations 16/28 A similarity measure for relational data should: incorporate multiple views of similarity; be easily adaptable; take attributes and relationships into account; be insensitive to neighbourhood size; be efficient.

3 – Neighbourhood trees 17/28 Neighbourhood trees summarize the neighbourhood of an instance/example. [Figure panels: Data; Neighbourhood tree] Similarity of instances = similarity of their neighbourhood trees.
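To make the idea of summarizing a neighbourhood concrete, here is a minimal sketch of building a tree around one instance. It is a simplification under stated assumptions, not the construction used in the talk: relations are binary and treated as undirected, the function name `build_neighbourhood_tree` is hypothetical, and the vertices `StudA`, `StudB` and `Course1` are invented for illustration (only `ProfA`, `Advised` and `TaughtBy` appear on the slides).

```python
from collections import defaultdict


def build_neighbourhood_tree(root, edges, depth):
    """Return a nested dict summarising the neighbourhood of `root`.

    edges: iterable of (label, source, target) triples. Only binary
    relations are handled here, which is an assumption of this sketch.
    """
    adjacency = defaultdict(list)
    for label, src, dst in edges:
        adjacency[src].append((label, dst))
        adjacency[dst].append((label, src))  # relations treated as undirected

    def expand(vertex, level):
        node = {"vertex": vertex, "children": []}
        if level < depth:
            # Sorted only to make the output deterministic.
            # Note: this sketch does not prevent revisiting the parent vertex.
            for label, neighbour in sorted(adjacency[vertex]):
                node["children"].append((label, expand(neighbour, level + 1)))
        return node

    return expand(root, 0)


# Hypothetical toy data around ProfA.
edges = [
    ("Advised", "ProfA", "StudA"),
    ("Advised", "ProfA", "StudB"),
    ("TaughtBy", "Course1", "ProfA"),
]
tree = build_neighbourhood_tree("ProfA", edges, depth=1)
level1_labels = [label for label, _ in tree["children"]]
print(level1_labels)
```

The list `level1_labels` is the kind of per-level edge-label collection that can later be compared between two instances.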

3 – Comparing neighbourhood trees 18/28 Decompose NTs into semantic parts. Similarity = linear combination of the similarities of the individual semantic parts, with weights $(w_1, w_2, w_3, w_4, w_5)$.
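The linear combination can be sketched in a few lines. The function name `combine` and the five component values below are hypothetical, chosen only to illustrate the arithmetic; the uniform weights of 0.2 match the setting mentioned later in the experiments.

```python
def combine(components, weights):
    """Weighted linear combination of per-part (dis)similarity scores."""
    if len(components) != len(weights):
        raise ValueError("need exactly one weight per component")
    return sum(w * c for w, c in zip(weights, components))


# Hypothetical scores for the five semantic parts of two neighbourhood trees.
components = [0.5, 0.0, 1.0, 0.25, 0.75]
weights = [0.2] * 5  # uniform weights, one per semantic part
total = combine(components, weights)
print(total)
```

Setting a weight to zero switches the corresponding view off, which is one way to read the claim that the measure is easily adaptable.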

3 – Comparing semantic parts 19/28 Decompose each NT into multisets of attributes, edge labels, and vertex identities, per level and vertex type. Multiset of edge labels (level 1): { (Advised,2), (Advised,2), (TaughtBy,2) }. Compare two multisets, A and B, with the χ² distance:
$$\chi^2(A, B) = \sum_{x \in A \cup B} \frac{(f_A(x) - f_B(x))^2}{f_A(x) + f_B(x)}$$
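The χ² distance between two multisets can be sketched as follows. The slide does not define $f_A(x)$; this sketch assumes it is the relative frequency of element $x$ in multiset $A$, which would fit the stated goal of being insensitive to neighbourhood size. The function names are hypothetical.

```python
from collections import Counter


def relative_frequencies(multiset):
    """Map each distinct element to its share of the multiset (assumed f)."""
    counts = Counter(multiset)
    total = sum(counts.values())
    return {x: c / total for x, c in counts.items()}


def chi2_distance(a, b):
    """Chi-squared distance between two multisets given as iterables."""
    fa, fb = relative_frequencies(a), relative_frequencies(b)
    distance = 0.0
    for x in set(fa) | set(fb):  # every element of A ∪ B; f > 0 in at least one
        pa, pb = fa.get(x, 0.0), fb.get(x, 0.0)
        distance += (pa - pb) ** 2 / (pa + pb)
    return distance


a = ["Advised", "Advised", "TaughtBy"]
b = ["Advised", "TaughtBy", "TaughtBy"]
print(chi2_distance(a, b))
```

Under this assumption the loop touches each distinct element once, so the cost grows with the number of unique elements rather than with the raw size of the neighbourhood.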

3 – Generality of the approach 20/28 Many of the existing similarities are a special case (hybrid similarities, relational similarities, ...), or they can be defined over neighbourhood trees with different biases (graph kernels). This makes it easier to compare the imposed biases. Additionally, the measure is efficient: linear in the number of unique elements in a multiset.

4 – Experimental setup 22/28
Datasets: IMDB, UWCSE, Mutagenesis, WebKB, TerroristAttacks.
Questions: Quality of the obtained clustering? Are different views really necessary? Can we learn the bias from data? Can we learn the bias from labels?
Setup: combined with spectral and hierarchical clustering; compared against a wide range of existing similarity measures; performance measure: ARI/Accuracy.
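For readers unfamiliar with ARI (the adjusted Rand index), the following is a minimal pure-Python sketch of the standard pair-counting formula. It is an illustration of the metric only, not the evaluation code used in the experiments, and the label vectors are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two flat clusterings of the same items."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # convention for the degenerate case (an assumption here)
    cells = Counter(zip(labels_true, labels_pred))  # contingency table
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # both clusterings are trivial and identical in structure
    return (sum_cells - expected) / (max_index - expected)


same_up_to_renaming = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(same_up_to_renaming, crossed)
```

The index is invariant to renaming clusters, which matters because a clustering algorithm has no access to the reference label names.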

4 – Quality of the obtained clusterings 23/28 Takeaway message: incorporating multiple biases consistently performs well.

4 – Are different views needed? 24/28 Takeaway message: relational data requires multiple views of similarity in order to find informative clusters.

4 – Learning the weights from data 25/28 ReCeNT with $w_i = 0.2$ vs. AASC + ReCeNT. AASC: given multiple similarity matrices, find an optimal combination for clustering. Result: barely any benefit. Reference: Huang, Chuang, Chen: Affinity Aggregation for Spectral Clustering.

4 – Learning weights from labels 26/28 Similarity measure in combination with a kNN (parameters optimised with CV). Takeaway message: when labels are provided, ReCeNT outperforms the competing similarities.
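Using a dissimilarity measure with kNN only requires the distances from a query to the labelled examples. The sketch below assumes those distances are already computed; the function name `knn_predict`, the distances and the labels are all hypothetical, and `k` is fixed rather than tuned by cross-validation as in the experiments.

```python
from collections import Counter


def knn_predict(query_dists, train_labels, k):
    """Majority vote among the k training examples closest to the query.

    query_dists[i] is the dissimilarity between the query and example i.
    """
    order = sorted(range(len(train_labels)), key=lambda i: query_dists[i])
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical dissimilarities from one query to four labelled instances.
dists = [0.1, 0.4, 0.35, 0.9]
labels = ["professor", "student", "professor", "student"]
prediction = knn_predict(dists, labels, k=3)
print(prediction)
```

Because the classifier sees only the distance values, any weighting of the semantic parts can be swapped in without changing this code.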

5 – Summary 28/28 A similarity measure for relational data that: is versatile (a meta-similarity); is easily adaptable; is efficient; generalizes many existing structured/relational similarities; works well across many different tasks.
Code: https://dtai.cs.kuleuven.be/software/recent
S. Dumančić, H. Blockeel: Clustering-Based Unsupervised Relational Representation Learning with an Explicit Distributed Representation, IJCAI ’17
S. Dumančić, H. Blockeel: Demystifying Relational Latent Representations, ILP ’17
