Similar Structures inside RDF-Graphs - Workshop on Linked Data on the Web (LDOW 2013)


slide-1
SLIDE 1

Similar Structures inside RDF-Graphs

Workshop on Linked Data on the Web (LDOW 2013) Collocated with the 22nd International World Wide Web Conference (WWW 2013)

Anas Alzogbi, Georg Lausen
University of Freiburg, Databases & Information Systems

14 May 2013

slide-2
SLIDE 2

1. Motivation

• RDF datasets are growing constantly (e.g. LOD)
• Minimal constraints on RDF data make it irregular and difficult to comprehend and visualize
• Idea
  • Discover RDF subjects which exhibit similar structures
  • Preserve the meaning by preserving the structure

slide-3
SLIDE 3

2. Our Approach

• Two-phase approach
  • Collapse equivalent structures (bisimilarity equivalence)
  • Collapse similar structures (clustering)

[Figure: processing pipeline. RDF graph → non-literal entities → Perfect Typing (bisimilarity equivalence) → PTG → similarity-based reduction (complete-link agglomerative clustering) → reduced RDF graph]

slide-4
SLIDE 4

3. Perfect Typing

Bisimilarity equivalence

Let $G = (V, E, L)$ be an RDF graph. Two nodes $v, u \in V$ are bisimilar ($v \approx_B u$) if they have the same set of outgoing paths: $P_v = P_u$.

[Figure: example graph with nodes $v_2, \dots, v_6$ and edge labels a, b, c, d, e, g, h, i]

$P_{v_2} = P_{v_6} = \{(i)\} \Rightarrow v_2 \approx_B v_6$

$P_{v_5} = P_{v_3} = \{(a), (b, i), (c), (d, g), (d, h), (e)\} \Rightarrow v_3 \approx_B v_5$
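The path-set definition on this slide can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: it assumes an acyclic graph stored as a dictionary of labelled edges, and the toy edges below (including the leaf names `l1` to `l6`) are hypothetical, chosen only to resemble the slide's example.

```python
from collections import defaultdict

# Hypothetical toy graph: node -> list of (edge label, target node).
# Targets that are not keys (l1..l6) are leaves without outgoing edges.
graph = {
    "v3": [("a", "l1"), ("b", "v2"), ("c", "l2"), ("d", "v4"), ("e", "l3")],
    "v5": [("a", "l1"), ("b", "v6"), ("c", "l2"), ("d", "v4"), ("e", "l3")],
    "v2": [("i", "l4")],
    "v6": [("i", "l4")],
    "v4": [("g", "l5"), ("h", "l6")],
}

def path_set(graph, node):
    """Label sequences of all maximal outgoing paths of `node` (acyclic graph assumed)."""
    paths = set()
    for label, target in graph.get(node, []):
        tails = path_set(graph, target)
        if tails:
            paths.update((label,) + tail for tail in tails)
        else:
            paths.add((label,))
    return frozenset(paths)

def bisimilar_classes(graph, nodes):
    """Group nodes whose outgoing path sets are identical."""
    classes = defaultdict(list)
    for node in nodes:
        classes[path_set(graph, node)].append(node)
    return sorted(sorted(members) for members in classes.values())

print(bisimilar_classes(graph, ["v2", "v3", "v4", "v5", "v6"]))
# [['v2', 'v6'], ['v3', 'v5'], ['v4']]
```

Grouping by a hashable `frozenset` of paths keeps the comparison a single dictionary lookup per node; a graph with cycles would need a different approach, such as partition refinement.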

slide-5
SLIDE 5

4. Similarity Based Reduction

• Hierarchical clustering
  • Exclusive, unsupervised
  • Requires similarity matrix
• Instance tree & intersection tree [Lösch et al. 2012]
• $T_{\sigma(v)}$ is the instance tree of node $v$

[Figure: a PTG over nodes $v_1, \dots, v_4$ with edge labels a to i, next to the instance tree $T_{\sigma(v_1)}$]

slide-6
SLIDE 6

4. Similarity Based Reduction

• Instance tree & intersection tree
• Pairwise similarity

[Figure: instance trees $T_{\sigma(v_1)}$ and $T_{\sigma(v_2)}$ and their intersection tree $\mathrm{intersect}(T_{\sigma(v_1)}, T_{\sigma(v_2)})$]

$\mathrm{size}(T_{\sigma(v_1)}) = 9$
$\mathrm{size}(T_{\sigma(v_2)}) = 8$
$\mathrm{size}(\mathrm{intersect}) = 8$

$$\mathrm{sim}(v_1, v_2) = \frac{\mathrm{size}(\mathrm{intersect}(T_{\sigma(v_1)}, T_{\sigma(v_2)}))}{(\mathrm{size}(T_{\sigma(v_1)}) + \mathrm{size}(T_{\sigma(v_2)})) / 2} = \frac{8}{8.5} = 0.94$$
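The pairwise similarity on this slide is the intersection-tree size divided by the mean of the two instance-tree sizes. A minimal sketch of that arithmetic follows. The function name `sim` and its parameters are illustrative; the tree sizes are passed in directly, since the construction of the trees themselves is not spelled out here.

```python
def sim(size_a: int, size_b: int, size_intersection: int) -> float:
    """Intersection-tree size divided by the mean size of the two instance trees."""
    return size_intersection / ((size_a + size_b) / 2)

# Values from the slide: size 9 and size 8, intersection of size 8.
print(round(sim(9, 8, 8), 2))  # 0.94
```

Two nodes with identical instance trees score 1.0, and the score shrinks as the shared part gets smaller relative to the trees.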

slide-7
SLIDE 7

4. Similarity Based Reduction

• Agglomerative algorithm for complete-link clustering

[Figure: dendrogram and threshold graphs over x1, ..., x5]

G(∞) = {{x1}, {x2}, {x3}, {x4}, {x5}}
G(0.9) = {{x1}, {x2, x3}, {x4}, {x5}}
G(0.8) = {{x1, x4}, {x2, x3}, {x5}}
G(0.3) = {{x1, x4, x5}, {x2, x3}}
G(0) = {{x1, x4, x5, x2, x3}}
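The way a similarity threshold selects one partition from a complete-link clustering can be sketched as follows. This is a naive illustration, not the authors' code. The similarity matrix `S` is hypothetical (the slides do not give one) and was picked so that the thresholds 0.9, 0.8, 0.3 and 0 produce the partitions listed on this slide; indices 0 to 4 stand for x1 to x5.

```python
from itertools import combinations

# Hypothetical symmetric similarity matrix for x1..x5 (indices 0..4).
S = [
    [1.0, 0.1, 0.0, 0.8, 0.3],
    [0.1, 1.0, 0.9, 0.2, 0.1],
    [0.0, 0.9, 1.0, 0.1, 0.2],
    [0.8, 0.2, 0.1, 1.0, 0.5],
    [0.3, 0.1, 0.2, 0.5, 1.0],
]

def complete_link(S, a, b):
    """Similarity of two clusters = similarity of their least similar pair."""
    return min(S[i][j] for i in a for j in b)

def partition_at(S, threshold):
    """Merge the most similar pair of clusters until no pair reaches `threshold`."""
    clusters = [frozenset([i]) for i in range(len(S))]
    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2),
                   key=lambda pair: complete_link(S, *pair))
        if complete_link(S, a, b) < threshold:
            break
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return sorted(sorted(c) for c in clusters)

for t in (float("inf"), 0.9, 0.8, 0.3, 0.0):
    print(t, partition_at(S, t))
```

Each merge rescans all cluster pairs, which is fine for a five-element example but far too slow for the data set sizes reported in the evaluation.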

slide-8
SLIDE 8

4. Similarity Based Reduction

• List of partitions
• Which partition is appropriate?

$$\mathrm{IntraSim}_{\mathcal{P}_\tau} = \frac{1}{|\mathcal{P}_\tau|} \sum_{c \in \mathcal{P}_\tau} \mathrm{IntraSim}_c$$

$$\mathrm{IntraSim}_c = \frac{1}{\lambda} \sum_{i<j}^{n} S[c_i, c_j], \quad \text{where } \lambda = \frac{n(n-1)}{2},\ n \text{: the number of elements in } c$$

G(∞) = {{x1}, {x2}, {x3}, {x4}, {x5}}
G(0.9) = {{x1}, {x2, x3}, {x4}, {x5}}
G(0.8) = {{x1, x4}, {x2, x3}, {x5}}
G(0.3) = {{x1, x4, x5}, {x2, x3}}
G(0) = {{x1, x4, x5, x2, x3}}
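The IntraSim measure averages the pairwise similarities inside each cluster and then averages those values over the clusters of a partition. The sketch below follows the two formulas on this slide under two assumptions that are not from the source: the similarity matrix `S` is hypothetical, and a singleton cluster (for which $\lambda = 0$) is given the value 1.0.

```python
from itertools import combinations

# Hypothetical symmetric similarity matrix for x1..x5 (indices 0..4).
S = [
    [1.0, 0.1, 0.0, 0.8, 0.3],
    [0.1, 1.0, 0.9, 0.2, 0.1],
    [0.0, 0.9, 1.0, 0.1, 0.2],
    [0.8, 0.2, 0.1, 1.0, 0.5],
    [0.3, 0.1, 0.2, 0.5, 1.0],
]

def intra_sim_cluster(S, cluster):
    """Mean similarity over all unordered pairs inside one cluster."""
    n = len(cluster)
    if n < 2:
        return 1.0  # assumption: the slide leaves the singleton case undefined
    lam = n * (n - 1) / 2  # number of unordered pairs
    return sum(S[i][j] for i, j in combinations(cluster, 2)) / lam

def intra_sim_partition(S, partition):
    """Mean of the per-cluster IntraSim values over the whole partition."""
    return sum(intra_sim_cluster(S, c) for c in partition) / len(partition)

g_08 = [[0, 3], [1, 2], [4]]   # G(0.8)
g_03 = [[0, 3, 4], [1, 2]]     # G(0.3)
print(intra_sim_partition(S, g_08), intra_sim_partition(S, g_03))
```

The singleton convention matters when comparing partitions: scoring singletons as 1.0 favours fine-grained partitions, so a different choice would shift which partition looks best.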

slide-9
SLIDE 9

5. Evaluation

Data set       Subjects  Objects  Predicates  Edges
SP2Bench250K   50K       100K     61          250K
LUBM2          40K       20K      32          240K
BSBM500K       48K       100K     40          500K
SwDogFood      25K       55K      170         290K

slide-10
SLIDE 10

5. Evaluation

• Experimental Results

1. IntraSim & Similarity value

slide-11
SLIDE 11

5. Evaluation

• Experimental Results

1. IntraSim & Partition size

slide-12
SLIDE 12

5. Evaluation

• Experimental Results
  • LUBM2: 2 universities appeared with 3728 courses
  • SwDogFood: 21 ResearchTopics appeared with 36 SpatialThings

Data set       Subjects  RDF types  Clusters  Errors
SP2Bench250K   50K       9          85
LUBM2          40K       14         6         2
BSBM500K       48K       9          7
SwDogFood      25K       43         1918      22

slide-13
SLIDE 13

5. Evaluation

• Experimental Results
  • SwDogFood
    • 22K typed subjects
    • 43 different types

[Figure: chart with axis "Number of Clusters · 10^4"]

Partition              #Clusters  #Clusters with Types  Multi Types Clusters  #Errors  Error Ratio
$\mathcal{P}_{64\%}$   1918       1795                  83                    22       0.09%
$\mathcal{P}_{50\%}$   424        413                   58                    133      0.6%
$\mathcal{P}_{45\%}$   287        280                   51                    209      0.94%
$\mathcal{P}_{40\%}$   196        191                   46                    209      0.94%
$\mathcal{P}_{35\%}$   119        116                   33                    209      0.94%
$\mathcal{P}_{30\%}$   70         68                    23                    210      0.95%
$\mathcal{P}_{23\%}$   25         25                    17                    251      1.26%

slide-14
SLIDE 14

6. Conclusion & Future Work

• Conclusion
  • Two-phase approach
  • Discover equivalent, then similar structures
  • Use bisimilarity equivalence + agglomerative clustering
  • Apply IntraSim as a metric to choose the best partition
• Future Work
  • Edge filtering: consider only important edges
  • Experiment on bigger data sets

[http://www.superscholar.org]

slide-15
SLIDE 15

Thank you for your attention!

slide-16
SLIDE 16

References

• [Lösch et al. 2012] U. Lösch, S. Bloehdorn, and A. Rettinger, "Graph Kernels for RDF Data", in ESWC, 2012

slide-17
SLIDE 17

SP2Bench250K

slide-18
SLIDE 18

BSBM500K

slide-19
SLIDE 19

LUBM2