
Similar Structures inside RDF-Graphs: Workshop on Linked Data on the Web (LDOW 2013)



  1. Similar Structures inside RDF-Graphs
     14 May 2013
     Workshop on Linked Data on the Web (LDOW 2013), collocated with the 22nd International World Wide Web Conference (WWW 2013)
     Anas Alzogbi, Georg Lausen
     University of Freiburg, Databases & Information Systems

  2. 1. Motivation
     • RDF datasets are growing constantly (e.g. LOD)
     • The minimal constraints on RDF data make it irregular and difficult to comprehend and visualize
     • Idea
       ◦ Discover RDF subjects which exhibit similar structures
       ◦ Preserve the meaning by preserving the structure

  3. 2. Our Approach
     • Two-phase approach
       ◦ Collapse equivalent structures (bisimilarity equivalence)
       ◦ Collapse similar structures (clustering)
     [Figure: pipeline from the RDF graph (non-literal entities) through Perfect Typing (bisimilarity equivalence) to the PTG, then through similarity-based reduction (complete-link agglomerative clustering) to the reduced RDF graph]

  4. 3. Perfect Typing
     Bisimilarity equivalence: let $G = (V, E, L)$ be an RDF graph. Two nodes $v, u \in V$ are bisimilar ($v \approx_B u$) if they have the same set of outgoing paths: $P(v) = P(u)$.
     [Figure: example graph with nodes $v_2$ to $v_6$ and edge labels a, b, c, d, e, g, h, i]
     Examples from the figure:
     $P(v_2) = P(v_6) = \{\langle i \rangle\} \Rightarrow v_2 \approx_B v_6$
     $P(v_5) = P(v_3) = \{\langle a \rangle, \langle b, i \rangle, \langle c \rangle, \langle d, h \rangle, \langle d, g \rangle, \langle e \rangle\} \Rightarrow v_3 \approx_B v_5$
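
The path-set definition on this slide can be illustrated with a small Python sketch. It assumes an acyclic graph stored as a dictionary from subject to `(label, target)` edges, and it compares maximal outgoing label paths only. The names `outgoing_paths`, `bisimilarity_classes`, and the sample graph are hypothetical; the sample is only a rough reading of the slide's figure, not the authors' data or implementation.

```python
from collections import defaultdict


def outgoing_paths(graph, node, cache=None):
    """Return the maximal outgoing label paths of `node` as a frozenset of tuples.

    Assumption: the graph is acyclic, so the recursion terminates.
    """
    if cache is None:
        cache = {}
    if node in cache:
        return cache[node]
    paths = set()
    for label, target in graph.get(node, []):
        tails = outgoing_paths(graph, target, cache)
        if tails:
            # Extend every path of the target with the current edge label.
            paths.update((label,) + tail for tail in tails)
        else:
            # The target has no outgoing edges: the path ends here.
            paths.add((label,))
    cache[node] = frozenset(paths)
    return cache[node]


def bisimilarity_classes(graph):
    """Group the subjects (keys of `graph`) that share the same path set."""
    classes = defaultdict(list)
    cache = {}
    for node in graph:
        classes[outgoing_paths(graph, node, cache)].append(node)
    return sorted(sorted(members) for members in classes.values())


# Hypothetical graph: targets named l1..l10 stand for leaf nodes.
example = {
    "v3": [("a", "l1"), ("b", "v2"), ("c", "l2"), ("d", "v4"), ("e", "l3")],
    "v5": [("a", "l4"), ("b", "v6"), ("c", "l5"), ("d", "v4"), ("e", "l6")],
    "v2": [("i", "l7")],
    "v6": [("i", "l8")],
    "v4": [("g", "l9"), ("h", "l10")],
}

print(bisimilarity_classes(example))
```

Only subjects are grouped here, in line with the restriction to non-literal entities; graphs with cycles would need a proper partition-refinement algorithm instead of this recursion.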

  5. 4. Similarity Based Reduction
     • Hierarchical clustering
       ◦ Exclusive, unsupervised
       ◦ Requires similarity matrix
     • Instance tree & intersection tree [Lösch et al. 2012]
     • $T_\sigma(v)$ is the instance tree of node $v$
     [Figure: a PTG with nodes $v_1$ to $v_4$ and the instance tree $T_\sigma(v_1)$ derived from it]

  6. 4. Similarity Based Reduction
     • Instance tree & intersection tree
     [Figure: the instance trees $T_\sigma(v_1)$ and $T_\sigma(v_2)$ and their intersection tree $\mathrm{intersect}(T_\sigma(v_1), T_\sigma(v_2))$]
       $\mathrm{size}(T_\sigma(v_1)) = 9$
       $\mathrm{size}(T_\sigma(v_2)) = 8$
       $\mathrm{size}(\mathrm{intersect}) = 8$
     • Pairwise similarity
       $$\mathrm{sim}(v_1, v_2) = \frac{\mathrm{size}(\mathrm{intersect}(T_\sigma(v_1), T_\sigma(v_2)))}{\frac{\mathrm{size}(T_\sigma(v_1)) + \mathrm{size}(T_\sigma(v_2))}{2}} = \frac{8}{8.5} = 0.94$$
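
The pairwise similarity on this slide can be reproduced with a minimal sketch. It assumes a simplified tree encoding, a nested dictionary keyed by edge label, and a simplified intersection that keeps only the labels both trees share at each level. This is not the intersection-tree construction of Lösch et al.; the two sample trees are hypothetical and were chosen only so that their sizes match the slide (9, 8, and 8).

```python
def size(tree):
    """Count the nodes of a tree given as {edge_label: subtree}, root included."""
    return 1 + sum(size(child) for child in tree.values())


def intersect(t1, t2):
    """Keep only the edges whose labels occur in both trees, recursively."""
    return {
        label: intersect(t1[label], t2[label])
        for label in t1.keys() & t2.keys()
    }


def sim(t1, t2):
    """Size of the intersection divided by the mean size of both trees."""
    return size(intersect(t1, t2)) / ((size(t1) + size(t2)) / 2)


# Hypothetical instance trees with 9 and 8 nodes.
tree_v1 = {"a": {}, "b": {}, "c": {}, "e": {}, "d": {"g": {}, "h": {}, "i": {}}}
tree_v2 = {"a": {}, "b": {}, "c": {}, "d": {"g": {}, "h": {}, "i": {}}}

print(size(tree_v1), size(tree_v2), size(intersect(tree_v1, tree_v2)))
print(round(sim(tree_v1, tree_v2), 2))
```

Dividing by the mean size keeps the value in [0, 1] and yields 1 only when both trees coincide with their intersection.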

  7. 4. Similarity Based Reduction
     • Agglomerative algorithm for complete-link clustering
       $G(\infty) = \{\{x_1\}, \{x_2\}, \{x_3\}, \{x_4\}, \{x_5\}\}$
       $G(0.9) = \{\{x_1\}, \{x_2, x_3\}, \{x_4\}, \{x_5\}\}$
       $G(0.8) = \{\{x_1, x_4\}, \{x_2, x_3\}, \{x_5\}\}$
       $G(0.3) = \{\{x_1, x_4, x_5\}, \{x_2, x_3\}\}$
       $G(0) = \{\{x_1, x_4, x_5, x_2, x_3\}\}$
     [Figure: threshold graph and dendrogram over $x_1, \dots, x_5$]
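
A naive sketch of complete-link agglomerative clustering over a similarity matrix may help to read the list of partitions. At each step it merges the two clusters whose least similar pair of members is most similar, and records the level of the merge. The function name `complete_link` and the similarity values are assumptions: the matrix is hypothetical and was chosen only so that the merges happen at the levels 0.9, 0.8, 0.3, and 0 shown on the slide.

```python
from itertools import combinations


def complete_link(items, sim):
    """Return [(level, partition), ...] from singletons down to one cluster.

    `sim` maps frozenset({x, y}) to the similarity of x and y.
    """
    def link(a, b):
        # Complete link: similarity of the least similar cross-cluster pair.
        return min(sim[frozenset((x, y))] for x in a for y in b)

    clusters = [frozenset([x]) for x in items]
    history = [(float("inf"), list(clusters))]
    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2), key=lambda pair: link(*pair))
        level = link(a, b)
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        history.append((level, list(clusters)))
    return history


# Hypothetical pairwise similarities for x1..x5.
pairs = {
    ("x1", "x2"): 0.0, ("x1", "x3"): 0.1, ("x1", "x4"): 0.8, ("x1", "x5"): 0.3,
    ("x2", "x3"): 0.9, ("x2", "x4"): 0.2, ("x2", "x5"): 0.2,
    ("x3", "x4"): 0.1, ("x3", "x5"): 0.1, ("x4", "x5"): 0.5,
}
example_sim = {frozenset(pair): value for pair, value in pairs.items()}
items = ["x1", "x2", "x3", "x4", "x5"]

history = complete_link(items, example_sim)
for level, partition in history:
    print(level, [sorted(cluster) for cluster in partition])
```

This version recomputes every link at every step, which is fine for a handful of items but too slow for the data set sizes reported in the evaluation.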

  8. 4. Similarity Based Reduction
     • List of partitions
       $G(\infty) = \{\{x_1\}, \{x_2\}, \{x_3\}, \{x_4\}, \{x_5\}\}$
       $G(0.9) = \{\{x_1\}, \{x_2, x_3\}, \{x_4\}, \{x_5\}\}$
       $G(0.8) = \{\{x_1, x_4\}, \{x_2, x_3\}, \{x_5\}\}$
       $G(0.3) = \{\{x_1, x_4, x_5\}, \{x_2, x_3\}\}$
       $G(0) = \{\{x_1, x_4, x_5, x_2, x_3\}\}$
     • Which partition is appropriate?
       $$\mathrm{IntraSim}(\mathcal{P}_\tau) = \frac{1}{|\mathcal{P}_\tau|} \sum_{c \in \mathcal{P}_\tau} \mathrm{IntraSim}(c)$$
       where
       $$\mathrm{IntraSim}(c) = \frac{1}{\lambda} \sum_{i<j} S[c_i, c_j], \qquad \lambda = \frac{n(n-1)}{2},$$
       and $n$ is the number of elements in $c$.
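
The IntraSim measure can be sketched in a few lines of Python. The slide does not say how a singleton cluster is scored (its number of pairs is zero), so the `singleton_value` parameter below is an assumption, as are the function names and the similarity values, which are hypothetical.

```python
from itertools import combinations


def intra_sim_cluster(cluster, sim, singleton_value=1.0):
    """Mean pairwise similarity inside one cluster.

    Assumption: a singleton has no pairs, so it gets `singleton_value`.
    """
    members = sorted(cluster)
    n = len(members)
    if n < 2:
        return singleton_value
    pair_count = n * (n - 1) / 2
    total = sum(sim[frozenset(pair)] for pair in combinations(members, 2))
    return total / pair_count


def intra_sim_partition(partition, sim, singleton_value=1.0):
    """Average of the per-cluster IntraSim values over all clusters."""
    scores = [intra_sim_cluster(c, sim, singleton_value) for c in partition]
    return sum(scores) / len(scores)


# Hypothetical similarities, only for the pairs that share a cluster below.
example_sim = {
    frozenset(("x1", "x4")): 0.8,
    frozenset(("x1", "x5")): 0.3,
    frozenset(("x4", "x5")): 0.5,
    frozenset(("x2", "x3")): 0.9,
}

partition_a = [{"x1", "x4"}, {"x2", "x3"}, {"x5"}]
partition_b = [{"x1", "x4", "x5"}, {"x2", "x3"}]
print(intra_sim_partition(partition_a, example_sim))
print(intra_sim_partition(partition_b, example_sim))
```

With these values the finer partition scores higher, which is the trade-off between IntraSim and partition size that the evaluation slides plot.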

  9. 5. Evaluation

     | Data set     | Subjects | Objects | Predicates | Edges |
     |--------------|----------|---------|------------|-------|
     | SP²Bench250K | 50K      | 100K    | 61         | 250K  |
     | LUBM2        | 40K      | 20K     | 32         | 240K  |
     | BSBM500K     | 48K      | 100K    | 40         | 500K  |
     | SwDogFood    | 25K      | 55K     | 170        | 290K  |

  10. 5. Evaluation
      • Experimental Results
        1. IntraSim & similarity value

  11. 5. Evaluation
      • Experimental Results
        1. IntraSim & partition size

  12. 5. Evaluation
      • Experimental Results

      | Data set     | Subjects | RDF types | Clusters | Errors |
      |--------------|----------|-----------|----------|--------|
      | SP²Bench250K | 50K      | 9         | 85       | 0      |
      | LUBM2        | 40K      | 14        | 6        | 2      |
      | BSBM500K     | 48K      | 9         | 7        | 0      |
      | SwDogFood    | 25K      | 43        | 1918     | 22     |

        ◦ LUBM2: 2 universities appeared with 3728 courses
        ◦ SwDogFood: 21 ResearchTopics appeared with 36 SpatialThings

  13. 5. Evaluation
      • Experimental Results
        ◦ SwDogFood
          • 22K typed subjects
          • 43 different types

      | Partition            | $\mathcal{P}_{64\%}$ | $\mathcal{P}_{50\%}$ | $\mathcal{P}_{45\%}$ | $\mathcal{P}_{40\%}$ | $\mathcal{P}_{35\%}$ | $\mathcal{P}_{30\%}$ | $\mathcal{P}_{23\%}$ |
      |----------------------|-------|-------|-------|-------|-------|-------|-------|
      | #Clusters            | 1918  | 424   | 287   | 196   | 119   | 70    | 25    |
      | #Clusters with Types | 1795  | 413   | 280   | 191   | 116   | 68    | 25    |
      | Multi Types Clusters | 83    | 58    | 51    | 46    | 33    | 23    | 17    |
      | #Errors              | 22    | 133   | 209   | 209   | 209   | 210   | 251   |
      | Error Ratio          | 0.09% | 0.6%  | 0.94% | 0.94% | 0.94% | 0.95% | 1.26% |

  14. 6. Conclusion & Future Work
      • Conclusion
        ◦ Two-phase approach
        ◦ Discover equivalent, then similar structures
        ◦ Use bisimilarity equivalence + agglomerative clustering
        ◦ Apply IntraSim as a metric to choose the best partition
      • Future Work
        ◦ Edge filtering: consider only important edges
        ◦ Experiment on bigger data sets
      [http://www.superscholar.org]

  15. Thank you for your attention!

  16. References
      • [Lösch et al. 2012] U. Lösch, S. Bloehdorn, and A. Rettinger, Graph Kernels for RDF Data, in ESWC, 2012

  17. SP²Bench250K

  18. BSBM500K

  19. LUBM2
