LSWT2018 Link Discovery
Presentation · September 2018
Author: Mohamed Sherif, University of Leipzig
Source: https://www.researchgate.net/publication/327417445 (uploaded by Mohamed Sherif on 04 September 2018)

LSWT 2018
Linked Data Integration at Scale
Mohamed Ahmed Sherif and Axel-Cyrille Ngonga Ngomo
Paderborn University, Data Science Group, Pohlweg 51, D-33098 Paderborn, Germany
{firstname.lastname}@upb.de
June 18, 2018
Sherif et al., LSWT 2018, Linked Data Integration at Scale, June 18, 2018

Motivation: Linked Data Principles

Motivation: Why Link Discovery?
1. Linked Open Data Cloud
   - 130+ billion triples
   - ≈ 0.5 billion links
   - Mostly owl:sameAs
2. Decentralized dataset creation
3. Complex information needs
   ⇒ Need to consume data across knowledge bases
4. Links are central for
   - Cross-ontology QA
   - Data Integration
   - Reasoning
   - Federated Queries
   - ...

Motivation: Cross-Ontology QA
Example: Give me the name and description of all drugs that cure their side-effect.
1. Need information from
   - Drugbank (Drug description)
   - Sider (Side-effects)
   - DBpedia (Description)
2. Gathering information via SPARQL query using links

Motivation: Cross-Ontology QA
Example: Give me the name and description of all drugs that cure their side-effect.

    SELECT ?drug ?name ?desc WHERE {
      ?drug   a                   drugbank:Drug .
      ?drug   rdfs:label          ?name .
      ?drug   drugbank:cures      ?disease .
      ?drug   owl:sameAs          ?drug2 .
      ?drug   owl:sameAs          ?drug3 .
      ?drug2  sider:hasSideEffect ?effect .
      ?effect owl:sameAs          ?disease .
      ?drug3  dbo:hasWikiPage     ?desc .
    }

Motivation: Cross-Ontology QA (Geo-spatial)
Example (DEQA): Give me flats near kindergartens in Kobe.

    SELECT ?flat WHERE {
      ?flat   a         deqa:Flat .
      ?flat   deqa:near ?school .
      ?school a         lgdo:School .
      ?school lgdo:city lgdo:Kobe .
    }


The Link Discovery Problem: Definition
Definition (Link Discovery, informal)
   Given two sets of resources S and T, find links of type R between S and T.
   Here: declarative link discovery.
Definition (Declarative Link Discovery, formal, similarities)
   Given sets S and T of resources and a relation R,
   find M = {(s, t) ∈ S × T : R(s, t)}.
   Common approach: find M′ = {(s, t) ∈ S × T : σ(s, t) ≥ θ}.
Definition (Declarative Link Discovery, formal, distances)
   Given sets S and T of resources and a relation R,
   find M = {(s, t) ∈ S × T : R(s, t)}.
   Common approach: find M′ = {(s, t) ∈ S × T : δ(s, t) ≤ τ}.
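The similarity-based definition can be read as a brute-force procedure. The sketch below is illustrative only and is not taken from Limes: the helper names, the padded-trigram Jaccard similarity used for σ, and the representation of S and T as dictionaries from resource identifier to label are all assumptions. It compares every pair in S × T, which is exactly the quadratic behaviour the later slides set out to avoid.

```python
from itertools import product


def trigrams(text):
    """Set of character trigrams of a lower-cased, padded label (assumed tokenisation)."""
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a, b):
    """Jaccard overlap of the trigram sets; the padding guarantees non-empty sets."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def naive_link_discovery(source, target, sigma, theta):
    """Return M' = {(s, t) in S x T : sigma(s, t) >= theta} by comparing all pairs.

    source and target map a resource identifier to the label being compared.
    The loop performs |S| * |T| similarity computations.
    """
    return {
        (s, t)
        for s, t in product(source, target)
        if sigma(source[s], target[t]) >= theta
    }
```

A distance-based variant would only swap the comparison to `delta(...) <= tau`.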

The Link Discovery Problem: Definition
Most common: R = owl:sameAs
Also known as deduplication


The Link Discovery Problem: Definition
Goal: Address all possible relations R
Declarative Link Discovery: Similarity/distance commonly derived from property (and property chain) values
Example: R = :sameModel

    :s770fm rdfs:label "S770FM"@en      |  :s770fm rdfs:label "S770BEM"@en
    :s770fm rdf:type   :SABER           |  :s770fm rdf:type   :SABER
    :s770fm :model     :770             |  :s770fm :model     :770
    :s770fm :top       :FlamedMaple     |  :s770fm :top       :BirdEyeMaple
    :s770fm :producer  :Ibanez          |  :s770fm :producer  :Ibanez


The Link Discovery Problem: Why is it difficult?
1. Time complexity (Efficiency)
   - Large number of triples (e.g., LinkedTCGA with 20.4 billion triples)
   - Quadratic a-priori runtime
   - 69 days for mapping cities from DBpedia to Geonames
   - Solutions usually in-memory (insufficient heap space)
2. Accuracy
   - Combination of several attributes required for high precision
   - Tedious discovery of most adequate mapping
   - Dataset-dependent similarity functions
[Figure: example link specification combining the atomic measures (euclidean(x.price, y.price), 0.90), (levenshtein(x.desc, y.desc), 0.50), (trigrams(x.name, y.name), 0.50) and (cosine(x.name, y.name), 0.52) with the operators \, ⊔ and ⊓]
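To make the idea of combining several attributes concrete, the following sketch evaluates a small link specification over sets of candidate pairs. It is a hedged illustration, not the Limes implementation: the tuple encoding of specifications, the operator names `AND`, `OR` and `MINUS` (standing in for ⊓, ⊔ and \), and both similarity functions are assumptions made for this example.

```python
def token_jaccard(a, b):
    """Jaccard overlap of lower-cased whitespace tokens (stand-in string measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0


def price_similarity(a, b):
    """Map an absolute price difference into (0, 1] (stand-in numeric measure)."""
    return 1.0 / (1.0 + abs(a - b))


def evaluate(spec, source, target):
    """Evaluate a specification and return the resulting set of (s, t) pairs.

    spec is either ("ATOMIC", prop, measure, threshold) or (op, left, right),
    where op is "AND" (intersection), "OR" (union) or "MINUS" (difference).
    """
    if spec[0] == "ATOMIC":
        _, prop, measure, threshold = spec
        return {
            (s, t)
            for s, x in source.items()
            for t, y in target.items()
            if measure(x[prop], y[prop]) >= threshold
        }
    op, left, right = spec
    left_pairs = evaluate(left, source, target)
    right_pairs = evaluate(right, source, target)
    if op == "AND":
        return left_pairs & right_pairs
    if op == "OR":
        return left_pairs | right_pairs
    if op == "MINUS":
        return left_pairs - right_pairs
    raise ValueError(f"unknown operator: {op}")
```

Each atomic specification is still evaluated by brute force here; the point of the sketch is only how thresholds on different attributes compose into one mapping.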

Limes: Link Discovery Framework for Metric Spaces
1. Time complexity
   - Limes algorithm
   - HR³
   - Aegle
   - Radon
   - ...
2. Accuracy
   - Raven
   - Eagle
   - Coala
   - Euclid
   - Wombat
   - ...
https://github.com/dice-group/limes

Radon: Rapid Discovery of Topological Relations (AAAI17)
Large number of datasets (http://stats.lod2.eu)
   - 150+ billion triples
   - ≈ 0.5 billion links
   - Mostly owl:sameAs
Large geo-spatial datasets
   - LinkedGeoData contains 20+ billion triples
   - NUTS contains up to 1,500 points per resource
Only 7.1% of the links between resources connect geo-spatial entities (Ngonga Ngomo, 2013)

Radon: Why is linking geo-spatial resources difficult?
Link Discovery
   - Given two knowledge bases S and T, find links of type R between S and T
   - Formally: find M = {(s, t) ∈ S × T : R(s, t)}
   - Naïve computation of M requires quadratic time complexity
Geo-spatial resources available on the LOD
   - Described using polygons
   - Large in number
   - Demand the computation of topological relations
Naïve computation of M is impracticable for geo-spatial resources

Radon Algorithm: The Dimensionally Extended nine-Intersection Model (DE-9IM)
Standard to describe the topological relations in 2D space.
DE-9IM is based on the intersection matrix, where I, B and E denote the interior, boundary and exterior of a geometry:

\[
\mathrm{DE9IM}(g_1, g_2) =
\begin{bmatrix}
\dim(I(g_1) \cap I(g_2)) & \dim(I(g_1) \cap B(g_2)) & \dim(I(g_1) \cap E(g_2)) \\
\dim(B(g_1) \cap I(g_2)) & \dim(B(g_1) \cap B(g_2)) & \dim(B(g_1) \cap E(g_2)) \\
\dim(E(g_1) \cap I(g_2)) & \dim(E(g_1) \cap B(g_2)) & \dim(E(g_1) \cap E(g_2))
\end{bmatrix}
\]

There must be at least one shared point for a relation to hold
   - Except for the disjoint relation ⇒ inverse of the intersects relation
Accelerating the computation of whether two geometries share at least one point accelerates the computation of any topological relation
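A DE-9IM matrix is commonly serialised row by row as a nine-character string over F, 0, 1, 2, and a relation is tested by matching that string against pattern masks. The sketch below shows this matching in plain Python under those conventions; the function names are chosen for this example, and computing the matrix itself from two geometries is left to a geometry library and not shown.

```python
# Masks for "intersects": some interior/boundary cell of the matrix is non-empty,
# i.e. the two geometries share at least one point.
INTERSECTS_PATTERNS = ("T********", "*T*******", "***T*****", "****T****")
WITHIN_PATTERN = "T*F**F***"


def matches(matrix, pattern):
    """Check a 9-character DE-9IM string against a 9-character mask.

    In the mask, '*' accepts anything, 'T' accepts any non-empty
    intersection ('0', '1' or '2'), and 'F', '0', '1', '2' must match exactly.
    """
    if len(matrix) != 9 or len(pattern) != 9:
        raise ValueError("matrix and pattern must both have 9 characters")
    for cell, mask in zip(matrix, pattern):
        if mask == "*":
            continue
        if mask == "T":
            if cell == "F":
                return False
        elif cell != mask:
            return False
    return True


def intersects(matrix):
    """True if the geometries share at least one point."""
    return any(matches(matrix, p) for p in INTERSECTS_PATTERNS)


def disjoint(matrix):
    """Disjoint is the negation of intersects."""
    return not intersects(matrix)
```

Because every relation other than disjoint needs a shared point, a cheap filter that rules out shared points lets the more expensive matrix computation be skipped for most pairs.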

Radon Algorithm: Basic Idea
Radon implements an improved indexing approach based on
1. Minimum bounding boxes (MBB)
2. Space tiling

Radon Algorithm: I. Swapping Strategy
Large geometries that span a large number of hypercubes ⇒ large spatial index when used as S
Estimated Total Hypervolume (ETH) of a set of geometries X, where κ_i(p) is the i-th coordinate of point p:

\[
\mathrm{ETH}(X) = |X| \prod_{i=1}^{d} \frac{1}{|X|} \sum_{x \in X} \left( \max_{p \in x} \{\kappa_i(p)\} - \min_{p \in x} \{\kappa_i(p)\} \right)
\]

If ETH(S) > ETH(T), Radon swaps S and T and computes the reverse relation r′ instead of r
   - Since ETH(NUTS) > ETH(CLC), S = CLC and T = NUTS
   - E.g., if r is covers and ETH(S) > ETH(T), Radon swaps S and T and computes coveredBy
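The swapping rule can be sketched as follows. This is an illustration under stated assumptions rather than Radon's code: geometries are plain lists of coordinate tuples, ETH is read as |X| times the product over all dimensions of the average per-geometry extent, and the `REVERSE` table and the function names are invented for the example.

```python
# Reverse of each relation when source and target are swapped;
# symmetric relations map to themselves.
REVERSE = {
    "covers": "coveredBy",
    "coveredBy": "covers",
    "contains": "within",
    "within": "contains",
    "equals": "equals",
    "intersects": "intersects",
    "touches": "touches",
    "overlaps": "overlaps",
    "crosses": "crosses",
    "disjoint": "disjoint",
}


def eth(geometries):
    """Estimated total hypervolume of a non-empty list of geometries.

    Each geometry is a list of points; each point is a tuple of d coordinates.
    """
    n = len(geometries)
    d = len(geometries[0][0])
    volume = 1.0
    for i in range(d):
        # Sum of the bounding-box side lengths of all geometries in dimension i.
        total_extent = sum(
            max(p[i] for p in g) - min(p[i] for p in g) for g in geometries
        )
        volume *= total_extent / n
    return n * volume


def plan(source, target, relation):
    """Index the dataset with the smaller ETH; swap and reverse the relation if needed."""
    if eth(source) > eth(target):
        return target, source, REVERSE[relation]
    return source, target, relation
```

Links found after a swap are of the form (t, s) under r′ and have to be flipped back to (s, t) under r when results are reported.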

Radon Algorithm: II. Optimized Sparse Space Tiling
Insert all geometries s ∈ S into the index I(S):
1. Compute MBB(s)
2. Map each s to all hypercubes over which MBB(s) spans
Same procedure for all t ∈ T, but only index geometries t that are potentially in hypercubes already contained in I(S)
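The two indexing steps can be sketched for the 2D case as below. This is a simplified illustration and not Radon's implementation: the uniform `cell_size`, the dictionary-based grid, and the function names are assumptions, and the sketch stops at candidate generation instead of verifying the topological relation for each pair.

```python
from collections import defaultdict
from math import floor


def mbb(geometry):
    """Minimum bounding box (min_x, min_y, max_x, max_y) of a list of 2D points."""
    xs = [p[0] for p in geometry]
    ys = [p[1] for p in geometry]
    return min(xs), min(ys), max(xs), max(ys)


def cells(box, cell_size):
    """Yield the grid cells (hypercubes in 2D) that a bounding box spans."""
    min_x, min_y, max_x, max_y = box
    for cx in range(floor(min_x / cell_size), floor(max_x / cell_size) + 1):
        for cy in range(floor(min_y / cell_size), floor(max_y / cell_size) + 1):
            yield (cx, cy)


def build_index(source, cell_size):
    """Sparse index I(S): only cells touched by some source geometry are stored."""
    index = defaultdict(set)
    for sid, geometry in source.items():
        for cell in cells(mbb(geometry), cell_size):
            index[cell].add(sid)
    return index


def candidate_pairs(source, target, cell_size):
    """Pairs (s, t) whose bounding boxes fall into at least one common cell."""
    index = build_index(source, cell_size)
    pairs = set()
    for tid, geometry in target.items():
        for cell in cells(mbb(geometry), cell_size):
            # Target geometries in cells absent from I(S) are skipped entirely.
            if cell in index:
                for sid in index[cell]:
                    pairs.add((sid, tid))
    return pairs
```

Only the returned candidates would then need the exact relation check, so target geometries far from every source geometry cost a few dictionary lookups each.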
