
Interlinking: Performance Assessment of User Evaluation vs. Supervised Learning Approaches

Mofeed Hassan, Jens Lehmann and Axel-Cyrille Ngonga Ngomo
Agile Knowledge Engineering and Semantic Web
Department of Computer Science, University of Leipzig
Augustusplatz 10, 04109 Leipzig
{mounir,lehmann,ngonga}@informatik.uni-leipzig.de
WWW home page: http://limes.sf.net
May 17, 2015

LDOW-2015 tugraz

Why Link Discovery?

1. Fourth Linked Data principle
2. Links are central for
   - Cross-ontology QA
   - Data Integration
   - Reasoning
   - Federated Queries
   - ...
3. Linked Data on the Web:
   - 10+ thousand datasets
   - 89+ billion triples
   - ≈ 500+ million links

M. Hassan, J. Lehmann and A. Ngonga, May 17, 2015, Interlinking: Humans vs. Machines, 2 / 25

Why is it difficult?

Definition (Link Discovery)
Given sets $S$ and $T$ of resources and a relation $R$.
Task: Find $M = \{(s, t) \in S \times T : R(s, t)\}$
Common approaches:
   - Find $M' = \{(s, t) \in S \times T : \sigma(s, t) \geq \theta\}$
   - Find $M' = \{(s, t) \in S \times T : \delta(s, t) \leq \theta\}$

1. Time complexity
   - Large number of triples
   - Quadratic a-priori runtime
   - 69 days for mapping cities from DBpedia to Geonames (1 ms per comparison)
   - Decades for linking DBpedia and LGD
   - ...
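The quadratic runtime follows directly from the definition of $M'$: without further filtering, every pair in $S \times T$ has to be compared. The sketch below illustrates this with a brute-force loop. The function names, the toy labels and the choice of plain Levenshtein distance for $\delta$ are assumptions made for illustration only; they are not taken from the slides or from any interlinking tool.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Unweighted edit distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def naive_link_discovery(S, T, delta, theta):
    """Return M' = {(s, t) in S x T : delta(s, t) <= theta} by testing every pair."""
    return {(s, t) for s, t in product(S, T) if delta(s, t) <= theta}


# Hypothetical toy data, not from the slides.
S = ["Leipzig", "Graz"]
T = ["Leipzig", "Gratz", "Florence"]

links = naive_link_discovery(S, T, levenshtein, theta=1)
comparisons = len(S) * len(T)  # grows as |S| * |T|

# At 1 ms per comparison, 69 days corresponds to this many comparisons.
comparisons_in_69_days = 69 * 24 * 3600 * 1000
```

At one millisecond per comparison, 69 days amounts to about 5.96 billion comparisons, which is why approaches that avoid the full cross product matter.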

Why is it difficult?

2. Complexity of specifications
   - Combination of several attributes required for high precision
   - Adequate atomic similarity functions difficult to detect
   - Tedious discovery of most adequate mapping

Introduction

Interlinking tools: LIMES, SILK, RDFAI, ...
Interlinking tools differ in many factors, such as:
1. Automation and user involvement
2. Domain dependency
3. Matching techniques
Manual link validation as a form of user involvement:
1. Benchmarks
2. Active learning: positive and negative examples

Introduction

Commonly used string distance/similarity measures:
   - Edit distance
   - Q-Gram similarity
   - Jaro-Winkler
   - ...
Metrics:
   - Minkowski distance
   - Orthodromic distance
   - Symmetric Hausdorff distance
   - ...

Idea
Learning distance/similarity measures from data can lead to better accuracy while linking.

Motivation/1

Problem
Edit distance does not differentiate between different types of edits.

Source labels               Target labels
Generalised epidermolysis   Generalized epidermolysis
Diabetes I                  Diabetes I
Diabetes II                 Diabetes II

Motivation/2

Choosing $\theta \in [0, 1)$:
             %
F-Score      80.0
Precision    100.0
Recall       66.7

Choosing $\theta \in [1, 2)$:
             %
F-Score      75.0
Precision    60.0
Recall       100.0

Solution: Weighted edit distance
Assign a weight to each operation: substitution, insertion, deletion.
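The two sets of scores can be reproduced from the three label pairs of Motivation/1 with plain edit distance. The sketch below assumes that each source label should be linked to the target label in the same row (this reference mapping is an assumption; the slides do not state it explicitly) and that all nine source/target combinations are compared.

```python
def levenshtein(a: str, b: str) -> int:
    """Unweighted edit distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


source = ["Generalised epidermolysis", "Diabetes I", "Diabetes II"]
target = ["Generalized epidermolysis", "Diabetes I", "Diabetes II"]

# Assumption: the i-th source label should link to the i-th target label.
reference = set(zip(source, target))


def evaluate(theta):
    """Return (precision, recall, f_score) for links with distance <= theta."""
    found = {(s, t) for s in source for t in target if levenshtein(s, t) <= theta}
    true_positives = len(found & reference)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(reference)
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


for theta in (0, 1):
    p, r, f = evaluate(theta)
    print(f"theta={theta}: precision={p:.3f} recall={r:.3f} f_score={f:.3f}")
```

With a threshold of 0 the spelling variant "Generalised"/"Generalized" is missed, while a threshold of 1 also links "Diabetes I" with "Diabetes II". A weighted edit distance can make the s/z substitution cheap and the extra "I" expensive, which separates the two cases.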

Motivation/3

Cost matrix
   - Costs are arranged in a quadratic matrix $M$
   - Cell $m_{i,j}$ contains the cost of transforming the character associated with row $i$ into the character associated with column $j$
   - Characters are from the alphabet $\{$'A', ..., 'Z', 'a', ..., 'z', '0', ..., '9', '$\epsilon$'$\}$
   - Main diagonal values are zeros
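One possible in-memory layout for such a cost matrix is a square list of lists indexed through a character-to-position table. This is a minimal sketch under stated assumptions: the names `ALPHABET`, `INDEX` and `uniform_cost_matrix` are hypothetical, the default off-diagonal cost of 1.0 is an arbitrary choice, and the single override $m_{z,s} = 0.6$ is the value used in the Verification Filter example of this deck.

```python
import string

# Alphabet from the slide: upper case, lower case, digits and the empty character.
ALPHABET = list(string.ascii_uppercase + string.ascii_lowercase + string.digits) + ["ε"]
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def uniform_cost_matrix(default: float = 1.0):
    """Square matrix with zeros on the main diagonal and `default` elsewhere."""
    n = len(ALPHABET)
    return [[0.0 if i == j else default for j in range(n)] for i in range(n)]


M = uniform_cost_matrix()

# Make the z -> s substitution cheaper than other edits (value from the deck's example).
M[INDEX["z"]][INDEX["s"]] = 0.6

# Row 'ε' holds insertion costs and column 'ε' holds deletion costs.
insert_s = M[INDEX["ε"]][INDEX["s"]]
delete_z = M[INDEX["z"]][INDEX["ε"]]
```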

Motivation/4

Pros
   - Can differentiate between edit operations.
   - Better F-measure in some cases.
Cons
   - No dedicated scalable algorithm for weighted edit distances.
   - Difficult to use for link discovery.

Motivation/5

                     DBLP–Scholar   ABT–Buy   DBLP–ACM
F-measure (%)        87.85          0.60      97.92
Without REEDED (s)   30,096         43,236    26,316
With REEDED (s)      668.62         65.21     14.24

Extension of existing algorithms

Idea
   - $\mathrm{edit}(x, y) = \theta$ → Need $\theta$ operations to transform $x$ into $y$
   - $\delta(x, y) \geq \theta \cdot \min_{i \neq j} m_{ij}$
Extension
1. Run existing algorithm with threshold $\frac{\theta}{\min_{i \neq j} m_{ij}}$
2. Filter results by using $\delta(x, y) \leq \theta$
Problem
Does not scale.
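The reasoning behind the enlarged threshold can be spelled out as a short derivation. It assumes that the smallest off-diagonal cost, written $m_{\min}$ here (a symbol introduced only for this sketch), is strictly positive. Any edit script that turns $x$ into $y$ needs at least $\mathrm{edit}(x, y)$ operations, and each of them costs at least $m_{\min}$.

```latex
% Assumption: m_min > 0 is the smallest off-diagonal entry of the cost matrix M.
\[
  m_{\min} = \min_{i \neq j} m_{ij}, \qquad
  \delta(x, y) \;\geq\; \mathrm{edit}(x, y) \cdot m_{\min}
\]
% Hence every pair within weighted distance theta is also found by an
% unweighted search with the enlarged threshold theta / m_min.
\[
  \delta(x, y) \leq \theta
  \;\Longrightarrow\;
  \mathrm{edit}(x, y) \;\leq\; \frac{\delta(x, y)}{m_{\min}} \;\leq\; \frac{\theta}{m_{\min}}
\]
```

Running an unweighted algorithm with threshold $\theta / m_{\min}$ therefore loses no correct pair, but for small $m_{\min}$ the candidate set becomes very large.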

REEDED

Series of filters. Both complete and correct.

Length-Aware Filter

Input: a pair $(s, t) \in S \times T$ and a threshold $\theta$
Output: the pair itself or null

Insight
Given two strings $s$ and $t$ with lengths $|s|$ and $|t|$, we need at least $\big||s| - |t|\big|$ edit operations to transform $s$ into $t$.

Examples
A. $\langle s, t, \theta \rangle = \langle$"realize", "realise", $1 \rangle$: $\big||s| - |t|\big| = 0 \Rightarrow$ pass
B. $\langle s, t, \theta \rangle = \langle$"realize", "real", $1 \rangle$: $\big||s| - |t|\big| = 3 \Rightarrow$ discard
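A minimal sketch of this filter is given below. The function name and the `min_cost` parameter are assumptions: the slide compares the length difference with $\theta$ directly, which corresponds to `min_cost=1.0`, while scaling by the smallest off-diagonal cost mirrors what the character-aware filter example does.

```python
from typing import Optional, Tuple


def length_aware_filter(s: str, t: str, theta: float,
                        min_cost: float = 1.0) -> Optional[Tuple[str, str]]:
    """Return (s, t) if the length difference alone cannot exceed theta, else None."""
    # At least ||s| - |t|| insertions or deletions are unavoidable.
    lower_bound = abs(len(s) - len(t)) * min_cost
    return (s, t) if lower_bound <= theta else None


example_a = length_aware_filter("realize", "realise", 1)  # same length, kept
example_b = length_aware_filter("realize", "real", 1)     # 3 characters apart, dropped
```

The check only needs the two string lengths, so it is a cheap first stage before any character-level work.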

Character-Aware Filter

Input: a pair $(s, t) \in L$ and a threshold $\theta$
Output: the pair itself or null

Insight
Given two strings $s$ and $t$, if $|C|$ is the number of characters that do not belong to both strings, we need at least $\lfloor |C| / 2 \rfloor$ operations to transform $s$ into $t$.

Examples
A. $\langle s, t, \theta \rangle = \langle$"realize", "realise", $1 \rangle$: $C = \{s, z\}$, $\lfloor |C| / 2 \rfloor \cdot \min_{i \neq j}(m_{ij}) = 0.5 \Rightarrow$ pass
B. $\langle s, t, \theta \rangle = \langle$"realize", "concept", $1 \rangle$: $C = \{r, c, a, l, i, z, o, n, p, t\}$, $\lfloor |C| / 2 \rfloor \cdot \min_{i \neq j}(m_{ij}) > 1 \Rightarrow$ discard
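The two examples can be checked with a short sketch. It assumes that $C$ is the symmetric difference of the two character sets (which matches both example sets on the slide) and that the smallest off-diagonal cost is 0.5, the value implied by example A. The function name and signature are hypothetical.

```python
from typing import Optional, Tuple


def character_aware_filter(s: str, t: str, theta: float,
                           min_cost: float) -> Optional[Tuple[str, str]]:
    """Return (s, t) if the character-based lower bound stays within theta, else None."""
    # Characters that occur in exactly one of the two strings.
    c = set(s) ^ set(t)
    # One substitution can fix at most two such characters at once.
    lower_bound = (len(c) // 2) * min_cost
    return (s, t) if lower_bound <= theta else None


example_a = character_aware_filter("realize", "realise", 1, min_cost=0.5)
example_b = character_aware_filter("realize", "concept", 1, min_cost=0.5)
```

For example B the set has ten characters, so the bound is 5 · 0.5 = 2.5, which exceeds the threshold of 1.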

Verification Filter

Input: a pair $(s, t) \in C$ and a threshold $\theta$
Output: the pair itself or null

Insight
Definition of Weighted Edit Distance. Two strings $s$ and $t$ are similar iff the sum of the operation costs to transform $s$ into $t$ is less than or equal to $\theta$.

Examples
A. $\langle s, t, \theta \rangle = \langle$"realize", "realise", $1 \rangle$: $\delta(s, t) = m_{z,s} = 0.6 \Rightarrow$ pass
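The verification step amounts to computing the weighted edit distance exactly and comparing it with $\theta$. The sketch below uses the standard dynamic-programming recurrence with a cost function in place of a full matrix. Only the cost $m_{z,s} = 0.6$ comes from the slide; the default cost of 1.0 for every other edit, the empty string standing in for $\epsilon$, and all function names are assumptions for illustration.

```python
from typing import Callable, Optional, Tuple

EPS = ""  # stands for the empty character ε (insertions and deletions)


def weighted_edit_distance(s: str, t: str,
                           cost: Callable[[str, str], float]) -> float:
    """Minimal total cost of substitutions, insertions and deletions turning s into t."""
    prev = [0.0]
    for cb in t:
        prev.append(prev[-1] + cost(EPS, cb))       # build t from nothing
    for ca in s:
        curr = [prev[0] + cost(ca, EPS)]            # delete the prefix of s
        for j, cb in enumerate(t, start=1):
            curr.append(min(prev[j] + cost(ca, EPS),       # delete ca
                            curr[j - 1] + cost(EPS, cb),   # insert cb
                            prev[j - 1] + cost(ca, cb)))   # substitute or match
        prev = curr
    return prev[-1]


def example_cost(a: str, b: str) -> float:
    """Hypothetical costs: 0 on the diagonal, 0.6 for z -> s, 1.0 otherwise."""
    if a == b:
        return 0.0
    if (a, b) == ("z", "s"):
        return 0.6
    return 1.0


def verification_filter(s: str, t: str, theta: float,
                        cost: Callable[[str, str], float]) -> Optional[Tuple[str, str]]:
    """Return (s, t) if the exact weighted distance is within theta, else None."""
    return (s, t) if weighted_edit_distance(s, t, cost) <= theta else None


distance = weighted_edit_distance("realize", "realise", example_cost)
result = verification_filter("realize", "realise", 1, example_cost)
```

This stage is the expensive one, since it runs in time proportional to $|s| \cdot |t|$ per pair, which is why it comes last in the series of filters.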

Experimental Setup/1

Datasets

dataset.property      domain          # of pairs    avg length
DBLP.title            bibliographic   6,843,456     56.359
ACM.authors           bibliographic   5,262,436     46.619
GoogleProducts.name   e-commerce      10,407,076    57.024
ABT.description       e-commerce      1,168,561     248.183
