
Supporting Data Interlinking in Semantic Libraries with Microtask Crowdsourcing


  1. Supporting Data Interlinking in Semantic Libraries with Microtask Crowdsourcing Cristina Sarasua SWIB 2014, Bonn Institute for Web Science and Technologies · University of Koblenz-Landau, Germany

  3. [Diagram: resources a and b connected by a relation; labels: MARC 21, EDM, FRBR]

  5. Please share your thoughts on interlinking! https://etherpad.mozilla.org/4IfZDaTBIe

  6. Interlinking on the Web of Data
     Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/

  7. Cross-dataset links between D1 and D2: (a, r, b) | a in D1, b in D2
     d1:timbl owl:sameAs d2:timbernerslee;
     d1:donostia owl:sameAs d2:sansebastian;
     d1:bjork dc:creator d2:volta;
     d1:Bonn wgs84:location d2:Germany;
     d1:work2012 o:inspiredBy d2:song1900;
     o1:Conference owl:equivalentClass o2:Congress;
     o1:Democracy skos:related o2:Government;
     o1:Publication skos:broader o2:JournalArticle;
     o1:ImpressionistPainting rdfs:subClassOf o2:Painting;
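
The set definition on this slide can be illustrated with a small filter over triples. This is only a sketch: it assumes that dataset membership can be read off the namespace prefix (`d1:`, `d2:`), and the `Triple` type, the helper names, and the intra-dataset triple are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def prefix_of(term: str) -> str:
    """Return the namespace prefix of a prefixed name such as 'd1:timbl'."""
    return term.split(":", 1)[0]


def cross_dataset_links(triples: Iterable[Triple],
                        source_prefix: str,
                        target_prefix: str) -> List[Triple]:
    """Keep triples (a, r, b) whose subject a is in D1 and whose object b is in D2.

    Assumption: a resource belongs to a dataset if it carries that dataset's prefix.
    """
    return [
        t for t in triples
        if prefix_of(t.subject) == source_prefix
        and prefix_of(t.obj) == target_prefix
    ]


triples = [
    Triple("d1:timbl", "owl:sameAs", "d2:timbernerslee"),
    Triple("d1:donostia", "owl:sameAs", "d2:sansebastian"),
    Triple("d1:bjork", "rdf:type", "d1:Artist"),  # hypothetical intra-dataset triple
]

links = cross_dataset_links(triples, "d1", "d2")
for link in links:
    print(link.subject, link.predicate, link.obj)
```

Only the first two triples cross from D1 to D2, so only they count as links in the sense of the slide.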

  8. Why is interlinking important?
     • Enhance the description of local entities
     • Richer queries over aggregated data
     • Cross-dataset browsing
     What is known about Berlin?
     x:berlin owl:sameAs dbpedia:Berlin; tour:berlin;
     x:berlin o:homeOf authors:berlin;
     x:img09112014 lode:atPlace geo:brandtor;
     SELECT ?city1
     WHERE {
       ?city1 gov:population ?pop .
       ?city1 owl:sameAs ?city2 .
       ?city2 unesco:count ?mon .
       FILTER (?pop > 1000000 && ?mon > 50)
     }
     http://www.w3.org/2005/Incubator/lld/XGR-lld-20111025/
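
To make the role of `owl:sameAs` in the query concrete, the same join can be emulated over plain dictionaries. This is a sketch with invented data: the city identifiers, population figures, and monument counts are hypothetical, and a real setup would run the SPARQL query against a triple store instead.

```python
from typing import Dict, List

# Hypothetical facts from a "government" dataset (gov:population).
population: Dict[str, int] = {"x:cityA": 3_500_000, "x:cityB": 200_000}

# Hypothetical owl:sameAs links from the local dataset to a second dataset.
same_as: Dict[str, str] = {"x:cityA": "u:cityA", "x:cityB": "u:cityB"}

# Hypothetical facts from the second dataset (unesco:count).
monument_count: Dict[str, int] = {"u:cityA": 60, "u:cityB": 80}


def big_cities_with_many_monuments(population: Dict[str, int],
                                   same_as: Dict[str, str],
                                   monument_count: Dict[str, int],
                                   min_pop: int = 1_000_000,
                                   min_monuments: int = 50) -> List[str]:
    """Join both datasets through sameAs links and apply both filter conditions."""
    results = []
    for city1, pop in population.items():
        city2 = same_as.get(city1)
        if city2 is None:
            continue  # no link, so the second dataset cannot be reached
        mon = monument_count.get(city2)
        if mon is not None and pop > min_pop and mon > min_monuments:
            results.append(city1)
    return results


print(big_cities_with_many_monuments(population, same_as, monument_count))
```

Without the `same_as` mapping the function returns nothing, which is the point of the slide: the aggregated query is only answerable once the links exist.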

  9. Generating links
     • Identify the resources in D1 and D2 to be connected with relation R
     • Comparison criteria
     • Decision boundary between link and non-link
     Picture: https://www.assembla.com/spaces/silk/wiki/Managing_Reference_Links
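
One simple way to picture "comparison criteria" and a "decision boundary" is a label-similarity score with a threshold. The sketch below is an assumption for illustration, not the method used in the talk or in Silk: it compares labels with Python's `difflib.SequenceMatcher`, and the threshold value and function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def label_similarity(a: str, b: str) -> float:
    """Comparison criterion (assumed): case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def generate_candidate_links(d1_labels: Dict[str, str],
                             d2_labels: Dict[str, str],
                             threshold: float = 0.9,
                             relation: str = "owl:sameAs") -> List[Tuple[str, str, str]]:
    """Emit (s, p, o) for every pair whose similarity reaches the decision boundary."""
    links = []
    for subject, subject_label in d1_labels.items():
        for obj, obj_label in d2_labels.items():
            if label_similarity(subject_label, obj_label) >= threshold:
                links.append((subject, relation, obj))
    return links


d1_labels = {"d1:donostia": "Donostia", "d1:bonn": "Bonn"}
d2_labels = {"d2:bonn": "bonn", "d2:berlin": "Berlin"}

print(generate_candidate_links(d1_labels, d2_labels))
print(label_similarity("Berlin", "Berlinale"))
```

Pairs that score close to the threshold, such as "Berlin" and "Berlinale", are the ones an automatic tool is most likely to get wrong, which motivates asking people.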

  10. He is already busy. Attribution: Thomas Leu

  11. He is already busy … but still would like correct and useful links. Attribution: Thomas Leu

  12. Crowdsourced Interlinking

  13. Crowdsourcing
      “Crowdsourcing represents the act of a company or institution taking a function once performed by employees and outsourcing it to an undefined (and generally large) network of people in the form of an open call” (Jeff Howe, 2006)
      Scalable. Fast.
      • Macrotask crowdsourcing: e.g. writing an e-book; months; $30 per hour / hundreds or thousands of dollars; freelancer recruitment, interviews
      • Microtask crowdsourcing: e.g. tweet sentiment analysis; seconds; reward in cents; crowd workers register with a simple profile, limited filtering
      • Contest-based crowdsourcing: e.g. an NLP algorithm for a particular challenging scenario; months; up to thousands of dollars; final evaluation and winner selection
      • Citizen science crowdsourcing: e.g. classifying galaxies in pictures; seconds/minutes; no money; open to everyone

  14. An interlinking microtask

  17. Approach (workflow over datasets D1 and D2)
      1. Parse RDF links / query D1, D2 for the candidate links to be processed: cl1: (s,p,o), cl2: (s,p,o), …, cln: (s,p,o)
      2. Analyse crowd workers
      3. Generate and publish microtasks; collect crowd responses
      4. Collect responses into an aggregated response and generate an RDF file with the final links (crowd interlinking): cl5: (s,p,o), …, cln: (s,p,o)
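
The "generate microtasks" step can be sketched as turning each candidate link (s, p, o) into a question and grouping questions into small tasks. Everything below is hypothetical: the question wording, the answer options, and the batch size are assumptions, and publishing to a crowdsourcing platform is left out.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str, str]


def build_microtasks(candidate_links: List[Link],
                     labels: Dict[str, str],
                     links_per_task: int = 2) -> List[dict]:
    """Group candidate links into microtasks with one yes/no question per link."""
    tasks = []
    for start in range(0, len(candidate_links), links_per_task):
        batch = candidate_links[start:start + links_per_task]
        questions = [
            {
                "link": (s, p, o),
                # Fall back to the identifier when no human-readable label is known.
                "question": f"Is '{labels.get(s, s)}' connected to "
                            f"'{labels.get(o, o)}' by the relation {p}?",
                "options": ["yes", "no", "not sure"],
            }
            for (s, p, o) in batch
        ]
        tasks.append({"task_id": len(tasks) + 1, "questions": questions})
    return tasks


candidate_links = [
    ("d1:timbl", "owl:sameAs", "d2:timbernerslee"),
    ("d1:donostia", "owl:sameAs", "d2:sansebastian"),
    ("d1:Bonn", "wgs84:location", "d2:Germany"),
]
labels = {"d1:donostia": "Donostia", "d2:sansebastian": "San Sebastián"}

for task in build_microtasks(candidate_links, labels):
    print(task["task_id"], [q["question"] for q in task["questions"]])
```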

  18. Approach (II)
      • Analyse crowd workers to filter out people
        – With bad intentions (i.e. scammers)
        – Who do not have enough knowledge
      • Select representative links for which the answer is known (ground truth) and assess people → a domain expert is useful here
      • Measure difficulty based on data heuristics
      • Select different matching cases:
        Case 1: x:b rdfs:label “Berlin”; rdf:type o:City; / x:b2 rdfs:label “Berlinale”; rdf:type o:Event;
        Case 2: x:b rdfs:label “Córdoba”; rdf:type o:City; / x:b2 rdfs:label “Córdoba”; rdf:type o:City;
        Case 3: x:b rdfs:label “Córdoba”; rdf:type o:City; wgs84:lat 37.883; / x:b2 rdf:type o:City; wgs84:lat -31.400;
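
Assessing workers against ground-truth links can be sketched as an accuracy check followed by a cut-off. This is an illustration under assumptions: the gold answers, worker names, link keys, and the 0.7 accuracy threshold are all hypothetical and not taken from the talk.

```python
from typing import Dict, Set

# Hypothetical ground truth: link key -> whether the two resources really match.
gold: Dict[str, bool] = {
    "berlin-berlinale": False,
    "cordoba-same-city": True,
    "cordoba-different-latitude": False,
}


def worker_accuracy(answers: Dict[str, bool], gold: Dict[str, bool]) -> float:
    """Share of ground-truth links that the worker answered correctly."""
    judged = [link for link in answers if link in gold]
    if not judged:
        return 0.0  # a worker who saw no gold link cannot be assessed
    correct = sum(answers[link] == gold[link] for link in judged)
    return correct / len(judged)


def trusted_workers(all_answers: Dict[str, Dict[str, bool]],
                    gold: Dict[str, bool],
                    min_accuracy: float = 0.7) -> Set[str]:
    """Keep only workers whose accuracy on ground-truth links reaches the threshold."""
    return {
        worker for worker, answers in all_answers.items()
        if worker_accuracy(answers, gold) >= min_accuracy
    }


all_answers = {
    "worker_1": {"berlin-berlinale": False, "cordoba-same-city": True,
                 "cordoba-different-latitude": False},
    # Answers "yes" to everything, as a careless worker or scammer might.
    "worker_2": {"berlin-berlinale": True, "cordoba-same-city": True,
                 "cordoba-different-latitude": True},
}

print(trusted_workers(all_answers, gold))
```

Mixing easy and hard matching cases in the ground truth matters here: a worker who always answers "yes" passes any test made only of true matches.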

  19. Approach (II)
      • Analyse crowd workers to filter out people
        – With bad intentions (i.e. scammers)
        – Who do not have enough knowledge
      • Select representative links for which the answer is known (ground truth) and assess people → a domain expert is useful here
      • Two-way feedback
      • Measure difficulty based on data heuristics
      • Select different matching cases:
        Case 1: x:b rdfs:label “Berlin”; rdf:type o:City; / x:b2 rdfs:label “Berlinale”; rdf:type o:Event;
        Case 2: x:b rdfs:label “Córdoba”; rdf:type o:City; / x:b2 rdfs:label “Córdoba”; rdf:type o:City;
        Case 3: x:b rdfs:label “Córdoba”; rdf:type o:City; wgs84:lat 37.883; / x:b2 rdf:type o:City; wgs84:lat -31.400;

  20. Approach (workflow over datasets D1 and D2, with configuration points)
      1. Parse RDF links / query D1, D2 for the candidate links to be processed: cl1: (s,p,o), cl2: (s,p,o), …, cln: (s,p,o)
      2. Analyse crowd workers
      3. Generate and publish microtasks (context information); collect crowd responses
      4. Collect responses into an aggregated response (#workers per link, agreement) and generate an RDF file with the final links (crowd interlinking): cl5: (s,p,o), …, cln: (s,p,o)
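
The two aggregation parameters named on this slide, the number of workers per link and the agreement, can be sketched with a majority vote. The aggregation rule itself is an assumption (the slide does not say how responses are combined), and the threshold values are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Link = Tuple[str, str, str]


def aggregate(responses: Dict[Link, List[bool]],
              min_workers: int = 3,
              min_agreement: float = 0.6) -> Dict[Link, Tuple[bool, float]]:
    """Majority vote per link, kept only with enough workers and enough agreement."""
    final = {}
    for link, votes in responses.items():
        if len(votes) < min_workers:
            continue  # too few judgements for this link
        answer, count = Counter(votes).most_common(1)[0]
        agreement = count / len(votes)
        if agreement >= min_agreement:
            final[link] = (answer, agreement)
    return final


def final_links(aggregated: Dict[Link, Tuple[bool, float]]) -> List[Link]:
    """Links the crowd confirmed, ready to be serialised as RDF."""
    return [link for link, (answer, _) in aggregated.items() if answer]


cl1 = ("d1:timbl", "owl:sameAs", "d2:timbernerslee")
cl2 = ("d1:donostia", "owl:sameAs", "d2:sansebastian")
cl3 = ("d1:Bonn", "wgs84:location", "d2:Germany")

responses = {
    cl1: [True, True, False],         # 2 of 3 workers agree
    cl2: [True, False],               # only 2 workers answered
    cl3: [True, False, False, True],  # tie, agreement 0.5
}

aggregated = aggregate(responses)
print(final_links(aggregated))
```

Links that fail either threshold are left undecided rather than rejected; they could be sent back to the crowd or to a domain expert.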

  21. Approach (II)
      [Diagram: D1, D2 → manual interlinking; D1, D2 → algorithm, review, guide → HCOMP interlinking]

  22. Use cases

  23. Mapping vocabularies
      • Context information pre-configured
      • Run an automatic ontology alignment tool and post-process the results with the crowd
      See also: [Sarasua et al., 2012]
