A Simple Approach to Accurately Convert Tabular Data into Semantic Knowledge

Apr 18, 2023

A Simple Approach to Accurately Convert Tabular Data into Semantic Knowledge
Gilles Vandewiele (PhD student), Bram Steenwinckel (PhD student), prof. dr. Femke Ongenae (assistant professor, promotor), prof. dr. Filip De Turck (professor, promotor)
Problem statement
High-level overview
Phase 1: use lookups to create initial annotations
→ detect person names and look up only the family name, using the regex "^(\w\. )+([\w\-']+)$"
→ disambiguate the candidates with the Levenshtein distance for non-names and with the whoswho library (https://github.com/rliebz/whoswho) for person names
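A minimal Python sketch of the Phase 1 name handling and Levenshtein disambiguation follows. It assumes the lookup service has already returned a list of candidate labels per cell. The helper names (`family_name`, `levenshtein`, `pick_closest`) are hypothetical. The whoswho-based matching for person names is not reproduced here.

```python
from __future__ import annotations

import re

# Name pattern from the slide: one or more initials ("G. ") followed by a family name.
NAME_RE = re.compile(r"^(\w\. )+([\w\-']+)$")


def family_name(cell: str) -> str | None:
    """Return the family name if the cell looks like 'G. Vandewiele', else None."""
    match = NAME_RE.match(cell.strip())
    return match.group(2) if match else None


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def pick_closest(cell: str, candidate_labels: list[str]) -> str | None:
    """Disambiguate lookup candidates by the smallest case-insensitive edit distance."""
    if not candidate_labels:
        return None
    return min(candidate_labels, key=lambda label: levenshtein(cell.lower(), label.lower()))


print(family_name("G. Vandewiele"))  # Vandewiele
print(family_name("Ghent"))          # None
print(pick_closest("Antwerpen", ["Antwerp", "Amsterdam"]))  # Antwerp
```

Only the family name ("Vandewiele") would then be sent to the lookup service, which sidesteps the abbreviated first name.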
Phase 2: infer column types based on the cell annotations
For every cell x_{0,0} … x_{0,n-1} of col 0: SELECT ?t WHERE { <x_{0,0}> a ?t . }
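The Phase 2 step can be sketched as follows. The sketch assumes the `SELECT ?t` results for every cell of a column have already been fetched into a dict of type sets. Aggregating them by a simple majority count is an assumption made for illustration, since the slide only states that column types are inferred from the cell annotations. Real DBpedia results also contain every superclass, so a practical version would have to prefer the most specific type among ties. That refinement is left out.

```python
from __future__ import annotations

from collections import Counter


def type_query(entity_iri: str) -> str:
    """Per-cell query from the slide: every rdf:type (written `a`) of the annotated entity."""
    return f"SELECT ?t WHERE {{ <{entity_iri}> a ?t . }}"


def infer_column_type(cell_types: dict[str, set[str]]) -> str | None:
    """Majority count over the types of all annotated cells in one column (assumed rule)."""
    counts = Counter(t for types in cell_types.values() for t in types)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


DBO = "http://dbpedia.org/ontology/"
# Hypothetical, pre-fetched results of type_query for three cells of col 0;
# the third cell carries a wrong annotation left over from Phase 1.
cells = {
    "http://dbpedia.org/resource/Ghent": {DBO + "City"},
    "http://dbpedia.org/resource/Antwerp": {DBO + "City"},
    "http://example.org/WrongMatch": {DBO + "Film"},
}
print(type_query("http://dbpedia.org/resource/Ghent"))
print(infer_column_type(cells))  # http://dbpedia.org/ontology/City
```

Counting across rows makes the column type robust to the occasional wrong cell annotation.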
Phase 3: infer properties based on the cell annotations and disambiguate them with the column annotations
For every row (x_{0,0}, x_{1,0}) … (x_{0,n-1}, x_{1,n-1}) of col 0 and col 1: SELECT ?p WHERE { <x_{0,0}> ?p <x_{1,0}> . }
Disambiguation: look for the domain & range of each candidate property in the column types: SELECT ?domain ?range WHERE { <pred> rdfs:domain ?domain . <pred> rdfs:range ?range . }
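A sketch of Phase 3 under simplifying assumptions. The `SELECT ?p` results per row and the `rdfs:domain`/`rdfs:range` of each candidate property are given as plain Python dicts, and candidates are ranked by how many rows support them. Keeping only properties whose domain and range appear among the column types is one reading of the slide's disambiguation step. Treating a missing domain or range as compatible is our own assumption, and the `ex:` properties and classes are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def infer_property(
    row_props: list[set[str]],
    domain_range: dict[str, tuple[str | None, str | None]],
    subject_types: set[str],
    object_types: set[str],
) -> str | None:
    """Most frequent ?p across rows whose domain/range fits the column types."""
    counts = Counter(p for props in row_props for p in props)

    def compatible(prop: str) -> bool:
        domain, rng = domain_range.get(prop, (None, None))
        # Assumption: a property without a declared domain/range is not filtered out.
        return (domain is None or domain in subject_types) and (
            rng is None or rng in object_types
        )

    for prop, _count in counts.most_common():
        if compatible(prop):
            return prop
    return None


# Hypothetical ?p results for three rows (x_{0,i} ?p x_{1,i}) and toy schema data.
rows = [{"ex:bornIn", "ex:locatedIn"}, {"ex:locatedIn"}, {"ex:bornIn", "ex:locatedIn"}]
schema = {
    "ex:bornIn": ("ex:Person", "ex:City"),
    "ex:locatedIn": ("ex:Building", "ex:City"),
}
print(infer_property(rows, schema, {"ex:Person"}, {"ex:City"}))  # ex:bornIn
```

Here `ex:locatedIn` is the most frequent candidate, but its domain does not match the type of col 0, so the column annotations overrule the raw count.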
Phase 4: annotate the head cells (the cells of col 0, the subject column) with the properties
SELECT ?s WHERE { ?s <pred> <x_{1,0}> . }
→ Take the ?s with the highest count. In case of a tie (ex aequo), use the Levenshtein distance.
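One way to sketch Phase 4 is shown below. It assumes the counts are taken over the `?s` results of the queries issued for one row (for example, one query per annotated property/column). The slide does not say exactly what is counted. The `ex:` IRIs and labels are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def annotate_head_cell(
    cell_text: str,
    subjects_per_query: list[list[str]],
    labels: dict[str, str],
) -> str | None:
    """Most frequent ?s over the row's queries; ties broken by edit distance to the cell."""
    # Each query votes at most once per subject.
    counts = Counter(s for subjects in subjects_per_query for s in set(subjects))
    if not counts:
        return None
    best = max(counts.values())
    tied = [s for s, c in counts.items() if c == best]
    # Fall back to the IRI itself when no label is known.
    return min(tied, key=lambda s: levenshtein(cell_text.lower(), labels.get(s, s).lower()))


# Hypothetical ?s results of two queries for the same row, plus hypothetical labels.
results = [["ex:Ghent", "ex:Gent_Township"], ["ex:Ghent", "ex:Gent_Township"]]
labels = {"ex:Ghent": "Ghent", "ex:Gent_Township": "Gent Township"}
print(annotate_head_cell("Ghent", results, labels))  # ex:Ghent (tie broken by Levenshtein)
```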
Phase 5: annotate all other cells
SELECT ?o WHERE { <x_{0,0}> <pred> ?o . }
→ Disambiguate with the Levenshtein distance
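Phase 5 can be sketched as building the object query for a row and keeping the returned `?o` whose label is closest to the cell text. The sketch assumes each `?o` already comes with a label. The `ex:` IRIs, including the `ex:province` property, are hypothetical.

```python
from __future__ import annotations


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def object_query(subject_iri: str, pred_iri: str) -> str:
    """Phase 5 query: all objects of the row's subject entity via the column property."""
    return f"SELECT ?o WHERE {{ <{subject_iri}> <{pred_iri}> ?o . }}"


def annotate_cell(cell_text: str, candidates: dict[str, str]) -> str | None:
    """candidates maps each returned ?o IRI to its label; keep the closest label."""
    if not candidates:
        return None
    return min(candidates, key=lambda iri: levenshtein(cell_text.lower(), candidates[iri].lower()))


print(object_query("http://example.org/Ghent", "http://example.org/province"))
candidates = {"ex:East_Flanders": "East Flanders", "ex:Flemish_Region": "Flemish Region"}
print(annotate_cell("East-Flanders", candidates))  # ex:East_Flanders
```

Because the subject and property are already fixed by Phases 3 and 4, the candidate set is usually small, and a plain string distance is enough to choose among them.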
Phase 6: final column annotation, now based on the higher-quality cell annotations
For every cell x_{0,0} … x_{0,n-1} of col 0: SELECT ?t WHERE { <x_{0,0}> a ?t . }
Some sly tricks to boost our score
- Many cells contain abbreviated person names (e.g. G. Vandewiele, B. Steenwinckel) → custom code for these
- The CTA score is not bounded by 1! Add all the parents to the column annotation → if the perfect type is at depth d, the maximum score per row is 1 + (d - 1) * 0.5
- Use reasoning to find equivalent classes and add these as well
- Find tables that are very similar (in earlier rounds the CSV headers often matched) and apply majority voting
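The parent-expansion trick and the bound from the slide can be sketched in Python. The class hierarchy below is a small illustrative fragment modelled on the DBpedia ontology, not an excerpt of it. Depth is counted so that a direct subclass of owl:Thing has depth 1. Under that convention, a perfect type at depth d contributes d - 1 ancestors, each worth 0.5, which gives 1 + (d - 1) * 0.5. Equivalent-class reasoning and majority voting are not shown.

```python
from __future__ import annotations

# Illustrative child -> parent fragment (owl:Thing omitted as the implicit root).
PARENT = {
    "dbo:SoccerPlayer": "dbo:Athlete",
    "dbo:Athlete": "dbo:Person",
    "dbo:Person": "dbo:Agent",
}


def with_ancestors(cls: str, parent: dict[str, str]) -> list[str]:
    """Return cls followed by all of its ancestors, most specific first (assumes no cycles)."""
    chain = [cls]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def max_row_score(depth: int) -> float:
    """Upper bound from the slide: 1 for the perfect type plus 0.5 per ancestor."""
    return 1 + (depth - 1) * 0.5


annotation = with_ancestors("dbo:SoccerPlayer", PARENT)
print(annotation)  # ['dbo:SoccerPlayer', 'dbo:Athlete', 'dbo:Person', 'dbo:Agent']
print(max_row_score(len(annotation)))  # 2.5
```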
Things we tried, but didn’t work well
Clustering of lookup candidates using Jaccard distances between their RDF types
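For reference, the distance behind that experiment can be sketched as the Jaccard distance between the rdf:type sets of the lookup candidates. Only the pairwise distance computation is shown. The slide does not name the clustering algorithm, and the candidate IRIs and type sets are hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """1 - |A ∩ B| / |A ∪ B|; two empty type sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1 - len(a & b) / len(union)


def pairwise_distances(candidate_types: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Distance between every pair of lookup candidates, based on their rdf:type sets."""
    return {
        (x, y): jaccard_distance(candidate_types[x], candidate_types[y])
        for x, y in combinations(sorted(candidate_types), 2)
    }


# Hypothetical lookup candidates for one cell, with toy type sets.
candidates = {
    "ex:Ghent": {"dbo:City", "dbo:Place"},
    "ex:Ghent_Kentucky": {"dbo:City", "dbo:Place", "dbo:Settlement"},
    "ex:Ghent_University": {"dbo:University", "dbo:Organisation"},
}
for pair, dist in pairwise_distances(candidates).items():
    print(pair, round(dist, 3))
```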
Things we tried, but didn’t work well
Playing around (outlier removal, clustering, …) with pre-made RDF2Vec embeddings for DBpedia (https://github.com/IBCNServices/pyRDF2Vec)
Results: Round 1 (CTA)
Results: Round 2 (CEA, CTA, CPA)
Results: Round 3 (CEA, CTA, CPA)
Results: Round 4 (CEA, CTA, CPA)
Conclusion & future work
- We first tried more sophisticated approaches, but they were all subpar → KISS (keep it simple, stupid)
- The simple approach performs really well (second place overall)
- The iterative approach could easily be replaced by a better one that jointly learns to annotate properties, column types and cells (keeping track of all possible candidates)
Thank you!
gilles.vandewiele@ugent.be
https://twitter.com/Gillesvdwiele
https://www.linkedin.com/in/gillesvandewiele/
www.gillesvandewiele.com
Paper: http://www.cs.ox.ac.uk/isg/challenges/sem-tab/papers/IDLab.pdf
Code (WIP): https://github.com/IBCNServices/CSV2KG
