Aligning and Integrating Data in Karma
Craig Knoblock University of Southern California
Data Integration Approaches
Data Warehousing
Virtual Integration
Domain Model
Source Mappings
Karma
Interactive tool for rapidly extracting, cleaning, transforming, integrating and publishing data
Diagram labels: Tabular Sources, Hierarchical Sources, Services, Database, CSV, …, RDF
http://www.isi.edu/integration/karma @KarmaSemWeb
Domain Model Source Mappings Karma Samples of Source Data
Karma semi-automatically builds a semantic model of your data
Semantic Model
Domain Model (legend: data property, subClassOf)
Classes: Person, Organization, Place, State, City, Event
Properties: name, birthdate, bornIn, worksFor, state, phone, livesIn, ceo, location, nearby, startDate, title, isPartOf, postalCode
Source
   name           date      city      state  workplace
1  Fred Collins   Oct 1959  Seattle   WA     Microsoft
2  Tina Peterson  May 1980  New York  NY     Google
Describe sources using classes & relationships in an ontology
Semantic model of the source:
name -> Person name
date -> Person birthdate
city -> City name
state -> State name
workplace -> Organization name
Relationships: Person bornIn City, Person worksFor Organization, City state State
So what?
Semantic models are a key ingredient to automate source discovery, data integration, and publishing semantic data (RDF triples)
Semantic models will be formalized as Source Mappings
Karma uses semantic models to create knowledge graphs
Karma semi-automatically builds semantic models
… and provides a nice GUI to edit them
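To make the step from a semantic model to RDF triples concrete, here is a minimal sketch in plain Python. It is not Karma's implementation or its mapping format. The namespaces `ONT` and `RES`, the one-resource-per-class-per-row URI scheme, and the helper `row_to_triples` are assumptions made for illustration; the column-to-class assignments follow the Person / Organization / City / State example used in the slides.

```python
# Hypothetical namespaces: the slides do not give any URIs.
ONT = "http://example.org/ontology/"
RES = "http://example.org/resource/"

rows = [
    {"name": "Fred Collins", "date": "Oct 1959", "city": "Seattle",
     "state": "WA", "workplace": "Microsoft"},
    {"name": "Tina Peterson", "date": "May 1980", "city": "New York",
     "state": "NY", "workplace": "Google"},
]

# Semantic types: source column -> (ontology class, data property).
SEMANTIC_TYPES = {
    "name": ("Person", "name"),
    "date": ("Person", "birthdate"),
    "city": ("City", "name"),
    "state": ("State", "name"),
    "workplace": ("Organization", "name"),
}

# Relationships between classes: (subject class, property, object class).
LINKS = [
    ("Person", "bornIn", "City"),
    ("Person", "worksFor", "Organization"),
    ("City", "state", "State"),
]


def row_to_triples(index, row):
    """Turn one source row into Turtle-style (subject, predicate, object) strings."""
    def uri(cls):
        # Assumption: one resource per class per row, keyed by the row index.
        return f"<{RES}{cls}/{index}>"

    triples = []
    # Type triples, one per class used in the model.
    for cls in sorted({c for c, _ in SEMANTIC_TYPES.values()}):
        triples.append((uri(cls), "a", f"<{ONT}{cls}>"))
    # Data property triples, one per mapped column.
    for column, (cls, prop) in SEMANTIC_TYPES.items():
        triples.append((uri(cls), f"<{ONT}{prop}>", f'"{row[column]}"'))
    # Object property triples, one per relationship in the model.
    for subj, prop, obj in LINKS:
        triples.append((uri(subj), f"<{ONT}{prop}>", uri(obj)))
    return triples


for i, row in enumerate(rows, start=1):
    for s, p, o in row_to_triples(i, row):
        print(s, p, o, ".")
```

Each row yields four type triples, five data property triples and three relationship triples. A real pipeline would also need to reuse the same URI when two rows mention the same city or organization, which this sketch does not attempt.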
[Knoblock et al., ESWC 2012]
Inputs: Domain Ontology, Sample Data
Steps: Learn Semantic Types, Construct a Graph, Steiner Tree, Extract Relationships
Find a semantic model for the source (map the source to the ontology)
[Krishnamurthy et al., ESWC 2015]
Semantic type of a column: class? property? (example: class CulturalHeritageObject, property extent)
1- User specifies
2- System learns
number of semantic types
labeled documents
documents
Similarity between TF/IDF vectors
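The idea of comparing columns through the similarity between TF/IDF vectors can be sketched as follows. This is a rough illustration, not the system's actual code: it assumes that the values of each labeled column are pooled into one "document", that tokens are whitespace-separated words, that the IDF is smoothed as `log((1 + n) / (1 + df)) + 1`, and that cosine similarity is the comparison measure. The labels and sample values are made up.

```python
import math
from collections import Counter


def tokenize(values):
    """Pool all values of a column into one list of lowercase word tokens."""
    return [tok.lower() for value in values for tok in value.split()]


def build_idf(labeled_docs):
    """Smoothed inverse document frequency over the labeled documents."""
    n = len(labeled_docs)
    df = Counter(tok for toks in labeled_docs.values() for tok in set(toks))
    idf = {tok: math.log((1 + n) / (1 + count)) + 1.0 for tok, count in df.items()}
    default_idf = math.log(1 + n) + 1.0  # weight for tokens never seen before
    return idf, default_idf


def vectorize(tokens, idf, default_idf):
    """Sparse TF/IDF vector as a dict token -> weight."""
    tf = Counter(tokens)
    return {tok: count / len(tokens) * idf.get(tok, default_idf)
            for tok, count in tf.items()}


def cosine(a, b):
    dot = sum(weight * b.get(tok, 0.0) for tok, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Labeled documents: one per semantic type (hypothetical training data).
labeled = {
    "City.name": tokenize(["Seattle", "New York", "Los Angeles"]),
    "Organization.name": tokenize(["Microsoft", "Google", "Apple"]),
}
idf, default_idf = build_idf(labeled)
vectors = {label: vectorize(toks, idf, default_idf) for label, toks in labeled.items()}

# A new, unlabeled column is scored against every labeled document.
new_column = vectorize(tokenize(["New York", "Seattle", "Boston"]), idf, default_idf)
scores = {label: cosine(new_column, vec) for label, vec in vectors.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

Because each semantic type contributes a single vector, scoring a new column costs one comparison per type rather than one per labeled value.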
The distribution of values for different semantic types is different, e.g., temperature vs. population
Testing to see which distribution fits best
Mann-Whitney U-test and Kolmogorov-Smirnov test
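As an illustration of testing which distribution fits best, the sketch below computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between two empirical cumulative distribution functions) in plain Python and picks the labeled sample with the smallest statistic. It computes only the statistic, not a p-value, and the sample numbers are invented; in practice a library routine such as `scipy.stats.ks_2samp` would normally be used.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max |F_a(x) - F_b(x)| over all observed x."""
    a, b = sorted(sample_a), sorted(sample_b)
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


# Hypothetical labeled numeric columns, one per semantic type.
labeled = {
    "temperature": [12.5, 18.0, 21.3, 25.1, 30.2],
    "population": [52000, 120000, 730000, 8400000],
}

# Unlabeled column to classify.
new_column = [15.0, 19.5, 22.0, 28.4]

# A smaller statistic means the two samples look more alike.
distances = {label: ks_statistic(new_column, values) for label, values in labeled.items()}
best = min(distances, key=distances.get)
print(best, distances)
```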
Similarity Features
Attribute name similarity: Jaccard
Value similarity: TF-IDF, Jaccard
Distribution similarity: Mann-Whitney test, Kolmogorov-Smirnov test
Histogram similarity: Mann-Whitney test
[Pham et al., ISWC 2016]
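Of the similarity features listed, Jaccard similarity is the simplest to show. The sketch below applies it to attribute names; the tokenizer that splits camelCase and snake_case names is an assumption for illustration and is not taken from the cited paper.

```python
import re


def name_tokens(name):
    """Split an attribute name such as 'birthDate' or 'birth_date' into lowercase tokens."""
    return {tok.lower() for tok in re.findall(r"[A-Za-z][a-z]*|\d+", name)}


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


print(jaccard(name_tokens("birthDate"), name_tokens("birth_date")))
print(jaccard(name_tokens("birthDate"), name_tokens("date_of_birth")))
print(jaccard(name_tokens("birthDate"), name_tokens("workplace")))
```

The same `jaccard` function can be applied to sets of column values to obtain a value similarity feature.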
Construct a graph from semantic types and ontology
40
Select minimal tree that connects all semantic types
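The two steps above (build a graph from the semantic types and the ontology, then select a minimal tree connecting the semantic types) can be sketched with a small weighted graph. The code uses a greedy shortest-path heuristic for the Steiner tree problem: it grows the tree by repeatedly attaching the nearest unconnected terminal. This is one common approximation and is not necessarily the algorithm Karma uses. The edge list, the uniform weights and the choice of which classes each property connects are assumptions for illustration.

```python
import heapq


def add_edge(graph, u, v, weight):
    """Add an undirected weighted edge to a dict-of-dicts graph."""
    graph.setdefault(u, {})[v] = weight
    graph.setdefault(v, {})[u] = weight


def nearest_terminal_path(graph, tree_nodes, remaining):
    """Dijkstra from all tree nodes at once; return the path to the closest remaining terminal."""
    dist = {node: 0.0 for node in tree_nodes}
    prev = {}
    heap = [(0.0, node) for node in sorted(tree_nodes)]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        if u in remaining:
            path = []
            while u in prev:  # walk back until a node already in the tree
                path.append((prev[u], u))
                u = prev[u]
            return path
        for v, weight in graph.get(u, {}).items():
            nd = d + weight
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    raise ValueError("terminals are not connected")


def steiner_tree(graph, terminals):
    """Approximate Steiner tree as a set of frozenset({u, v}) edges."""
    terminals = list(terminals)
    tree_nodes = {terminals[0]}
    remaining = set(terminals[1:])
    edges = set()
    while remaining:
        for u, v in nearest_terminal_path(graph, tree_nodes, remaining):
            edges.add(frozenset((u, v)))
            tree_nodes.update((u, v))
        remaining -= tree_nodes
    return edges


# Graph built from ontology classes; edge comments name the assumed property.
graph = {}
add_edge(graph, "Person", "City", 1.0)          # bornIn
add_edge(graph, "Person", "Organization", 1.0)  # worksFor
add_edge(graph, "City", "State", 1.0)           # state
add_edge(graph, "Organization", "Place", 1.0)   # location
add_edge(graph, "Place", "State", 1.0)          # isPartOf
add_edge(graph, "Event", "Place", 1.0)          # location

# Terminals are the classes that appear in the learned semantic types.
tree = steiner_tree(graph, ["Person", "Organization", "City", "State"])
for edge in sorted(tuple(sorted(e)) for e in tree):
    print(edge)
```

With these weights the selected tree keeps the bornIn, worksFor and state links and leaves out Place and Event, which are not needed to connect the four terminals.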
Impose constraints on Steiner Tree Algorithm
– Change weight of selected links to ε
– Add source and target of selected link to Steiner nodes
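The two constraints above translate directly into a small update on the graph and the terminal set before the tree is recomputed. The sketch below assumes a dict-of-dicts undirected graph; the function name `apply_user_link` and the value chosen for ε are hypothetical.

```python
EPSILON = 1e-6  # assumed value for ε: small enough that the link is effectively free


def apply_user_link(graph, steiner_nodes, source, target):
    """Record a link the user selected in the GUI.

    Mutates `graph` so the link costs EPSILON in both directions, and returns
    a new set of Steiner nodes that also contains the link's endpoints.
    """
    graph.setdefault(source, {})[target] = EPSILON
    graph.setdefault(target, {})[source] = EPSILON
    return set(steiner_nodes) | {source, target}


# Example: the user insists on the Organization-Place link.
graph = {
    "Person": {"Organization": 1.0},
    "Organization": {"Person": 1.0, "Place": 1.0},
    "Place": {"Organization": 1.0},
}
steiner_nodes = {"Person", "Organization"}
steiner_nodes = apply_user_link(graph, steiner_nodes, "Organization", "Place")
print(sorted(steiner_nodes), graph["Organization"]["Place"])
```

Making the link nearly free and forcing both endpoints into the node set means any minimal tree computed afterwards will include the user's choice.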
[Taheriyan et al., ISWC 2013, ICSC 2014]
Inputs: Domain Ontology, Sample Data, Known Semantic Models
Steps: Learn Semantic Types, Construct a Graph, Generate Candidate Models, Rank Results
Pedro Szekely and Craig Knoblock University of Southern California
Mapping Phase: the Domain Expert uses Karma with the Domain Model and Samples of Source Data to produce Source Mappings
Query Phase: the Analyst sends a Query to the Karma Runtime System (Virtual Integration or Data Warehousing)
researcher networks across institutions
about USC faculty to VIVO
RDF
44000 museum objects to Linked Open Data
Data Model (EDM)
DBpedia, ULAN, NY Times Linked Data
Insight Graphs
knowledge graphs to combat human trafficking
extracted data and structured sources to shared domain ontology
Using Karma to map museum data to the CIDOC CRM ontology
https://www.youtube.com/watch?v=h3_yiBhAJIc
sources
automate many tasks, e.g.,
Mohsen Taheriyan University of Southern California