1
USC INFORMATION SCIENCES INSTITUTE Yolanda Gil
Interactive Knowledge Capture
Yolanda Gil
Director, Knowledge Technologies
Associate Division Director for Research
Research Professor, Computer Science
Intelligent Systems Division
Information Sciences Institute
University of Southern California
[Gil et al JETAIʼ11; Gil et al IEEE-ISʼ11; Gil et al e-Scienceʼ09; Kim et al JWSʼ08]
Semantic descriptions of datasets and their properties (RDF, OWL)
Semantic constraints on how computations transform the data
Automatic propagation of constraints
Execution in grids or clouds
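The constraint-propagation idea can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Wings implementation: dataset metadata is a plain dictionary rather than RDF/OWL, the pipeline is linear, and the `Component` class, the component names `Normalize` and `Quantify`, and all property names are hypothetical. Each step checks its input constraints and derives the metadata of its output, so a violation surfaces before anything is executed.

```python
from dataclasses import dataclass, field

@dataclass
class Component:
    """Hypothetical workflow component with semantic constraints."""
    name: str
    requires: dict                              # input property -> required value
    sets: dict = field(default_factory=dict)    # output properties this step asserts
    keeps: tuple = ()                           # input properties passed through unchanged

def propagate(dataset_props, pipeline):
    """Propagate dataset metadata through a linear pipeline.

    Checks each component's input constraints and returns a list of
    (component name, inferred output metadata). Raises ValueError on a violation.
    """
    props = dict(dataset_props)
    trace = []
    for comp in pipeline:
        for key, required in comp.requires.items():
            if props.get(key) != required:
                raise ValueError(
                    f"{comp.name} requires {key}={required!r}, got {props.get(key)!r}"
                )
        # Output metadata: pass-through properties plus those the step asserts.
        out = {k: props[k] for k in comp.keeps if k in props}
        out.update(comp.sets)
        props = out
        trace.append((comp.name, dict(props)))
    return trace

# Hypothetical two-step pipeline for an image-quantification task.
normalize = Component("Normalize", requires={"format": "tiff"},
                      sets={"normalized": True}, keeps=("format", "width", "height"))
quantify = Component("Quantify", requires={"normalized": True},
                     sets={"kind": "intensity-table"}, keeps=("width", "height"))

trace = propagate({"format": "tiff", "width": 2560, "height": 2400}, [normalize, quantify])
for name, meta in trace:
    print(name, meta)
```

Feeding the same pipeline a dataset with `format="png"` raises a `ValueError` at the first step, which is the point of checking constraints before execution.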
Example: Workflow for Pixel Intensity Quantification [… et al 10; Kurc et al 09]
Compact workflow template (left)
Automatically generated executable workflow for 2560x2400 pixels (right)
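A compact template can stay small because some steps are marked to run once per item of an input collection; generating the executable workflow replicates those steps. The sketch below assumes, purely for illustration, that the 2560x2400 image is split into hypothetical 256x240 tiles and that a template is a linear list of `Step`s carrying a hypothetical `per_item` flag. The slide does not say how the actual workflow partitions the image.

```python
from dataclasses import dataclass

@dataclass
class Step:
    name: str
    per_item: bool  # hypothetical flag: run once per collection item?

def tile_origins(width, height, tile_w, tile_h):
    """Top-left corners of tiles covering a width x height image (edge tiles may be smaller)."""
    return [(x, y) for y in range(0, height, tile_h) for x in range(0, width, tile_w)]

def expand(template, items):
    """Expand a compact linear template into a flat list of concrete job names.

    Per-item steps are replicated once for every (x, y) tile; other steps
    (e.g. a final aggregation) appear once and consume all per-item outputs.
    """
    jobs = []
    for step in template:
        if step.per_item:
            jobs.extend(f"{step.name}@{x},{y}" for x, y in items)
        else:
            jobs.append(step.name)
    return jobs

template = [Step("Quantify", per_item=True), Step("Merge", per_item=False)]
tiles = tile_origins(2560, 2400, tile_w=256, tile_h=240)   # hypothetical tile size
jobs = expand(template, tiles)
print(len(tiles), len(jobs))
```

With these assumed tile sizes the image splits into 10 × 10 = 100 tiles, so the two-step template expands into 101 jobs; the template itself never grows with the input size.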
Unique capability to reason about application tasks and data, with uses in science and intelligence
4
Web of Data: 25B RDF triples (statements) with 395M links from 203 data sets [Berners-Lee et al 09]
Data from web sources – news sources, events, geospatial information, bioinformatics, academic publications
LinkedDataLens: a system to extract networks of interest
No need to install any software
Users pose queries, create networks, and apply social network analysis algorithms to extract interesting statistics – size, centrality, connected components, etc.
Extracted networks can be combined with other existing networks and used by other applications
– Networks about people, places, events, etc.
– E.g., pharmaceutical firms doing clinical trials in California for the same drug
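Once a query has produced a network, computing the statistics reported on this slide is straightforward. The Python sketch below covers only that second half, under stated assumptions: the edge list stands in for the output of a Linked Data query (no SPARQL is issued), the graph is treated as undirected and simple, and the function name `network_stats` is hypothetical. The returned field names mirror the slide's; density and average clustering follow the usual undirected definitions (the same ones NetworkX uses in `density` and `average_clustering`).

```python
from collections import deque
from itertools import combinations

def network_stats(nodes, edges):
    """Summary statistics for an undirected, simple graph given as node and edge lists."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        if u == v:                      # ignore self-loops
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2   # duplicate edges collapse in the sets
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0

    def local_clustering(node):
        # Fraction of pairs of neighbours that are themselves linked.
        nbrs = adj[node]
        k = len(nbrs)
        if k < 2:
            return 0.0
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        return 2 * links / (k * (k - 1))

    avg_cc = sum(local_clustering(v) for v in adj) / n if n else 0.0

    # Count connected components with breadth-first search.
    seen, components = set(), 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            for nb in adj[queue.popleft()]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)

    return {"Nodes": n, "Edges": m, "density": density,
            "AvgClusterCoeff": avg_cc, "isConnected": components == 1,
            "NbConnectedComp": components}

# Hypothetical query result: a triangle plus one separate pair.
stats = network_stats(["a", "b", "c", "d", "e"],
                      [("a", "b"), ("b", "c"), ("c", "a"), ("d", "e")])
print(stats)
```

For this toy graph the triangle nodes each have clustering 1 and the pair nodes 0, giving an average of 0.6, a density of 0.4, and two components.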
A large and growing source of structured data that can be exploited in many application areas
Our group contributed BibBase
Honorary Mention Triplification Challenge 2010
Pharmaceutical companies who make the same drug: Nodes=609 Edges=3032 density=0.016 AvgClusterCoeff=0.584 isConnected=False NbConnectedComp=17
US Senators who share an alma mater: Nodes=136 Edges=340 density=0.037 AvgClusterCoeff=0.6203 isConnected=False NbConnectedComp=17
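As a consistency check on the network figures, and assuming both networks are undirected with no self-loops or repeated edges, density is the fraction of possible node pairs that are linked:

```latex
D = \frac{2E}{N(N-1)}

D_{\text{pharma}} = \frac{2 \cdot 3032}{609 \cdot 608} = \frac{6064}{370272} \approx 0.0164

D_{\text{senators}} = \frac{2 \cdot 340}{136 \cdot 135} = \frac{680}{18360} \approx 0.0370
```

Both values round to the reported densities of 0.016 and 0.037.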
5
Shortipedia (http://www.isi.edu/ikcap/shortipedia) [Vrandecic et al 2010; Vrandecic et al 2011]
Why: Social content collection tools are either lightly structured or too rigid
repositories, but are not structured or aggregated in a searchable manner
community fills out with contributions
A semantic wiki is a framework that enables contributors to define organic characterizations that lead to emerging unified models
– Contributors voluntarily adopt definitions made by others
– Encourage consensus where possible
Provenance-aware semantic wiki
E.g., Android is “jailbreakable” and “bricked” by RF software
– Document sources and evidence
– Can filter query results according to provenance, e.g., software that bricks phones according to NIST
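Filtering query results by provenance, as in the NIST example, amounts to keeping each fact's source alongside the fact itself. A minimal Python sketch, assuming a flat list of subject/property/value statements rather than an actual semantic wiki store; the `Statement` class, the `query` function, and all data values are hypothetical placeholders.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Statement:
    """A wiki fact together with its provenance."""
    subject: str
    prop: str
    value: str
    source: str         # who asserted the fact
    evidence: str = ""  # optional pointer to supporting evidence

def query(statements, prop, value, trusted_sources=None):
    """Subjects asserted to have prop=value; optionally keep only trusted sources."""
    return sorted({
        s.subject
        for s in statements
        if s.prop == prop and s.value == value
        and (trusted_sources is None or s.source in trusted_sources)
    })

# Hypothetical wiki content: two sources make different claims.
wiki = [
    Statement("tool-A", "bricks", "phone-X", source="SourceA", evidence="report-1"),
    Statement("tool-B", "bricks", "phone-X", source="SourceB"),
]
print(query(wiki, "bricks", "phone-X"))                               # all claims
print(query(wiki, "bricks", "phone-X", trusted_sources={"SourceA"}))  # provenance filter
```

Because provenance is stored per statement, the same content can answer both "what does the community say" and "what does a particular source say" without duplicating the data.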
Community-created structured content
Emerging semantics lead to dynamically aggregated content
Unique capability for social collection of structured content, with documented provenance
3rd Place Semantic Web Challenge 2010
6
[Groth et al ʼ11; Sahoo et al ʼ11; Moreau et al ʼ10]
Why: Provenance is rarely captured, hampering trust assessment
underlying implementation
found/extracted (e.g., NYT)
information
from documents or datasets
Unifying models of provenance
(… models are attribution-based only)
Builds on open standards
Unifying provenance infrastructure
heterogeneous systems to assess trust
account
Unifying provenance models would enable the computation of trust metrics for content
Mappings across existing provenance vocabularies
Emerging standards (Open Provenance Model)
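One way to see how unified provenance models enable trust metrics: first map each vocabulary's terms onto shared core concepts, then compute trust over the unified graph. The Python sketch below does both under explicit assumptions. The mapping table uses real Dublin Core, FOAF, and OPMV term names but is illustrative only, not the mapping produced by the W3C group; the core concept names `attributedTo` and `derivedFrom`, the agent trust scores, and the weakest-link trust metric (minimum over attributions and, recursively, over sources) are all hypothetical choices.

```python
# Illustrative mapping from vocabulary-specific terms to two core concepts.
# This table is an assumption for the example, not a published mapping.
CORE = {
    "dcterms:creator": "attributedTo",
    "foaf:maker": "attributedTo",
    "dcterms:source": "derivedFrom",
    "opmv:wasDerivedFrom": "derivedFrom",
}

def normalize(triples):
    """Rewrite (subject, term, object) triples into core concepts; drop unmapped terms."""
    return [(s, CORE[p], o) for s, p, o in triples if p in CORE]

def trust(item, triples, agent_trust, default=0.0, _seen=None):
    """Hypothetical weakest-link trust score for an item.

    The score is the minimum over the trust of every agent the item is
    attributed to and, recursively, every item it was derived from.
    Items with no mapped provenance get `default`.
    """
    seen = set() if _seen is None else _seen
    if item in seen:          # cycle guard: revisiting adds no new constraint
        return 1.0
    seen.add(item)
    scores = []
    for s, p, o in normalize(triples):
        if s != item:
            continue
        if p == "attributedTo":
            scores.append(agent_trust.get(o, default))
        else:                 # p == "derivedFrom"
            scores.append(trust(o, triples, agent_trust, default, seen))
    return min(scores) if scores else default

# Hypothetical provenance described with two different vocabularies.
triples = [
    ("report", "dcterms:creator", "agencyX"),
    ("report", "opmv:wasDerivedFrom", "dataset1"),
    ("dataset1", "foaf:maker", "labY"),
    ("dataset1", "ex:likes", "someone"),   # unmapped term, ignored
]
agent_trust = {"agencyX": 0.9, "labY": 0.6}
print(trust("report", triples, agent_trust))
```

Without the mapping step, the `foaf:maker` and `opmv:wasDerivedFrom` links would be invisible to a metric written against Dublin Core alone, and the report would look more trustworthy than its weakest source warrants.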
Chaired W3C Provenance Group (2009-2010):
Mappings across 10 popular provenance vocabularies
Use cases and requirements
Charter for a Working Group with 17 core concepts