Interactive Knowledge Capture, Yolanda Gil (PowerPoint presentation)

SLIDE 1

USC INFORMATION SCIENCES INSTITUTE Yolanda Gil

Interactive Knowledge Capture

Yolanda Gil

Director, Knowledge Technologies Associate Division Director for Research Research Professor, Computer Science Intelligent Systems Division Information Sciences Institute University of Southern California http://www.isi.edu/~gil

SLIDE 2

Knowledge Technologies at USC/ISI: Major Threads

SLIDE 3

Semantic Workflows in Wings (http://wings.isi.edu)

[Gil et al JETAIʼ11; Gil et al IEEE-ISʼ11; Gil et al e-Scienceʼ09; Kim et al JWSʼ08]

Semantic descriptions of datasets (RDF, OWL)

  • Metadata properties

Semantic constraints

  • How computations transform the data

Automatic propagation of constraints

  • Assistance
  • Parameters
  • Algorithms
  • Generation
  • Validation

Execution in grids or clouds
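To make semantic constraints and their automatic propagation more concrete, here is a minimal Python sketch, assuming a simple linear pipeline: each component declares a constraint on its input metadata and a rule describing how it transforms that metadata, and propagation validates each step while deriving the metadata of intermediate results. The `Component` class, the `propagate` function, and the metadata keys are hypothetical; Wings represents these with RDF/OWL and semantic rules, and its actual mechanism is not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metadata = Dict[str, object]  # hypothetical metadata properties of a dataset


@dataclass
class Component:
    name: str
    requires: Callable[[Metadata], bool]       # constraint on the input dataset
    transform: Callable[[Metadata], Metadata]  # how the output metadata is derived


def propagate(pipeline: List[Component], initial: Metadata) -> List[Metadata]:
    """Forward-propagate metadata through a linear pipeline, validating each step."""
    trace = [initial]
    current = initial
    for comp in pipeline:
        if not comp.requires(current):
            raise ValueError(f"{comp.name}: input constraints violated by {current}")
        current = comp.transform(current)
        trace.append(current)
    return trace


# Hypothetical components for an image-analysis pipeline
normalize = Component(
    "Normalize",
    requires=lambda m: m.get("format") == "tiff",
    transform=lambda m: {**m, "normalized": True},
)
quantify = Component(
    "QuantifyIntensity",
    requires=lambda m: m.get("normalized") is True,
    transform=lambda m: {"format": "csv", "source_width": m["width"], "source_height": m["height"]},
)

trace = propagate([normalize, quantify], {"format": "tiff", "width": 2560, "height": 2400})
print(trace[-1])
```

Propagating metadata forward like this lets a system reject an invalid composition (for example, quantifying an image that was never normalized) before anything is executed.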

Example: Workflow for Pixel Intensity Quantification of brain imagery [Kumar et al 10; Kurc et al 09]

Compact workflow template (left)

Automatically generated executable workflow for 2560x2400 pixels (right)
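One way to picture how a compact template becomes a much larger executable workflow is to expand a single "process each tile" node into one job per image region. The sketch below assumes a hypothetical square tile size of 512 pixels and row-major tiling; the slide does not say how the actual workflow partitions the 2560x2400 image, and `expand_template` is an illustrative name.

```python
import math
from typing import List, Tuple


def expand_template(width: int, height: int, tile: int) -> List[Tuple[str, Tuple[int, int, int, int]]]:
    """Expand a one-node 'quantify each tile' template into one job per tile.

    Returns (job_name, (x, y, w, h)) pairs; edge tiles are clipped to the image.
    """
    jobs = []
    for row in range(math.ceil(height / tile)):
        for col in range(math.ceil(width / tile)):
            x, y = col * tile, row * tile
            w, h = min(tile, width - x), min(tile, height - y)
            jobs.append((f"quantify_r{row}_c{col}", (x, y, w, h)))
    # A hypothetical aggregation step would then consume all tile outputs.
    return jobs


jobs = expand_template(2560, 2400, tile=512)
print(len(jobs))  # 25
```

Changing only the image size or tile size regenerates the executable workflow while the template itself stays the same size.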

Unique capability to reason about application tasks and data, with uses in science and intelligence

SLIDE 4

LinkedDataLens: Extracting Networks of Interest from the 25B+ Triple Linked Data Cloud [Groth and Gil ʼ11]

Web of Data: 25B RDF triples (statements) with 395M links from 203 data sets [Berners-Lee et al 09]

  • Community-created through extraction from web sources
    – News sources, events, geospatial information, bioinformatics, academic

LinkedDataLens: a system to extract networks of interest

  • Framework accessible over the web, with no need to install any software

  • Workflows extract RDF triples through queries, create a network, and use social network analysis algorithms to extract interesting statistics
    – Size, centrality, connected components, etc.

  • Extracted networks can be integrated with other existing networks and used by other applications
    – Networks about people, places, events, etc.
    – E.g., pharmaceutical firms doing clinical trials in California for the same drug
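The extraction workflow described above (query triples, build a network, compute statistics) can be sketched in plain Python. This is a hypothetical illustration, not LinkedDataLens code: triples are held in memory rather than fetched with a SPARQL query, two subjects are linked when they share an object through a chosen predicate (as in the senators who share an alma mater), and the `ex:` URIs and the `ex:almaMater` predicate are placeholders.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]
Graph = Dict[str, Set[str]]


def co_object_network(triples: Iterable[Triple], predicate: str) -> Graph:
    """Link two subjects whenever they share an object through `predicate`."""
    by_object: Dict[str, Set[str]] = {}
    for s, p, o in triples:
        if p == predicate:
            by_object.setdefault(o, set()).add(s)
    graph: Graph = {}
    for subjects in by_object.values():
        for a in subjects:
            graph.setdefault(a, set())  # keep subjects with no partner as isolated nodes
        for a, b in combinations(sorted(subjects), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def edge_count(graph: Graph) -> int:
    return sum(len(nbrs) for nbrs in graph.values()) // 2


def density(graph: Graph) -> float:
    """Undirected density 2m / (n(n-1))."""
    n = len(graph)
    return 2 * edge_count(graph) / (n * (n - 1)) if n > 1 else 0.0


def avg_clustering(graph: Graph) -> float:
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for nbrs in graph.values():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
            total += 2 * links / (k * (k - 1))
    return total / len(graph) if graph else 0.0


def connected_components(graph: Graph) -> int:
    """Count components with an iterative depth-first search."""
    seen: Set[str] = set()
    count = 0
    for start in graph:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            for w in graph[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return count


# Hypothetical triples (e.g., rows returned by a SPARQL query)
triples = [
    ("ex:SenatorA", "ex:almaMater", "ex:Univ1"),
    ("ex:SenatorB", "ex:almaMater", "ex:Univ1"),
    ("ex:SenatorC", "ex:almaMater", "ex:Univ1"),
    ("ex:SenatorD", "ex:almaMater", "ex:Univ2"),
    ("ex:SenatorE", "ex:almaMater", "ex:Univ2"),
    ("ex:SenatorF", "ex:almaMater", "ex:Univ3"),
]
g = co_object_network(triples, "ex:almaMater")
print(len(g), edge_count(g), round(density(g), 3), avg_clustering(g), connected_components(g))
# 6 4 0.267 0.5 3
```

Counting nodes with fewer than two neighbors as 0 in the clustering average follows common practice (NetworkX's `average_clustering` does the same by default), but the statistics LinkedDataLens reports may be computed differently.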

A large and growing source of structured data that can be exploited in many application areas

Our group contributed BibBase

Honorable Mention, Triplification Challenge 2010

Pharmaceutical companies that make the same drug: Nodes=609 Edges=3032 density=0.016 AvgClusterCoeff=0.584 isConnected=False NbConnectedComp=17

US Senators who share an alma mater: Nodes=136 Edges=340 density=0.037 AvgClusterCoeff=0.6203 isConnected=False NbConnectedComp=17
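As a quick consistency check, assuming both networks are undirected simple graphs, the reported densities match the usual definition of density for n nodes and m edges:

```latex
D = \frac{2m}{n(n-1)}, \qquad
D_{\text{drug}} = \frac{2 \cdot 3032}{609 \cdot 608} = \frac{6064}{370272} \approx 0.016, \qquad
D_{\text{senators}} = \frac{2 \cdot 340}{136 \cdot 135} = \frac{680}{18360} \approx 0.037
```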

SLIDE 5

Social Knowledge Collection for Communities of Interest

(http://www.isi.edu/ikcap/shortipedia) [Vrandecic et al 2010; Vrandecic et al 2011]

Why: Social content collection tools are either lightly structured or too rigid

  • Wikis and collaborative tools provide community repositories, but their content is not structured or aggregated in a searchable manner

  • Other tools use a predefined schema/ontology that the community fills out with contributions

A semantic wiki is a framework that enables contributors to define organic characterizations that lead to emerging unified models

  • Users define the vocabulary/ontology
    – Voluntarily adopt definitions by others

  • Formal queries can retrieve structured content
  • Can use proactive normalization techniques to encourage consensus where possible
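As one possible form of proactive normalization, a wiki could check a newly proposed property name against the existing vocabulary and suggest close matches before a near-duplicate is created. The sketch below uses Python's standard `difflib.get_close_matches`; the normalization rules, the 0.75 cutoff, the `suggest_existing` name, and the sample vocabulary are assumptions, not the technique Shortipedia actually uses.

```python
import difflib
from typing import List


def suggest_existing(new_term: str, vocabulary: List[str], cutoff: float = 0.75) -> List[str]:
    """Return existing vocabulary terms similar to a newly proposed one.

    Comparison ignores case, spaces, underscores, and hyphens.
    """
    def norm(term: str) -> str:
        return term.lower().replace("_", "").replace(" ", "").replace("-", "")

    normalized = {norm(t): t for t in vocabulary}
    matches = difflib.get_close_matches(norm(new_term), list(normalized), n=3, cutoff=cutoff)
    return [normalized[m] for m in matches]


vocab = ["operatingSystem", "manufacturer", "releaseDate", "screenSize"]
print(suggest_existing("Operating_System", vocab))  # ['operatingSystem']
print(suggest_existing("batteryLife", vocab))       # []
```

The user stays free to keep the new term; the suggestion only nudges contributors toward definitions others have already adopted.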

Provenance-aware semantic wiki

  • Alternative views can be accommodated
    – E.g., Android is “jailbreakable” and “bricked” by RF software

  • Structured provenance records
    – Document sources and evidence
    – Can filter query results according to provenance, e.g., software that bricks phones according to NIST
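A minimal sketch of provenance-aware statements, assuming each claim records the source it is attributed to: alternative or conflicting claims are stored side by side rather than overwritten, and a query can optionally be restricted to a single source. The `Statement` class, the `query` function, and the records below are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Statement:
    subject: str
    prop: str
    value: str
    source: str                     # to whom or what the claim is attributed
    evidence: Optional[str] = None  # hypothetical pointer to supporting material


def query(statements: Iterable[Statement], prop: str, source: Optional[str] = None) -> List[str]:
    """Subjects asserting `prop`, optionally restricted to claims from one source."""
    return sorted({s.subject for s in statements
                   if s.prop == prop and (source is None or s.source == source)})


# Hypothetical records; alternative claims coexist instead of replacing each other
statements = [
    Statement("SoftwareX", "bricks", "PhoneModelA", source="NIST"),
    Statement("SoftwareY", "bricks", "PhoneModelA", source="UserForum"),
    Statement("PhoneModelA", "jailbreakable", "true", source="UserForum"),
]
print(query(statements, "bricks"))                 # ['SoftwareX', 'SoftwareY']
print(query(statements, "bricks", source="NIST"))  # ['SoftwareX']
```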

Community-created structured content; emerging semantics lead to dynamically aggregated content

Unique capability for social collection of structured content, with documented provenance

3rd Place Semantic Web Challenge 2010

SLIDE 6

Unifying Provenance Models for Trusted Systems

[Groth et al ʼ11; Sahoo et al ʼ11; Moreau et al ʼ10]

Why: Provenance is rarely captured, hampering trust assessment

  • When captured, it is diverse in its nature and underlying implementation
  • Document-based: where information was found/extracted (e.g., NYT)
  • Attribution-based: who created the information
  • Process-based: how information was derived from documents or datasets

Unifying models of provenance

  • Can assess trust in content (existing trust models are attribution-based only)

Builds on open standards

  • Semantic Web standards (OWL, RDF)
  • Open Provenance Model
  • Dublin Core

Unifying provenance infrastructure

  • Access provenance records across heterogeneous systems to assess trust
  • Integrate data taking provenance records into account
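To illustrate what accessing provenance records across heterogeneous systems could involve, the sketch below maps terms from two vocabularies onto the three kinds of provenance named on this slide and groups the records per artifact. The `dcterms:` and `opm:` prefixes, the category assigned to each term, and the record format are illustrative assumptions; this is not the set of mappings produced by the W3C group.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

# Illustrative mapping from vocabulary terms to unified kinds of provenance
TERM_TO_KIND: Dict[str, str] = {
    "dcterms:source": "document",
    "dcterms:creator": "attribution",
    "opm:wasControlledBy": "attribution",
    "opm:wasGeneratedBy": "process",
    "opm:used": "process",
    "opm:wasDerivedFrom": "process",
}

Record = Tuple[str, str, str]  # (artifact, term, value) as exported by some system


def unify(records: List[Record]) -> Dict[str, Dict[str, List[str]]]:
    """Group heterogeneous provenance records per artifact by unified kind."""
    view: DefaultDict[str, DefaultDict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for artifact, term, value in records:
        kind = TERM_TO_KIND.get(term, "unmapped")  # keep unknown terms visible
        view[artifact][kind].append(value)
    return {artifact: dict(kinds) for artifact, kinds in view.items()}


records = [
    ("report42", "dcterms:source", "NYT article"),   # from a Dublin Core based system
    ("report42", "dcterms:creator", "alice"),        # from a Dublin Core based system
    ("report42", "opm:wasDerivedFrom", "dataset7"),  # from an OPM based system
]
unified = unify(records)
print(unified["report42"])
# {'document': ['NYT article'], 'attribution': ['alice'], 'process': ['dataset7']}
```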

Unifying provenance models would enable the computation of trust metrics for content
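As a hypothetical example of a trust metric that needs both attribution-based and process-based provenance, the sketch below uses a "weakest link" rule: an artifact is trusted no more than its creator and no more than anything it was derived from. The rule, the agent trust scores, and the artifact names are assumptions chosen for illustration; the slide does not specify which metrics the unified model would support.

```python
from typing import Dict, FrozenSet, List

# Hypothetical inputs
AGENT_TRUST: Dict[str, float] = {"alice": 0.9, "bob": 0.6}
CREATED_BY: Dict[str, str] = {"raw": "bob", "clean": "alice", "report": "alice"}  # attribution
DERIVED_FROM: Dict[str, List[str]] = {"clean": ["raw"], "report": ["clean"]}        # process


def trust(artifact: str, visiting: FrozenSet[str] = frozenset()) -> float:
    """Weakest-link trust over creator and derivation sources (assumed rule)."""
    if artifact in visiting:
        raise ValueError(f"cycle in derivation graph at {artifact!r}")
    # Attribution: start from the trust placed in the creator (0.0 if unknown).
    score = AGENT_TRUST.get(CREATED_BY.get(artifact, ""), 0.0)
    # Process: never exceed the trust of any input it was derived from.
    for source in DERIVED_FROM.get(artifact, []):
        score = min(score, trust(source, visiting | {artifact}))
    return score


attribution_only = AGENT_TRUST[CREATED_BY["report"]]
print(attribution_only, trust("report"))  # 0.9 0.6
```

An attribution-only view rates the report 0.9 because a trusted agent wrote it; following the derivation chain back to data created by a less trusted agent lowers it to 0.6.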

Mappings across existing provenance vocabularies; emerging standards (Open Provenance Model)

Chaired W3C Provenance Group (2009-2010):

  • Mappings across 10 popular provenance vocabularies
  • Use cases and requirements
  • Charter for a Working Group with 17 core concepts