Provenance-Only Integration: Ashish Gehani, Dawood Tariq, SRI

SLIDE 1

Provenance-Only Integration

Ashish Gehani, Dawood Tariq (SRI)

Provenance-Only Integration – p. 1/13

SLIDE 2

Integration Challenges

- Metadata variation:
  - Abstraction levels
  - Completeness
  - Identifiers
  - Semantics
- Querying requires:
  - Record assembly
  - Reconciling syntax
  - Mapping semantics

SLIDE 3

Related Work

Data integration
- AnHai Doan, Alon Halevy, and Zachary Ives, Principles of Data Integration, Elsevier, 2012.

Provenance integration
- Semantic web (Umuhoza 2012)
- Grid computing (Zhao 2008)
- System interoperability (Angelino 2011)
- Cross-organization sharing (Allen 2011)

SLIDE 4

Provenance-Only Integration

- Single underlying activity
- Multiple views of it
- Partial overlap in metadata

View 1:
- uid:0 pidname:syslogd ppid:1 starttime_simple:Thu May 22 18:24:41 2014 pid:21 type:Process user:root
- path:/private/var/log/asl/2014.06.10.U0.G80.asl filename:2014.06.10.U0.G80.asl type:Artifact version:1
- (type:WasGeneratedBy)
- path:/private/etc/syslog.conf filename:syslog.conf type:Artifact version:0
- (type:Used)

View 2:
- gid:0 pid:21 type:Process
- path:/private/var/log/asl/2014.06.10.U0.G80.asl filename:2014.06.10.U0.G80.asl type:Artifact version:1
- (time:1402396472245 type:WasGeneratedBy)
- path:/private/etc/syslog.conf filename:syslog.conf type:Artifact version:0
- (time:1402400047469 type:Used)
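A minimal Python sketch of the partial overlap, assuming each vertex is represented as a plain attribute dictionary. The helper `compare_views` is hypothetical and not part of the original system. It compares the process vertex as recorded by each view, using the attribute values from the example above.

```python
def compare_views(a: dict, b: dict) -> tuple[dict, set, set]:
    """Split two attribute maps into shared-and-equal, conflicting, and one-sided keys."""
    common = a.keys() & b.keys()
    shared = {k: a[k] for k in common if a[k] == b[k]}
    conflicting = {k for k in common if a[k] != b[k]}
    one_sided = set(a.keys() ^ b.keys())  # keys recorded by only one view
    return shared, conflicting, one_sided

# Process vertex as seen by the first view (values copied from the example)
view_a = {"uid": "0", "pidname": "syslogd", "ppid": "1",
          "starttime_simple": "Thu May 22 18:24:41 2014",
          "pid": "21", "type": "Process", "user": "root"}
# The same process as seen by the second view
view_b = {"gid": "0", "pid": "21", "type": "Process"}

shared, conflicting, unique = compare_views(view_a, view_b)
print(sorted(shared))   # ['pid', 'type']
print(conflicting)      # set()
print(sorted(unique))
```

Only `pid` and `type` appear in both views, so most of the metadata about this process is available from just one vantage point.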

SLIDE 5

Speech Processing Hot Spots

SLIDE 6

Basic Provenance-Only Integration

- Provenance from two vantage points
- Need to integrate the two
- Approach:
  - Define matching threshold τ
  - Merge vertex pair if τ-similar
  - Merge edge pair if τ-similar
  - Cost from conflating owners
- Goal:
  - Minimize τ
  - Keep cost < tolerance Υ
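The slide does not define τ-similarity or the cost function, so the Python sketch below fills them in with stated assumptions. Two elements are treated as τ-similar if they agree on at least τ attribute key:value pairs. Each element carries a hypothetical ground-truth `owner` label, and cost counts merged pairs whose owners differ. A greedy one-to-one pairing stands in for exact matching. The sketch then searches for the smallest τ, shared by vertices and edges, that keeps total cost below Υ (`upsilon`). All names here are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Element:
    attrs: dict[str, str]  # provenance attributes, e.g. {"pid": "21", "type": "Process"}
    owner: str             # Assumption: hypothetical ground-truth owner, used only to score cost

def similarity(a: Element, b: Element) -> int:
    """Assumed metric: number of attribute key:value pairs the two elements share."""
    return sum(1 for k in a.attrs.keys() & b.attrs.keys() if a.attrs[k] == b.attrs[k])

def greedy_pairs(left: list[Element], right: list[Element], tau: int) -> list[tuple[int, int]]:
    """Pair each left element with its most similar unused right element, if tau-similar.
    A greedy stand-in for exact matching, which is hard in general."""
    used: set[int] = set()
    pairs = []
    for i, a in enumerate(left):
        best, best_sim = None, -1
        for j, b in enumerate(right):
            if j in used:
                continue
            s = similarity(a, b)
            if s >= tau and s > best_sim:
                best, best_sim = j, s
        if best is not None:
            used.add(best)
            pairs.append((i, best))
    return pairs

def cost(left: list[Element], right: list[Element], pairs: list[tuple[int, int]]) -> int:
    """Assumed cost: number of merged pairs that conflate two different owners."""
    return sum(1 for i, j in pairs if left[i].owner != right[j].owner)

def min_threshold(vertices, edges, upsilon: int) -> Optional[int]:
    """Smallest tau (shared by vertices and edges) whose merges keep total cost < upsilon."""
    groups = [vertices, edges]  # each is a (left_view, right_view) pair of element lists
    max_tau = 1 + max((len(e.attrs) for lv, rv in groups for e in lv + rv), default=0)
    for tau in range(max_tau + 1):
        total = sum(cost(lv, rv, greedy_pairs(lv, rv, tau)) for lv, rv in groups)
        if total < upsilon:
            return tau
    return None  # even merging nothing fails the tolerance (upsilon <= 0)

left = [Element({"pid": "21", "type": "Process"}, "A"),
        Element({"pid": "22", "type": "Process"}, "B")]
right = [Element({"pid": "21", "type": "Process"}, "A"),
         Element({"pid": "23", "type": "Process"}, "C")]
print(min_threshold((left, right), ([], []), upsilon=1))  # 2
```

In this toy input, τ = 0 and τ = 1 pair process 22 with process 23 on the strength of their shared `type` alone, conflating owners B and C. τ = 2 is the smallest threshold that avoids this.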

SLIDE 7

Android Provenance

- Security analysis
- System-wide monitoring
  - Resource-constrained
  - Disrupts power management
  - Blinded by garbage collection
- Multiple abstraction levels
  - Kernel interface
  - Inter-application (Binder)
- Provenance-only integration

SLIDE 8

Android

by Alvaro Fuentes Vasquez via Wikimedia Commons (CC-BY-SA-3.0-2.5-2.0-1.0)

SLIDE 9

Fast Integration

- Integrate all τ-similar elements
  - Don’t have to find matching pairs
  - Avoids subgraph isomorphism problem
- Separate vertex, edge matching thresholds
  - Thresholds are input now
  - Cost is per match now
- Approach:
  - Merge τ_v-similar vertices, if cost < Υ
  - Merge τ_e-similar edges, if cost < Υ
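A Python sketch of the fast variant, under assumptions that the slide leaves open. Similarity is a count of shared key:value pairs, and the per-match cost is taken to be the number of attributes both elements define with different values. Both choices are hypothetical. Instead of searching for matching pairs, the sketch merges every τ_v-similar vertex pair (and every τ_e-similar edge pair) whose match cost is below Υ, using a union-find structure. For brevity, edges are compared on their own attributes only, ignoring their endpoints.

```python
from itertools import combinations

def similarity(a: dict, b: dict) -> int:
    """Assumed metric: number of attribute key:value pairs two elements share."""
    return sum(1 for k in a.keys() & b.keys() if a[k] == b[k])

def conflicts(a: dict, b: dict) -> int:
    """Assumed per-match cost: attributes both elements define but with different values."""
    return sum(1 for k in a.keys() & b.keys() if a[k] != b[k])

def merge_similar(elements: list[dict], tau: int, upsilon: int) -> list[list[int]]:
    """Union every tau-similar pair whose per-match cost is below upsilon; return groups."""
    parent = list(range(len(elements)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(elements)), 2):
        a, b = elements[i], elements[j]
        if similarity(a, b) >= tau and conflicts(a, b) < upsilon:
            parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(elements)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

def fast_integrate(vertices, edges, tau_v: int, tau_e: int, upsilon: int):
    """Separate thresholds for vertices and edges, as on the slide; cost is checked per match."""
    return merge_similar(vertices, tau_v, upsilon), merge_similar(edges, tau_e, upsilon)

vertices = [{"pid": "21", "type": "Process", "user": "root"},  # view 1
            {"pid": "21", "type": "Process", "gid": "0"},      # view 2, same process
            {"pid": "22", "type": "Process", "user": "root"}]  # a different process
edges = [{"type": "Used"},
         {"type": "Used", "time": "1402400047469"},
         {"type": "WasGeneratedBy", "time": "1402396472245"}]
print(fast_integrate(vertices, edges, tau_v=2, tau_e=1, upsilon=1))
# ([[0, 1], [2]], [[0, 1], [2]])
```

Because union-find takes the transitive closure, two elements can end up in one group through an intermediary even if they would not match directly. Whether that is desirable is a design choice this sketch does not settle.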

SLIDE 10

False Integration → High Cost

SLIDE 11

Integration as Abstraction

SLIDE 12

Fidelity of Attribution

SLIDE 13

Conclusion

- Provenance-only integration
  - Basic form as constrained optimization
  - Fast version → automated abstraction

Acknowledgement
- TaPP ’14 organizers, reviewers
- US NSF Grant IIS-1116414

URL: http://data-provenance.googlecode.com
Email: ashish.gehani@sri.com

Questions?
