On Data Dependencies in Dataspaces Shaoxu Song Tsinghua University - PowerPoint PPT Presentation

On Data Dependencies in Dataspaces
Shaoxu Song, Tsinghua University (sxsong@tsinghua.edu.cn)
This is joint work with Lei Chen (HKUST) and Philip S. Yu (UIC).
2011

Introduction 1/24

Dataspaces provide a system in which heterogeneous data co-exist. We consider three levels of elements, written object: {(attribute: value)}.

Example. Consider a dataspace with the following objects:
t1: {(name: iPod), (color: red), (manu: Apple Inc.), (tel: 567), (addr: InfiniteLoop, CA), (website: itunes.com)};
t2: {(name: iPod), (color: cardinal), (prod: Apple), (tel: 123), (post: InfiniteLoop, Cupert), (website: apple.com)};
t3: {(name: iPad), (color: white), (manu: Apple Inc.), (post: InfiniteLoop), (website: apple.com), (phn: 567)}.
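As a minimal sketch, assuming each object is held as a Python dict from attribute name to string value, the example dataspace can be written down directly. `DATASPACE`, `attributes` and `shared_attributes` are hypothetical names, not taken from the slides. Because the objects carry different attribute sets, the only attributes common to t1, t2 and t3 are name, color and website.

```python
# Hypothetical in-memory model of the example dataspace: each object maps
# attribute -> value, and objects do not have to share a schema.
DATASPACE: dict[str, dict[str, str]] = {
    "t1": {"name": "iPod", "color": "red", "manu": "Apple Inc.", "tel": "567",
           "addr": "InfiniteLoop, CA", "website": "itunes.com"},
    "t2": {"name": "iPod", "color": "cardinal", "prod": "Apple", "tel": "123",
           "post": "InfiniteLoop, Cupert", "website": "apple.com"},
    "t3": {"name": "iPad", "color": "white", "manu": "Apple Inc.",
           "post": "InfiniteLoop", "website": "apple.com", "phn": "567"},
}


def attributes(space: dict[str, dict[str, str]]) -> set[str]:
    """Union of the attribute names used by any object."""
    return set().union(*(obj.keys() for obj in space.values()))


def shared_attributes(space: dict[str, dict[str, str]]) -> set[str]:
    """Attributes present in every object."""
    return set.intersection(*(set(obj) for obj in space.values()))


print(sorted(shared_attributes(DATASPACE)))  # ['color', 'name', 'website']
```

Attributes such as manu/prod, addr/post or tel/phn occur only in some objects, which is where comparable correspondences come in.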

Introduction 2/24: Comparable Correspondence

A comparable correspondence is a relationship between elements of heterogeneous data.
Metric operator ‘manu ≈_{≤5} prod’: any two values of manu and prod, respectively, are said to be comparable if their edit distance is ≤ 5, e.g., Apple Inc. and Apple.
Matching operator ‘color ⇋ color’: e.g., red and cardinal are said to be matched as comparable colors, based on users’ feedback. Such correspondences are often recognized incrementally, in a pay-as-you-go style.
A query (manu: Apple) searches for values similar to Apple in both manu and prod, e.g., (manu: Apple Inc.) in t1 and (prod: Apple) in t2.
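The two operators can be sketched in Python, assuming Levenshtein edit distance for the metric operator and a set of user-confirmed value pairs for the matching operator. `metric`, `matching` and `COLOR_MATCHES` are hypothetical names chosen for illustration.

```python
from typing import Callable

Op = Callable[[str, str], bool]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or keep
        prev = cur
    return prev[-1]


def metric(k: int) -> Op:
    """Metric operator: two values are comparable if their distance is <= k."""
    return lambda x, y: edit_distance(x, y) <= k


def matching(matched: set[frozenset[str]]) -> Op:
    """Matching operator: equal values, or pairs confirmed by user feedback."""
    return lambda x, y: x == y or frozenset((x, y)) in matched


# Hypothetical feedback gathered pay-as-you-go for 'color ⇋ color'.
COLOR_MATCHES = {frozenset(("red", "cardinal"))}

manu_prod = metric(5)
color = matching(COLOR_MATCHES)
print(manu_prod("Apple Inc.", "Apple"))  # True: edit distance is 5
print(color("red", "cardinal"))          # True
print(color("red", "white"))             # False
```

Storing feedback as unordered pairs keeps the matching operator symmetric, so "cardinal ⇋ red" holds as well.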

Introduction 3/24: Data Dependencies

Data dependencies have wide applications: integrity constraints, schema design, query optimization, capturing data inconsistency, and removing duplicates.
Conventional data dependencies are not directly applicable to dataspaces, since they are often defined on the equality function. Functional dependencies (FDs), X → A, specify a constraint of equality between the values of two objects on the same attribute, e.g., manu → addr. They cannot address comparable correspondences such as (manu, prod) or (addr, post).

Introduction 4/24: Comparable Function

A comparable function specifies constraints on comparable attributes:
θ(manu, prod): [manu ≈_{≤5} manu, manu ≈_{≤5} prod, prod ≈_{≤5} prod]
Two objects are said to be comparable on (manu, prod) if at least one of the three comparison operators in θ(manu, prod) is satisfied.
t1 and t2 are comparable on (manu, prod), since the edit distance between t1[manu] and t2[prod] is ≤ 5.
t1 and t3 are also comparable on (manu, prod), since t1[manu] and t3[manu] satisfy ‘manu ≈_{≤5} manu’.
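A minimal sketch of evaluating θ(manu, prod) on a pair of objects, assuming Levenshtein edit distance, that an operator applies only when both objects carry the attributes it compares, and that the cross operator manu ≈ prod is tried in both directions since the pair is unordered. `within` and `comparable` are hypothetical helper names.

```python
from typing import Callable

Obj = dict[str, str]
Op = Callable[[str, str], bool]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def within(k: int) -> Op:
    """Metric operator with threshold k on edit distance."""
    return lambda x, y: edit_distance(x, y) <= k


def comparable(s: Obj, t: Obj, ai: str, aj: str, op: Op) -> bool:
    """True iff s, t are comparable on theta(ai, aj) = [ai op ai, ai op aj, aj op aj].

    An operator is tried only when both objects carry the attributes it needs;
    ai-aj is tried in both directions because the pair (s, t) is unordered.
    """
    checks = [(s, ai, t, ai), (s, ai, t, aj), (t, ai, s, aj), (s, aj, t, aj)]
    return any(a in x and b in y and op(x[a], y[b]) for x, a, y, b in checks)


# Only the attributes needed here are copied from the example objects.
t1 = {"manu": "Apple Inc."}
t2 = {"prod": "Apple"}
t3 = {"manu": "Apple Inc."}
print(comparable(t1, t2, "manu", "prod", within(5)))  # True: distance 5
print(comparable(t1, t3, "manu", "prod", within(5)))  # True: distance 0
print(comparable(t1, t2, "manu", "prod", within(4)))  # False
```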

Introduction 5/24: Comparable Dependencies (CDs)

A CD is a general form of dependency on comparable functions, e.g.,
ϕ1: θ(manu, prod) → θ(addr, post)
If the manu or prod values of two products are comparable, then their corresponding addr or post values should also be comparable, where θ(addr, post): [addr ≈_{≤9} addr, addr ≈_{≤9} post, post ≈_{≤9} post] is another comparable function.
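Under the same assumptions (Levenshtein distance, operators applied only to attributes that are present), here is a sketch of checking ϕ1 on every pair of the example objects. `violations` is a hypothetical helper that returns the pairs comparable on the LHS but not on the RHS. With the slide's threshold 9 there is no violating pair; tightening the RHS threshold to a hypothetical 7 exposes (t2, t3), whose post values InfiniteLoop, Cupert and InfiniteLoop are at edit distance 8.

```python
from itertools import combinations
from typing import Callable

Obj = dict[str, str]
Op = Callable[[str, str], bool]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def within(k: int) -> Op:
    return lambda x, y: edit_distance(x, y) <= k


def comparable(s: Obj, t: Obj, ai: str, aj: str, op: Op) -> bool:
    """theta(ai, aj) with op on ai-ai, ai-aj (both directions) and aj-aj."""
    checks = [(s, ai, t, ai), (s, ai, t, aj), (t, ai, s, aj), (s, aj, t, aj)]
    return any(a in x and b in y and op(x[a], y[b]) for x, a, y, b in checks)


OBJS = {
    "t1": {"manu": "Apple Inc.", "addr": "InfiniteLoop, CA"},
    "t2": {"prod": "Apple", "post": "InfiniteLoop, Cupert"},
    "t3": {"manu": "Apple Inc.", "post": "InfiniteLoop"},
}


def violations(objs: dict[str, Obj], lhs: tuple, rhs: tuple) -> list[tuple[str, str]]:
    """Pairs comparable on the LHS comparable function but not on the RHS one."""
    return [(p, q) for p, q in combinations(sorted(objs), 2)
            if comparable(objs[p], objs[q], *lhs)
            and not comparable(objs[p], objs[q], *rhs)]


phi1_lhs = ("manu", "prod", within(5))
print(violations(OBJS, phi1_lhs, ("addr", "post", within(9))))  # []
print(violations(OBJS, phi1_lhs, ("addr", "post", within(7))))  # [('t2', 't3')]
```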

Introduction 6/24: Application Example

Query optimization. Consider the object t1 as a query, i.e., we look for objects having values similar to (manu: Apple Inc.) and (addr: InfiniteLoop, CA) of t1. Besides the manu and addr attributes specified in the query, we also search the comparable attributes prod and post, according to the comparable functions θ(manu, prod) and θ(addr, post). According to ϕ1, the query can be rewritten using (manu, prod) only: provided ϕ1 holds, any object comparable to t1 on (manu, prod) is also comparable to t1 on (addr, post).
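A toy sketch of the rewriting, assuming the query is a conjunction of comparable functions evaluated by a linear scan over the other objects; the helpers `answer`, `within` and `comparable` are hypothetical. Because ϕ1 holds on the example data, dropping θ(addr, post) returns the same answer while evaluating one comparable function instead of two.

```python
from typing import Callable

Obj = dict[str, str]
Op = Callable[[str, str], bool]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def within(k: int) -> Op:
    return lambda x, y: edit_distance(x, y) <= k


def comparable(s: Obj, t: Obj, ai: str, aj: str, op: Op) -> bool:
    """theta(ai, aj) with op on ai-ai, ai-aj (both directions) and aj-aj."""
    checks = [(s, ai, t, ai), (s, ai, t, aj), (t, ai, s, aj), (s, aj, t, aj)]
    return any(a in x and b in y and op(x[a], y[b]) for x, a, y, b in checks)


def answer(query: Obj, candidates: dict[str, Obj], thetas: list) -> list[str]:
    """Ids of candidates comparable to `query` on every listed comparable function."""
    return [oid for oid, obj in candidates.items()
            if all(comparable(query, obj, *th) for th in thetas)]


t1 = {"manu": "Apple Inc.", "addr": "InfiniteLoop, CA"}  # the query object
others = {"t2": {"prod": "Apple", "post": "InfiniteLoop, Cupert"},
          "t3": {"manu": "Apple Inc.", "post": "InfiniteLoop"}}
theta_mp = ("manu", "prod", within(5))
theta_ap = ("addr", "post", within(9))

full = answer(t1, others, [theta_mp, theta_ap])  # search (manu, prod) and (addr, post)
rewritten = answer(t1, others, [theta_mp])       # (manu, prod) only, justified by phi1
print(full, rewritten)  # ['t2', 't3'] ['t2', 't3']
```

If ϕ1 only held approximately, the rewritten query could return extra objects, so the rewrite is safe only for dependencies that hold exactly on the data.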

Introduction 7/24: Related Work

Metric functional dependencies (MFDs), X →^δ A: an equality operator on the left-hand side and a similarity (metric) operator on the right-hand side, used for violation detection, e.g., manu →^2 addr.
Matching dependencies (MDs), [X ≈ X] → [A ⇋ A]: a similarity operator on the left-hand side and a matching operator on the right-hand side, used for record matching, e.g., [addr ≈ addr] → [tel ⇋ tel].

Outline: Introduction, Definition, Validation, Discovery, Experiment, Conclusion

Definition 8/24: Comparison Operator

We consider a general form of comparison operators that subsumes the previous operators. Let A_i ↔_{ij} A_j denote a comparison operator between two attributes A_i, A_j in a dataspace S:
equality operator A_i = A_j, as in functional dependencies (FDs);
metric operator A_i ≈_λ A_j, as in metric functional dependencies (MFDs);
matching operator A_i ⇋ A_j, as in matching dependencies (MDs).
A comparison operator evaluates to true if the two values satisfy the corresponding constraint.
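One way to capture the general operator A_i ↔_{ij} A_j in code, as a sketch: an operator is an attribute pair plus a predicate on the two values, and the equality, metric and matching operators are instances of it. The class and function names are hypothetical, treating a missing attribute as "not satisfied" is an assumption not stated on the slide, and `mismatches` is only a crude stand-in distance.

```python
from dataclasses import dataclass
from typing import Callable

Obj = dict[str, str]


@dataclass(frozen=True)
class ComparisonOp:
    """A_i <->_ij A_j: compares s[A_i] with t[A_j] through a value predicate."""
    ai: str
    aj: str
    pred: Callable[[str, str], bool]

    def holds(self, s: Obj, t: Obj) -> bool:
        # Assumption: if either value is missing, the operator is not satisfied.
        if self.ai not in s or self.aj not in t:
            return False
        return self.pred(s[self.ai], t[self.aj])


def equality(ai: str, aj: str) -> ComparisonOp:
    """A_i = A_j, as in FDs."""
    return ComparisonOp(ai, aj, lambda x, y: x == y)


def metric(ai: str, aj: str, lam: int, dist: Callable[[str, str], int]) -> ComparisonOp:
    """A_i ≈_λ A_j, as in MFDs: distance at most lam."""
    return ComparisonOp(ai, aj, lambda x, y: dist(x, y) <= lam)


def matching(ai: str, aj: str, matched: set[frozenset[str]]) -> ComparisonOp:
    """A_i ⇋ A_j, as in MDs: equal values or pairs declared matched by users."""
    return ComparisonOp(ai, aj, lambda x, y: x == y or frozenset((x, y)) in matched)


def mismatches(x: str, y: str) -> int:
    """Crude stand-in distance: differing positions plus length difference."""
    return sum(a != b for a, b in zip(x, y)) + abs(len(x) - len(y))


t1 = {"name": "iPod", "color": "red", "tel": "567"}
t2 = {"name": "iPod", "color": "cardinal", "tel": "123"}
t3 = {"name": "iPad", "color": "white", "phn": "567"}
print(equality("tel", "phn").holds(t1, t3))                 # True
print(metric("name", "name", 1, mismatches).holds(t1, t3))  # True
print(matching("color", "color", {frozenset(("red", "cardinal"))}).holds(t1, t2))  # True
```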

Definition 9/24: Syntax

A general comparable function
θ(A_i, A_j): [A_i ↔_{ii} A_i, A_i ↔_{ij} A_j, A_j ↔_{jj} A_j]
specifies a comparable constraint on two values from attribute A_i or A_j, according to their corresponding comparison operators.
A comparable dependency (CD) ϕ with general comparable functions over a dataspace S has the form
ϕ: θ(A_i, A_j) → θ(B_1, B_2)
If two objects have comparable values on A_i or A_j, then they must have comparable values on B_1 or B_2.
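A sketch of the general syntax as two small Python dataclasses, `Theta` and `CD` (hypothetical names), under the assumption that an operator applies only when both values are present. The demo uses a made-up CD with equality in every position, which shows why equality alone falls short: t1 and t2 are never compared on (manu, prod), so the CD holds for them only vacuously.

```python
from dataclasses import dataclass
from typing import Callable

Obj = dict[str, str]
Pred = Callable[[str, str], bool]


@dataclass(frozen=True)
class Theta:
    """theta(A_i, A_j): [A_i <->_ii A_i, A_i <->_ij A_j, A_j <->_jj A_j]."""
    ai: str
    aj: str
    op_ii: Pred
    op_ij: Pred
    op_jj: Pred

    def comparable(self, s: Obj, t: Obj) -> bool:
        """True iff at least one of the three operators is satisfied by (s, t)."""
        def check(x: Obj, a: str, y: Obj, b: str, op: Pred) -> bool:
            # An operator applies only when both values are present.
            return a in x and b in y and op(x[a], y[b])

        return (check(s, self.ai, t, self.ai, self.op_ii)
                or check(s, self.ai, t, self.aj, self.op_ij)
                or check(t, self.ai, s, self.aj, self.op_ij)  # unordered pair
                or check(s, self.aj, t, self.aj, self.op_jj))


@dataclass(frozen=True)
class CD:
    """phi: lhs -> rhs; a pair satisfies phi unless lhs holds and rhs fails."""
    lhs: Theta
    rhs: Theta

    def satisfied_by(self, s: Obj, t: Obj) -> bool:
        return not self.lhs.comparable(s, t) or self.rhs.comparable(s, t)


def eq(x: str, y: str) -> bool:
    return x == y


# Hypothetical CD with equality everywhere (not one of the slides' CDs).
toy = CD(Theta("manu", "prod", eq, eq, eq), Theta("tel", "phn", eq, eq, eq))
t1 = {"manu": "Apple Inc.", "tel": "567"}
t2 = {"prod": "Apple", "tel": "123"}
t3 = {"manu": "Apple Inc.", "phn": "567"}
print(toy.satisfied_by(t1, t3))  # True: manu equal, and t1[tel] = t3[phn]
print(toy.satisfied_by(t1, t2))  # True, but only vacuously: "Apple Inc." != "Apple"
```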

Definition 10/24: Example

Consider ϕ4: θ(manu, prod) → θ(tel, phn), where θ(tel, phn) is [tel = tel, tel = phn, phn = phn]. We have (t1, t3) ≍ LHS(ϕ4), and also (t1, t3) ≍ RHS(ϕ4), since t1[tel] = t3[phn] = 567; this is denoted by (t1, t3) ⊨ ϕ4.
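The example can be checked with a few lines, assuming Levenshtein distance for ≈_{≤5} and plain string equality for θ(tel, phn). `satisfies` and the other helpers are hypothetical, and the objects keep only the attributes that ϕ4 reads.

```python
from typing import Callable

Obj = dict[str, str]
Op = Callable[[str, str], bool]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def within(k: int) -> Op:
    return lambda x, y: edit_distance(x, y) <= k


def comparable(s: Obj, t: Obj, ai: str, aj: str, op: Op) -> bool:
    """theta(ai, aj) with op on ai-ai, ai-aj (both directions) and aj-aj."""
    checks = [(s, ai, t, ai), (s, ai, t, aj), (t, ai, s, aj), (s, aj, t, aj)]
    return any(a in x and b in y and op(x[a], y[b]) for x, a, y, b in checks)


def satisfies(s: Obj, t: Obj, lhs: tuple, rhs: tuple) -> bool:
    """(s, t) |= lhs -> rhs: comparable on lhs implies comparable on rhs."""
    return not comparable(s, t, *lhs) or comparable(s, t, *rhs)


phi4 = (("manu", "prod", within(5)), ("tel", "phn", lambda x, y: x == y))
t1 = {"manu": "Apple Inc.", "tel": "567"}
t2 = {"prod": "Apple", "tel": "123"}
t3 = {"manu": "Apple Inc.", "phn": "567"}
print(satisfies(t1, t3, *phi4))  # True: t1[tel] = t3[phn] = 567
print(satisfies(t1, t2, *phi4))  # False: LHS comparable, but 567 != 123
```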

Definition 11/24: Approximate Dependencies

Due to the extremely high heterogeneity, data dependencies might not hold exactly in a given dataspace. For ϕ4: θ(manu, prod) → θ(tel, phn), e.g., (t1, t2) ≍ LHS(ϕ4) but (t1, t2) ≭ RHS(ϕ4), since t1[tel] = 567 differs from t2[tel] = 123; i.e., (t1, t2) ⊭ ϕ4.

Definition 12/24: Measure

These measures evaluate how well a dependency “almost” holds in a data instance.

Error measure:
$$g_3(\varphi, S) = \frac{|S| - \max\{|T| \mid T \subseteq S,\ T \models \varphi\}}{|S|}$$
i.e., the minimum fraction of objects that have to be removed from the dataspace S for the dependency ϕ to hold.

Confidence measure:
$$\mathit{conf}(\varphi, S) = \frac{\max\{|T| \mid T \subseteq S,\ T \models \varphi\}}{|S|}$$
i.e., the maximum fraction of objects kept after removing the minimum set of objects that violate ϕ. Hence conf(ϕ, S) = 1 − g3(ϕ, S).
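As a sketch, assuming that T ⊨ ϕ means every pair of objects in T satisfies ϕ (satisfaction is checked pair by pair, as in the examples), both measures reduce to finding the largest subset of S without a violating pair. The helpers below (`max_keeping_size`, `g3`, `conf`, all hypothetical) enumerate subsets from largest to smallest, which is exponential in |S| and only meant for toy inputs; the violating pairs in the demo are made up.

```python
from fractions import Fraction
from itertools import combinations
from typing import Callable, Sequence

PairOk = Callable[[str, str], bool]  # True iff the pair satisfies phi


def max_keeping_size(objs: Sequence[str], ok: PairOk) -> int:
    """max |T| over T ⊆ S such that every pair in T satisfies phi (brute force)."""
    for size in range(len(objs), 0, -1):
        for T in combinations(objs, size):
            if all(ok(s, t) for s, t in combinations(T, 2)):
                return size
    return 0


def g3(objs: Sequence[str], ok: PairOk) -> Fraction:
    """Error measure: fraction of objects that must be removed (S non-empty)."""
    return Fraction(len(objs) - max_keeping_size(objs, ok), len(objs))


def conf(objs: Sequence[str], ok: PairOk) -> Fraction:
    """Confidence measure: fraction of objects that can be kept (S non-empty)."""
    return Fraction(max_keeping_size(objs, ok), len(objs))


# Made-up violation pairs: a conflicts with b and with c.
BAD = {frozenset(("a", "b")), frozenset(("a", "c"))}


def ok(s: str, t: str) -> bool:
    return frozenset((s, t)) not in BAD


S = ["a", "b", "c", "d"]
print(g3(S, ok), conf(S, ok))  # 1/4 3/4
```

Removing the single object a resolves both conflicts, so {b, c, d} is the largest keeping set.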

Definition 13/24: Example

For ϕ4: θ(manu, prod) → θ(tel, phn), the object t2 violates ϕ4 together with both t1 and t3, since t2[tel] = 123 equals neither t1[tel] nor t3[phn] (both 567).
Error measure: {t2} is a minimum violation set w.r.t. ϕ4, such that all the remaining objects {t1, t3} satisfy ϕ4; hence g3 = 1/3.
Confidence measure: {t1, t3} is a maximum keeping set w.r.t. ϕ4; hence conf = 2/3.
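Putting the pieces together for this example, a brute-force sketch under the same assumptions (Levenshtein distance for θ(manu, prod), equality for θ(tel, phn), pairwise satisfaction, hypothetical helper names) recovers the minimum violation set {t2} and the two measures.

```python
from fractions import Fraction
from itertools import combinations
from typing import Callable

Obj = dict[str, str]
Op = Callable[[str, str], bool]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def comparable(s: Obj, t: Obj, ai: str, aj: str, op: Op) -> bool:
    """theta(ai, aj) with op on ai-ai, ai-aj (both directions) and aj-aj."""
    checks = [(s, ai, t, ai), (s, ai, t, aj), (t, ai, s, aj), (s, aj, t, aj)]
    return any(a in x and b in y and op(x[a], y[b]) for x, a, y, b in checks)


OBJS = {
    "t1": {"manu": "Apple Inc.", "tel": "567"},
    "t2": {"prod": "Apple", "tel": "123"},
    "t3": {"manu": "Apple Inc.", "phn": "567"},
}
LHS = ("manu", "prod", lambda x, y: edit_distance(x, y) <= 5)
RHS = ("tel", "phn", lambda x, y: x == y)


def ok(p: str, q: str) -> bool:
    """(OBJS[p], OBJS[q]) |= phi4."""
    s, t = OBJS[p], OBJS[q]
    return not comparable(s, t, *LHS) or comparable(s, t, *RHS)


def max_keeping_set(ids: list[str]) -> set[str]:
    """Largest subset whose pairs all satisfy phi4 (exhaustive search)."""
    for size in range(len(ids), 0, -1):
        for T in combinations(ids, size):
            if all(ok(p, q) for p, q in combinations(T, 2)):
                return set(T)
    return set()


ids = sorted(OBJS)
keep = max_keeping_set(ids)
print(sorted(set(ids) - keep), sorted(keep))  # ['t2'] ['t1', 't3']
print(Fraction(len(ids) - len(keep), len(ids)), Fraction(len(keep), len(ids)))  # 1/3 2/3
```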

