Bridging the Gap between Data Diversity and Data Dependencies

Jean-Marc Petit
INSA Lyon, Université de Lyon, LIRIS CNRS (UMR 5205)
24th International Symposium on Methodologies for Intelligent Systems (ISMIS 2018), Limassol, Cyprus

Introduction: Data diversity

Introduction: Data diversity, not only a gender question!

Introduction: Example from the astrophysics domain

The Sloan Digital Sky Survey (SDSS): mapping the Universe!

Class    u      g      r      i      z      erru  errg  errr  erri  errz
STAR     16.56  14.62  13.94  13.79  13.48  0.01  0.00  0.01  0.01  0.00
Galaxy   19.79  17.77  16.59  16.07  15.63  0.06  0.01  0.00  0.00  0.01
STAR     15.64  14.04  14.57  12.83  13.12  0.01  0.00  0.01  0.00  0.01
Galaxy   21.61  20.81  19.87  19.30  19.03  0.15  0.04  0.02  0.02  0.05
STAR     20.09  17.28  15.79  14.31  13.49  0.04  0.00  0.00  0.00  0.00

The catalog database records 5 magnitudes (u, g, r, i and z), each with its error (erru, errg, errr, erri, errz).
⇒ Numerical interval data must be treated as a first-class citizen.
See http://www.sdss.org/dr12/ for details.
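One possible reading of "interval data as a first-class citizen" is to turn each magnitude and its error into a closed interval [value - error, value + error] and to compare intervals rather than raw numbers. The sketch below assumes that reading; the `Interval` class, the `magnitude_interval` helper and the overlap test are illustrative choices that the slides do not define, and `u_hypothetical` is a made-up measurement, not SDSS data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed numerical interval [low, high]."""
    low: float
    high: float

    def overlaps(self, other: "Interval") -> bool:
        # Two closed intervals overlap iff each one starts before the other ends.
        return self.low <= other.high and other.low <= self.high


def magnitude_interval(value: float, error: float) -> Interval:
    """Assumption: a magnitude with error e denotes [value - e, value + e]."""
    return Interval(value - error, value + error)


# u magnitudes of the first and third rows of the table (both STAR).
u_row1 = magnitude_interval(16.56, 0.01)
u_row3 = magnitude_interval(15.64, 0.01)

# Hypothetical measurement, used only to show an overlapping case.
u_hypothetical = magnitude_interval(16.57, 0.01)

print(u_row1.overlaps(u_row3))          # far apart: no overlap
print(u_row1.overlaps(u_hypothetical))  # close enough: overlap
```

With such a representation, strict equality between two values can be replaced by a weaker "compatible" test, which is the kind of relaxation that data dependencies over diverse data need.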

Introduction: Data and metadata from SDSS

[Figure: data and metadata from SDSS]

Introduction: Data diversity

To cope with data diversity, key notions have been studied for years in computer science: data and metadata representation, data uncertainty, data inconsistency, data heterogeneity, ...

Dealing with data diversity remains the hardest part in practice.
⇒ Understand what is hidden behind the data: Where do they come from? How are they produced?
⇒ Stay as close as possible to the available data sources and experts, to better match the intended meaning of the data.

Introduction: Data dependencies

Introduction: Classical example of data dependencies, functional dependencies

$r \models X \to Y$ iff for all $t_1, t_2 \in r$: if $t_1[A] = t_2[A]$ for all $A \in X$, then $t_1[B] = t_2[B]$ for all $B \in Y$.

This turns out to be a very general notion, related to implications:

a  b  a → b
0  0    1
0  1    1
1  0    0
1  1    1

Many connections with lattice theory, formal concept analysis (Galois connection) and logics (see for example [11]).

Crucial to understand relational database design.
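The definition of satisfaction can be checked directly by comparing pairs of tuples. The following is a minimal sketch, assuming a relation is stored as a list of dictionaries keyed by attribute name; the function name `satisfies_fd` and the toy relation are illustrative, not taken from the slides.

```python
from itertools import combinations


def satisfies_fd(relation, lhs, rhs):
    """Return True iff relation |= lhs -> rhs.

    Every pair of tuples that agrees on all attributes of lhs
    must also agree on all attributes of rhs.
    """
    for t1, t2 in combinations(relation, 2):
        agree_on_lhs = all(t1[a] == t2[a] for a in lhs)
        agree_on_rhs = all(t1[b] == t2[b] for b in rhs)
        if agree_on_lhs and not agree_on_rhs:
            return False  # (t1, t2) is a counterexample
    return True


# Toy relation over attributes a, b, c (hypothetical data).
r = [
    {"a": 0, "b": 0, "c": 1},
    {"a": 0, "b": 0, "c": 2},
    {"a": 1, "b": 1, "c": 1},
]

print(satisfies_fd(r, ["a"], ["b"]))  # True: tuples equal on a are equal on b
print(satisfies_fd(r, ["a"], ["c"]))  # False: the first two tuples disagree on c
```

Pairs made of the same tuple twice are skipped because they satisfy the condition trivially.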

Introduction: Beyond database design

New and timely applications require some forms of FD:
- Data quality: analysing existing data to identify data quality problems [17, 9]
- Machine learning over relational databases: FD-aware optimization for in-database learning [19]
- Semantic query optimization: query rewriting techniques based on data dependencies [12]

⇒ Many extensions of FD have been proposed to take into account some forms of data diversity (e.g. see [10, 18] for a survey):
- Matching dependencies, denial constraints, ... [17, 9, 15]
- Implications in Formal Concept Analysis (FCA) [7, 6]
- Association rules, ... in data mining [5]

Introduction: Data diversity and data dependencies

Introduction: Questions and contributions

How can data diversity be taken into account for data dependencies? Do unifying frameworks exist?

Two contributions:
- RQL: a query language to express implications over relational databases (ISMIS 2005 [3], demo ICDM 2014 [13], TCS 2017 [14])
- Structural properties on attribute domains (ongoing work)

Contents

1. RQL query language
   Preliminaries
   Main result underlying RQL
   The RQL language
   RQL implementation
   Summary
2. Structural properties on attribute domains
   Similarity map: a semilattice version
   Data dependencies with similarity maps
   Main results
3. Conclusion and perspective

RQL query language, Preliminaries: Important known results for FD

Let $F$ be a set of FD over a schema $R$.
- $CL(F) = \{ X \subseteq R \mid X^+_F = X \}$: the closure system of $F$
- $IRR(F)$: the set of elements of $CL(F)$ that are irreducible by intersection

Reasoning on $F$ is equivalent to reasoning on $CL(F)$, for instance:
$X^+_F = \{ A \in R \mid F \models X \to A \} = \bigcap \{ Y \in CL(F) \mid X \subseteq Y \}$

Let $r$ be a relation over $R$. The agree set of $r$ is
$ag(r) = \{ ag(t_1, t_2) \mid t_1, t_2 \in r \}$ where $ag(t_1, t_2) = \{ A \in R \mid t_1[A] = t_2[A] \}$

$r$ is an Armstrong relation for $F$ iff $IRR(F) \subseteq ag(r) \subseteq CL(F)$ [8].
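These notions can be computed by brute force on small schemas. The sketch below is one straightforward way to do it, not the algorithm used by the author: `closure` is the usual fixpoint computation of X+, `closure_system` enumerates all subsets of the schema (exponential, so only suitable for toy examples), and `irreducibles` keeps the closed sets that are not the intersection of strictly larger closed sets. The function names and the three-attribute schema (B, Be, P) with F = {B → P, P → B} are chosen for illustration.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure X+ of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closed and not rhs <= closed:
                closed |= rhs
                changed = True
    return frozenset(closed)


def closure_system(schema, fds):
    """CL(F): all subsets X of the schema with X+ = X (exponential enumeration)."""
    attrs = sorted(schema)
    subsets = (
        frozenset(c) for k in range(len(attrs) + 1) for c in combinations(attrs, k)
    )
    return {x for x in subsets if closure(x, fds) == x}


def irreducibles(schema, closed_sets):
    """IRR(F): closed sets, other than the schema itself, that are not the
    intersection of the closed sets strictly containing them."""
    full = frozenset(schema)
    result = set()
    for x in closed_sets:
        if x == full:
            continue
        meet = full
        for y in closed_sets:
            if x < y:
                meet &= y
        if meet != x:
            result.add(x)
    return result


R = {"B", "Be", "P"}
F = [(frozenset({"B"}), frozenset({"P"})), (frozenset({"P"}), frozenset({"B"}))]

CL = closure_system(R, F)
IRR = irreducibles(R, CL)
print(sorted(sorted(x) for x in CL))   # [[], ['B', 'Be', 'P'], ['B', 'P'], ['Be']]
print(sorted(sorted(x) for x in IRR))  # [['B', 'P'], ['Be']]
```

The identity between X+ and the intersection of the closed supersets of X can be verified on the same data: the closed sets containing {B} are {B, P} and {B, Be, P}, whose intersection {B, P} equals `closure({"B"}, F)`.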

RQL query language, Preliminaries: Example

      Bar (B)      Beer (Be)   Price (P)
t1    Nota bene    Adelscott   2
t2    Montagne     1664        1.5
t3    Nota bene    1664        2
t4    Ritz         Adelscott   5
t5    Café Flore   Affligen    6

$F = \{ B \to P,\ P \to B \}$
$CL(F) = \{ \emptyset, Be, BP, BBeP \}$
$IRR(F) = \{ Be, BP \}$
$ag(r) = \{ \emptyset, Be, BP \}$, often represented as:

B  Be  P
0  0   0
0  1   0
1  0   1
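The agree sets of this example can be recomputed with a few lines of code. This is a sketch under two assumptions: tuples are stored as plain Python tuples in schema order, and only pairs of distinct tuples are compared (which is what yields the three agree sets listed on the slide). `CL` and `IRR` are copied from the slide rather than recomputed; the function names are illustrative.

```python
from itertools import combinations

SCHEMA = ("B", "Be", "P")

# The relation r of the example: (Bar, Beer, Price).
r = [
    ("Nota bene", "Adelscott", 2),
    ("Montagne", "1664", 1.5),
    ("Nota bene", "1664", 2),
    ("Ritz", "Adelscott", 5),
    ("Café Flore", "Affligen", 6),
]


def agree_set(t1, t2):
    """Attributes on which t1 and t2 carry the same value."""
    return frozenset(a for a, v1, v2 in zip(SCHEMA, t1, t2) if v1 == v2)


def agree_sets(relation):
    """ag(r), computed over pairs of distinct tuples (assumption, see lead-in)."""
    return {agree_set(t1, t2) for t1, t2 in combinations(relation, 2)}


def as_bits(attrs):
    """0/1 row of an attribute set, in schema order (B, Be, P)."""
    return tuple(int(a in attrs) for a in SCHEMA)


ag = agree_sets(r)

# CL(F) and IRR(F) as given on the slide for F = {B -> P, P -> B}.
CL = {frozenset(), frozenset({"Be"}), frozenset({"B", "P"}), frozenset(SCHEMA)}
IRR = {frozenset({"Be"}), frozenset({"B", "P"})}

# Armstrong relation test: IRR(F) ⊆ ag(r) ⊆ CL(F).
is_armstrong = IRR <= ag <= CL

print({as_bits(x) for x in ag})
print(is_armstrong)
```

For instance, t1 and t3 share the bar and the price, which gives the agree set BP, while t1 and t4 only share the beer, which gives Be.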

RQL query language, Preliminaries: Towards a rule query language

Focus on rules equivalent to implications (or FD).
⇒ Armstrong axioms (reflexivity, augmentation, transitivity) have to be sound and complete.

Idea: define a rule query language (RQL) such that every RQL statement delivers implications.

This requires identifying syntactic constraints that guarantee we remain within the reasoning of implications.

RQL query language, Preliminaries: Semantics of implications

Let $b_0$ be a binary relation (given by a $\{0, 1\}$-relation):
$b_0 \models X \to Y \iff \forall t \in b_0,\ (\forall A \in X,\ t.A = 1) \Rightarrow (\forall A \in Y,\ t.A = 1)$

Let $d = \{ r_0, r_1, \ldots, r_n \}$ be a relational database:
$r_0 \models X \to Y \iff \forall t_1, t_2 \in r_0,\ (\forall A \in X,\ t_1.A = t_2.A) \Rightarrow (\forall A \in Y,\ t_1.A = t_2.A)$

$d \models X \to Y \iff \forall t_1, t_2 \in \pi_X(\sigma_F(r_{i_0} \bowtie \cdots \bowtie r_{i_p})),\ (\forall A \in X,\ t_1.A = t_2.A) \Rightarrow (\forall A \in Y,\ t_1.A = t_2.A)$

$d \models X \to Y \iff \forall t_1 \in \pi_X(\sigma_F(r_{i_0} \bowtie \cdots \bowtie r_{i_n})),\ \forall t_2 \in \pi_X(\sigma_{F'}(r_{j_0} \bowtie \cdots \bowtie r_{i_n}))$ such that $t_1.\mathit{rank} = t_2.\mathit{rank} + 1$: $(\forall A \in X,\ t_1.A = t_2.A) \Rightarrow (\forall A \in Y,\ t_1.A = t_2.A)$
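Two of these semantics are sketched below, under stated assumptions. `satisfies_implication` follows the first definition on a {0, 1}-relation stored as a list of dictionaries; the rows of `b0` are the binary representation of the agree sets ∅, Be and BP. `satisfies_consecutive` illustrates the last definition in a simplified setting: both query results are assumed to be the same list of rows, each row carries a unique `rank` value, and only pairs with t1.rank = t2.rank + 1 are compared. Function names and the ranked sample rows are hypothetical; this is not RQL syntax.

```python
def satisfies_implication(b0, lhs, rhs):
    """b0 |= lhs -> rhs on a {0,1}-relation: every row equal to 1 on all
    attributes of lhs is also equal to 1 on all attributes of rhs."""
    return all(
        all(t[a] == 1 for a in rhs)
        for t in b0
        if all(t[a] == 1 for a in lhs)
    )


def satisfies_consecutive(rows, lhs, rhs):
    """Pairwise semantics restricted to pairs with t1.rank == t2.rank + 1.
    Simplification: t1 and t2 range over the same rows, ranks are unique."""
    by_rank = {t["rank"]: t for t in rows}
    for t1 in rows:
        t2 = by_rank.get(t1["rank"] - 1)
        if t2 is None:
            continue  # t1 has no predecessor
        if all(t1[a] == t2[a] for a in lhs) and not all(t1[a] == t2[a] for a in rhs):
            return False
    return True


# Binary relation over B, Be, P (rows for the agree sets ∅, Be, BP).
b0 = [
    {"B": 0, "Be": 0, "P": 0},
    {"B": 0, "Be": 1, "P": 0},
    {"B": 1, "Be": 0, "P": 1},
]
print(satisfies_implication(b0, ["B"], ["P"]))   # True
print(satisfies_implication(b0, ["Be"], ["B"]))  # False: second row has Be = 1, B = 0

# Hypothetical ranked rows.
ranked = [
    {"rank": 1, "x": 1, "y": 10},
    {"rank": 2, "x": 1, "y": 10},
    {"rank": 3, "x": 1, "y": 20},
]
print(satisfies_consecutive(ranked, ["x"], ["y"]))      # False: ranks 3 and 2 differ on y
print(satisfies_consecutive(ranked[:2], ["x"], ["y"]))  # True
```

On `b0`, the implications that hold (B → P and P → B) are exactly the FD of the set F = {B → P, P → B} whose agree sets produced these rows, which illustrates why implications over a binary relation and FD over a relation can be handled by the same reasoning.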
