Adaptive Schema Databases William Spoth b , Bahareh Sadat Arab i , - PowerPoint PPT Presentation

Adaptive Schema Databases William Spoth b , Bahareh Sadat Arab i , Eric S. Chan o , Dieter Gawlick o , Adel Ghoneimy o , Boris Glavic i , Beda Hammerschmidt o , Oliver Kennedy b , Seokki Lee i , Zhen Hua Liu o , Xing Niu i , Ying Yang b b: University at Buffalo i: Illinois Inst. Tech. o: Oracle 1

Adaptive Schema Databases 2

Classic relational database • Navigational and organizational purpose retain discovery, good performance and space, reusable. 3

Classic relational database • But... High upfront cost and inflexible 4

BigData/NOSQL • Data can be used immediately. 5

BigData/NOSQL • But... Sacrifice navigational and Performance benefit and may end up with duplicate of work 6

Adaptive Schema Databases • Bridge the gap between relational database and NoSQl. Queries and feedback... eventually 7

Adaptive Schema Databases • Bridge the gap between relational database and NoSQl. Queries and feedback... eventually 8

Adaptive Schema Databases Input: Queries: SELECT name FROM Undergrad UNION SELECT name FROM Grad SELECT deg FROM Grad SELECT name FROM Student … 9

Outline • Extraction and discovery Queries + Feedback • Adaptive, personalized schemas Schema Schema Schema Schema from queries Workspace Workspace Workspace Workspace Schema Matching • Explanations and feedback • Adaptive organization Extraction Schema Candidates • Conclusions and future work Extraction workflow Extraction workflow Extraction workflow Unstructured Data Semi-structed Data (e.g., JSON) 10

Extraction Queries + Feedback Schema Schema Schema Schema Workspace Workspace Workspace Workspace Schema Matching Extraction Schema Candidates Extraction workflow Extraction workflow Extraction workflow Unstructured Data Semi-structed Data (e.g., JSON) 11

Extraction ASD extracts schema candidate set • Given input: 12

Discovery ASD extracts schema candidate set • schema candidate set C ext= {S ext , P ext }, where S ext is a set of candidate schemas, P ext is a probability distribution over these schemas. 16

Discovery ASD extracts schema candidate set • Smax: the best guess schema 17

Adaptive, personalized schemas from queries Queries + Feedback Schema Schema Schema Schema Workspace Workspace Workspace Workspace Schema Matching Extraction Schema Candidates Extraction workflow Extraction workflow Extraction workflow Unstructured Data Semi-structed Data (e.g., JSON) 18

Adaptive, personalized schemas ASD maintains a set of schema workspaces W={W 1,..., W n }. • Initially, W={} 19

Finding Schemas from Queries ASD maintains a set of schema workspaces W={W 1,..., W n }. • Query 1: SELECT name FROM Undergrad UNION SELECT name FROM Grad 20

Finding Schemas from Queries ASD maintains a set of schema workspaces W={W 1,..., W n }. • Query 1: SELECT name FROM Undergrad UNION SELECT name FROM Grad 21

Finding Schemas from Queries ASD maintains a set of schema workspaces W={W 1,..., W n }. • Query 2: SELECT deg FROM Grad 22

Synthesizing Tables ASD maintains a set of schema workspaces W={W 1,..., W n }. • Query 3: SELECT name FROM Student W 1 = (S 1 ={Undergrad(name)},P 1 =0.27), (S 1 ={Grad(name)},P 1 =0.23), (S 1 ={Undergrad(name), Grad(name)}, P 1 =0.5) 23

Explanations and feedback Queries + Feedback Schema Schema Schema Schema Workspace Workspace Workspace Workspace Schema Matching Extraction Schema Candidates Extraction workflow Extraction workflow Extraction workflow Unstructured Data Semi-structed Data (e.g., JSON) 24

What might go wrong Extraction errors appear in three forms: (1) A query incompatible with S max (2) An update with data that violates S max (3) An extraction error presented to user We provide: (1) explanation of results (2) provenance (3) Warn the analyst with ambiguity (4) Explain the ambiguity (5) Evaluate the magnitude of ambiguity (6) Assist the analyst to resolve the ambiguity 25

Types of errors ASD interacts with the outside world: Schema, Data, and Update. Schema interactions: When a query incompatible with S max and the workspace Data interactions: provenance for attribute and row level ambiguity. Update interactions: represent schema mismatches as missing values. • resolve data errors with a probabilistic repair. • upgrade her schema to match the changes. • checkpoint her workspace and ignore new updates. • 26

Explanations and feedback Condition 2: Query from unknown schema elements: SELECT name FROM Student W 1 = (S 1 ={Undergrad(name)},P 1 =0.27), Explanations: (S 1 ={Grad(name)},P 1 =0.23), We match Student with (S 1 ={Undergrad(name), Grad(name)}, P 1 =0.5) both Grad and Undergrad 27

Adaptive organization Queries + Feedback Schema Schema Schema Schema Workspace Workspace Workspace Workspace Schema Matching Extraction Schema Candidates Extraction workflow Extraction workflow Extraction workflow Unstructured Data Semi-structed Data (e.g., JSON) 28

Adaptive organization Trade-off between storing data in its native format and based on a specific schema. What is the challenge? Many workspaces, add table to the schema, …. Challenges and Possible Solutions: • We want multiple personalized schemas 1. Relational workspace schema is essentially a view over raw data. Materializing view can be used. 2. Use existing adaptive physical design and caching techniques. • Shared materializations 1. Incremental materialized view maintenance. Leverage techniques from revision control systems. 2. View selection problem. 29

Conclusions and future work ASD bridges the gap between relational databases and NoSQL. Discovery : Help user explore and understand new data by providing an • outline of the available information. Done Materialization : Adopt work on adaptive data structures. Partially done • Data Synthesis : Synthesis new tables and attributes from existing data. • Done Conflict Response : • – Versioning or branching the schema. – Log analysis to help users assess the impact of schema revisions. 30

Adaptive Schema Databases William Spoth b , Bahareh Sadat Arab i , - PowerPoint PPT Presentation

Adaptive Schema Databases William Spoth b , Bahareh Sadat Arab i , Eric S. Chan o , Dieter Gawlick o , Adel Ghoneimy o , Boris Glavic i , Beda Hammerschmidt o , Oliver Kennedy b , Seokki Lee i , Zhen Hua Liu o , Xing Niu i , Ying Yang b b:

Linked Open Data data.slub-dresden.de Linked Open Usable Data data.slub-dresden.de schema.org

Schema Languages Schema Languages Regular expressions a commonly used formalism in schema

Schema Matching in a Large Scale Schema Matching in a Large Scale Personal Schema Based Querying

Schema validation and evolution for PGs Eugenia Oshurko (ENS Lyon) 7 March 2019 Main ideas

Neural Nets for Adaptive Filter and Adaptive Neural Nets as Adaptive Filters Pattern Recognition

Adaptive Control Chapter 1: Introduction to Adaptive Control Adaptive Control Landau, Lozano,

Adaptive Control Chapter 11: Direct Adaptive Control 1 Adaptive Control Landau, Lozano,

Creating Databases and Tables Introduction to Databases in Python Creating Databases

Inductive Inductive Inductive Inductive Databases Databases Databases Databases and

Lecture 11: Persistent Memory Databases 1 / 71 Persistent Memory Databases Recap

Adaptive Control Chapter 12: Indirect Adaptive Control 1 Adaptive Control Landau, Lozano,

IP-XACT XML Schema Vanderlei Bonato Sep 2008 Outline XML Schema The seven top-level

REFEDS Schema Editorial Board https://wiki.refeds.org/display/STAN/Schema+Editorial+Board

The LDAP Directory Schema AGENDA Why do we need a good schema? From the White Pages to

Module 3: Creating and Managing Databases Overview Creating Databases Creating

Adaptive Control Chapter 13: Multimodel adaptive control with switching Chapter 13: Multimodel

Training/Preparation for the NSS/SFF/ASD March Storm 2016 Home District Blitz Dale Skran Chair,

DEVELOPING AN ADAPTIVE TREATMENT STRATEGY FOR PEER-RELATED SOCIAL SKILLS FOR CHILDREN WITH AUTISM

Girls ls Ni Nigh ght Ou Out: : Less ssons lear arned from a a uniq ique interv rventio

A Matter of Style: The Causes and Consequences of Style Drift in Mutual Fund Portfolios Russ

The Dedekind Reals in ASD Paul Taylor 1 Andrej Bauer 2 1 Department of Computer Science University

Outline London CBF and Brussels Photonic Exchange Time and Frequency Transfer in

PDA IT chapter updates Aseptic filling: technological innovations and regulatory trends

Peering Beyond the Five Moments of Hand Hygiene Compliance Colin Furness MISt PhD MPH MEd (cand)