

slide-1
SLIDE 1

Continuous Database Evolution

  • Prof. Dr. Uta Störl

Darmstadt University of Applied Sciences

February 2019

[Figure: Application version n, Application version n + …; Schema Management; Schema Evolution; Data Migration]

slide-2
SLIDE 2

Motivation

  • Agile software development with frequent schema changes (weekly up to daily!) → schema-flexible NoSQL databases

  • However, how to migrate variational data in the production database?

– State of the art: within the application code

→ Optional schema management for NoSQL database systems is necessary!

Uta Störl, Darmstadt University of Applied Sciences


slide-3
SLIDE 3

Remark: NoSQL Databases Are Schema-Free – Aren’t They?

  • NoSQL DBMS without native schema support: Couchbase, CouchDB, Neo4j, …
  • NoSQL DBMS with optional schema support: MongoDB, OrientDB, ArangoDB, …
  • NoSQL DBMS with mandatory schema: Cassandra, …

slide-4
SLIDE 4

Schema Management for NoSQL Databases

  • Forward Engineering

– Schema Creation

  • Reverse Engineering

– Schema Overview
– Data Exploration

[Figure: Application Version n with Schema Version n, Application Version n + 1 with Schema Version n+1; Forward Engineering]

slide-5
SLIDE 5

Forward Engineering: Schema Creation

Create

  • JSON Schema
  • Proprietary schema formats (e.g. Mongoose for MongoDB, Ottoman for Couchbase, …)
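
To make the idea of schema creation and validation concrete, here is a minimal sketch in Python. The schema uses JSON Schema keywords (`type`, `required`, `properties`), but the `validate` function is a hand-written illustration that handles only those three keywords; it is not a JSON Schema implementation, and the schema for Species is an assumption based on the marine biology example later in the deck.

```python
# Hypothetical schema for a Species document, written with JSON Schema keywords.
SPECIES_SCHEMA = {
    "type": "object",
    "required": ["id", "name"],
    "properties": {
        "id": {"type": "number"},
        "name": {"type": "string"},
        "WoRMS": {"type": "number"},
    },
}


def has_type(value, type_name):
    """Check a value against one of the few type keywords this sketch supports."""
    if type_name == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if type_name == "string":
        return isinstance(value, str)
    if type_name == "object":
        return isinstance(value, dict)
    raise ValueError(f"unsupported type keyword: {type_name}")


def validate(doc, schema):
    """Return a list of violations; an empty list means the document conforms."""
    if not has_type(doc, schema["type"]):
        return [f"expected {schema['type']}"]
    errors = []
    for key in schema.get("required", []):
        if key not in doc:
            errors.append(f"missing required property '{key}'")
    for key, sub in schema.get("properties", {}).items():
        if key in doc and not has_type(doc[key], sub["type"]):
            errors.append(f"property '{key}' is not of type {sub['type']}")
    return errors
```

Because properties outside the schema are not rejected, older documents with extra fields still pass, which matches the optional character of schema management in schema-flexible stores.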

slide-6
SLIDE 6

Schema Management for NoSQL Databases

  • Forward Engineering

– Schema Creation

  • Reverse Engineering

– Schema Overview
– Data Exploration

slide-7
SLIDE 7

Reverse Engineering: Schema Overview


slide-8
SLIDE 8

Reverse Engineering: Data Exploration


slide-9
SLIDE 9

Tools for NoSQL Schema Management (Selection)

Multi Data Store Tools

  • Hackolade

– Support for MongoDB, Couchbase, Elasticsearch, HBase, Cassandra & DataStax, DynamoDB, Cosmos DB, Avro, and Hive
– Forward and Reverse Engineering (available in Professional edition only)
– https://hackolade.com/

  • erwin DM NoSQL

– Support for MongoDB and Couchbase
– Forward and Reverse Engineering (available in Professional edition only)
– https://erwin.com/products/erwin-dm-nosql/

slide-10
SLIDE 10

Tools for NoSQL Schema Management (Selection)

Single Data Store Tools (MongoDB)

  • MongoDB Compass

– (Reverse Engineering), available free of charge
– https://www.mongodb.com/products/compass

  • Studio 3T

– (Reverse Engineering), available in Pro edition only
– https://studio3t.com/

slide-11
SLIDE 11

Tools for NoSQL Schema Management (Selection)

Research Prototypes (Multi Data Store Tools)

  • NoSQL Data Engineering Project (NoSQL DEP)

– Support for MongoDB and CouchDB
– Reverse Engineering
– University of Murcia, Spain: https://www.researchgate.net/project/NoSQL-Data-Engineering
– Source: https://github.com/catedrasaes-umu/NoSQLDataEngineering/

  • Darwin: Schema Management for NoSQL Database Systems

– Support for MongoDB and Couchbase
– Reverse Engineering
– Darmstadt University of Applied Sciences, University of Rostock, OTH Regensburg, Germany: https://fbi.h-da.de/personen/uta-stoerl/dfg-projekt-nosql-schema-evolution/
– https://www.researchgate.net/project/Darwin-Schema-Management-in-NoSQL-Databases

slide-12
SLIDE 12

NoSQL Schema Management: So Far, So Good

  • We are able to

– Define schemas and validate data (Forward Engineering)
– Extract a schema overview and explore data (Reverse Engineering)

  • However, what about the heterogeneous data (due to different application releases, for example) in the NoSQL database?

???

slide-13
SLIDE 13

Continuous Database Evolution

  • (Optional) schema management for NoSQL databases

Two main tasks

– Schema evolution management
– Data migration

slide-14
SLIDE 14

Approaches to Realize Data Migration

  • Custom-coded Migration Scripts

– Error-prone and expensive

  • Using Object-NoSQL-Mapper Annotations

– Easy to realize for simple evolution operations like add, delete, and rename (e.g. @AlsoLoad)
– More expensive for complex operations like split, merge, copy, and move (coding @PostLoad methods)
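
The annotations named above belong to Java object mappers. The following Python sketch only mimics the effect of an @AlsoLoad-style rename at load time; it is not any mapper's API, and the function name `load_species` and the property names are assumptions taken from the marine biology example.

```python
def load_species(doc):
    """Map a stored document to the current Species shape while loading it."""
    entity = dict(doc)  # work on a copy; the stored document is left untouched
    # Rename on load: accept the legacy property 'category' as 'WoRMS'.
    if "WoRMS" not in entity and "category" in entity:
        entity["WoRMS"] = entity.pop("category")
    return entity
```

A rename needs only the document being loaded. Operations such as copy or move also need data from other entity types, which is why they require hand-written load hooks and cost more to implement.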

slide-15
SLIDE 15

Approaches to Realize Data Migration

  • Forward Engineering

– Schema Creation

  • Reverse Engineering

– Schema Overview
– Data Exploration

  • Advanced Reverse Engineering

– Schema Version Extraction

[Figure: Application Version n with Schema Version n, Application Version n + 1 with Schema Version n+1; Forward Engineering; Evolution Operations]

slide-16
SLIDE 16

Example from Marine Biology

  • JSON datasets for Species classification of the Baltic Sea and observation Protocols

Entity type Species:

{"id": 123, "name": "Mya arenaria", "ts": 1}
{"id": 124, "name": "Abra prismatica", "ts": 3, "category": 141436}
{"id": 125, "name": "Abra alba", "ts": 5, "WoRMS": 141433}
{"id": 126, "name": "Abra aequalis", "ts": 7, "WoRMS": 293683}

Entity type Protocols:

{"id": 900, "time": "2017-07-21", "location": {"x":19.863281, "y":58.487952, "z":-1400}, "spec_id": 123, "ts": 2}
{"id": 901, "time": "2017-07-21", "location": {"x":19.863285, "y":58.487952, "z":-1400}, "spec_id": 123, "ts": 4}
{"id": 902, "time": "2017-07-23", "location": {"x":19.863281, "y":58.487961, "z":-1350}, "spec_id": 125, "ts": 6}
{"id": 903, "time": "2017-07-24", "location": {"x":19.863285, "y":58.487952, "z":-1400}, "spec_id": 126, "ts": 8, "WoRMS": 293683}

WoRMS: World Register of Marine Species

slide-17
SLIDE 17

Short Excursion: Schema Version Extraction Step 1 - Building the Schema Version Graphs

[Figure: schema version graphs. Each node carries a type and the list of timestamps at which it occurs.]

Species [1,3,5,7]

  • id: number [1,3,5,7]
  • name: string [1,3,5,7]
  • ts: number [1,3,5,7]
  • category: number [3]
  • WoRMS: number [5,7]

Protocols [2,4,6,8]

  • id: number [2,4,6,8]
  • time: number [2,4,6,8]
  • location: object [2,4,6,8]
– x: number [2,4,6,8]
– y: number [2,4,6,8]
– z: number [2,4,6,8]
  • spec_id: number [2,4,6,8]
  • ts: number [2,4,6,8]
  • WoRMS: number [8]
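
The timestamp lists in the graph can be reproduced with a short sketch. It assumes that the `ts` property of each document is the timestamp used in the graph and that nested properties are addressed by dotted paths; `build_graph` is a hypothetical helper, not the extraction algorithm of any tool named in this deck, and it records timestamps only, not types.

```python
from collections import defaultdict

species = [
    {"id": 123, "name": "Mya arenaria", "ts": 1},
    {"id": 124, "name": "Abra prismatica", "ts": 3, "category": 141436},
    {"id": 125, "name": "Abra alba", "ts": 5, "WoRMS": 141433},
    {"id": 126, "name": "Abra aequalis", "ts": 7, "WoRMS": 293683},
]
protocols = [
    {"id": 900, "time": "2017-07-21", "spec_id": 123, "ts": 2,
     "location": {"x": 19.863281, "y": 58.487952, "z": -1400}},
    {"id": 901, "time": "2017-07-21", "spec_id": 123, "ts": 4,
     "location": {"x": 19.863285, "y": 58.487952, "z": -1400}},
    {"id": 902, "time": "2017-07-23", "spec_id": 125, "ts": 6,
     "location": {"x": 19.863281, "y": 58.487961, "z": -1350}},
    {"id": 903, "time": "2017-07-24", "spec_id": 126, "ts": 8, "WoRMS": 293683,
     "location": {"x": 19.863285, "y": 58.487952, "z": -1400}},
]


def build_graph(entity_type, docs):
    """Map each property path to the sorted timestamps of the documents containing it."""
    graph = defaultdict(list)

    def visit(path, value, ts):
        graph[path].append(ts)
        if isinstance(value, dict):  # descend into nested objects such as 'location'
            for key, child in value.items():
                visit(f"{path}.{key}", child, ts)

    for doc in docs:
        visit(entity_type, doc, doc["ts"])
    return {path: sorted(stamps) for path, stamps in graph.items()}


species_graph = build_graph("Species", species)
protocols_graph = build_graph("Protocols", protocols)
```

Properties whose timestamp list is shorter than that of the entity type (here `category` and `WoRMS`) mark the points where the schema changed.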

slide-18
SLIDE 18

Short Excursion: Schema Version Extraction Step 2 - Deriving Schema Evolution Operations

[Figure: the schema version graphs of Species and Protocols from Step 1, annotated with the derived operations]

Derived schema evolution operations:

  • add integer Species.category
  • rename Species.category to WoRMS
    or: delete Species.category, add Species.WoRMS
  • add integer Protocols.WoRMS
    or: copy Species.WoRMS to Protocols.WoRMS where Species.id = Protocols.spec_id
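
One way to picture this step is to compare the property sets of consecutive versions of an entity type. The sketch below is a naive heuristic written for illustration only: `derive_operations` is a hypothetical helper, it ignores property types (so it emits "add" rather than "add integer"), and it treats any one-removed, one-added pair as a possible rename.

```python
def derive_operations(entity_type, versions):
    """Diff consecutive versions.

    versions: list of (timestamp, set of property names), ordered by timestamp.
    Returns a list of (timestamp, alternatives); more than one alternative
    means the change is ambiguous.
    """
    operations = []
    for (_, old), (ts, new) in zip(versions, versions[1:]):
        added, removed = sorted(new - old), sorted(old - new)
        if len(added) == 1 and len(removed) == 1:
            # One property vanished and one appeared: report both readings.
            operations.append((ts, [
                f"rename {entity_type}.{removed[0]} to {added[0]}",
                f"delete {entity_type}.{removed[0]}; add {entity_type}.{added[0]}",
            ]))
            continue
        for name in removed:
            operations.append((ts, [f"delete {entity_type}.{name}"]))
        for name in added:
            operations.append((ts, [f"add {entity_type}.{name}"]))
    return operations


# Property sets of the four Species documents from the example, by timestamp.
species_versions = [
    (1, {"id", "name", "ts"}),
    (3, {"id", "name", "ts", "category"}),
    (5, {"id", "name", "ts", "WoRMS"}),
    (7, {"id", "name", "ts", "WoRMS"}),
]
species_operations = derive_operations("Species", species_versions)
```

The entry for timestamp 5 carries two alternatives, which is exactly the kind of ambiguity that has to be resolved in the next step.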

slide-19
SLIDE 19

Short Excursion: Schema Version Extraction Step 3 - Resolving Ambiguities

  • Interactively resolving ambiguous schema evolution operations:

– alternative schema evolution operations
– specifying join conditions for move or copy operations

  • Open questions

– Automated choice in case of ambiguities
– Suggestion of meaningful join conditions

  • Approach to a solution

– Algorithm for deriving inclusion dependencies from NoSQL datasets proposed in [Klettke et al. 2017]
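
An inclusion dependency states that every value of one property also occurs as a value of another, which makes the pair a candidate join condition. The following is a brute-force check written under that definition; it is not the algorithm from [Klettke et al. 2017], and `inclusion_holds` is a hypothetical helper. The data is reduced to the properties needed from the marine biology example.

```python
def inclusion_holds(dependent_docs, dependent_prop, referenced_docs, referenced_prop):
    """True if every value of dependent_prop also occurs as a value of referenced_prop."""
    dependent = {d[dependent_prop] for d in dependent_docs if dependent_prop in d}
    referenced = {d[referenced_prop] for d in referenced_docs if referenced_prop in d}
    # An empty value set proves nothing, so do not report it as a dependency.
    return bool(dependent) and dependent <= referenced


species_ids = [{"id": 123}, {"id": 124}, {"id": 125}, {"id": 126}]
protocol_refs = [
    {"id": 900, "spec_id": 123},
    {"id": 901, "spec_id": 123},
    {"id": 902, "spec_id": 125},
    {"id": 903, "spec_id": 126},
]

# Protocols.spec_id is included in Species.id, so
# Species.id = Protocols.spec_id is a plausible join condition.
spec_id_in_species = inclusion_holds(protocol_refs, "spec_id", species_ids, "id")
# Protocols.id is not included in Species.id.
id_in_species = inclusion_holds(protocol_refs, "id", species_ids, "id")
```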

slide-20
SLIDE 20

Advanced Reverse Engineering: Schema Version Extraction


slide-21
SLIDE 21

Advanced Reverse Engineering: Schema Version Extraction


slide-22
SLIDE 22

Continuous Database Evolution

  • (Optional) schema management for NoSQL databases

Two main tasks

– Schema evolution management
– Data migration as a safe process based on schema evolution management

slide-23
SLIDE 23

Basic Strategies of Data Migration

Eager Migration

  • After the introduction of a new schema version, all entities are migrated

Advantages:

+ all entities are in the current version
+ low latency (when entities are accessed)

Disadvantages:

– even entities that are not in use are migrated
– high number (and cost) of migration operations

Lazy Migration

  • Evolution operations are stored
  • Data migration is done on request

Advantages:

+ only entities that are in use are migrated
+ no unnecessary data migration operations
+ composition of operations is possible

Disadvantages:

– entities in the NoSQL database exist in different versions
– increased latency

[Figure: Schema Evolution from schema Sv to schema Sv+1, followed by Data Migration]
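
The difference between the two strategies can be sketched in a few lines. This is an illustration under stated assumptions, not the mechanism of any particular tool: each document carries a hypothetical `version` property, the stored evolution operations live in a hypothetical `OPERATIONS` registry keyed by the version they lead to, and an in-memory dict stands in for the database.

```python
def add_category(doc):
    doc.setdefault("category", None)


def rename_category_to_worms(doc):
    if "category" in doc:
        doc["WoRMS"] = doc.pop("category")


# Stored evolution operations, keyed by the schema version they lead to.
OPERATIONS = {2: add_category, 3: rename_category_to_worms}
CURRENT_VERSION = 3


def migrate(doc):
    """Apply all pending operations in order; return how many were applied."""
    applied = 0
    for version in range(doc["version"] + 1, CURRENT_VERSION + 1):
        OPERATIONS[version](doc)
        doc["version"] = version
        applied += 1
    return applied


def migrate_eagerly(store):
    """Eager: migrate every entity, whether or not it is ever read again."""
    return sum(migrate(doc) for doc in store.values())


def read_lazily(store, key):
    """Lazy: migrate only the entity being accessed, at access time."""
    doc = store[key]
    migrate(doc)
    return doc
```

With lazy migration the cost of `migrate` is paid inside the read, which is the increased latency listed above; with eager migration it is paid up front for every entity, including cold ones.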

slide-24
SLIDE 24

How to Reduce Costs of Data Migration?

  • In case of large amounts of data

– update operations even for cold data (data that is not in use)

  • In case of database as a service

– operations are billed, so every data migration has a monetary cost

  • How to reduce costs of data migration?

→ Optimize data migration: hybrid / proactive migration approaches

  • Predictive Migration
  • Incremental Migration

slide-25
SLIDE 25

Proactive Migration Strategies

Incremental Migration

  • At selected schema versions, an eager migration is applied

Advantage:

+ composition of operations is possible

Disadvantage:

– even entities that are not in use are migrated

Predictive Migration

  • A forecast function (based on heuristics) predicts which entities will be accessed in the near future
  • These entities are migrated predictively

Advantages:

+ decreased average latency
+ reduced number of migration operations

Disadvantage:

– additional migration operations in case of wrong predictions

[Figure: Schema Evolution from schema Sv to schema Sv+1, with Data Migration]
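
As a rough illustration of predictive migration, the sketch below uses past access frequency as the forecast heuristic. That choice is an assumption made for the example; the slides do not say which heuristics are used. The helper names, the `version` property, and the access log format are likewise hypothetical.

```python
from collections import Counter


def predict_hot_keys(access_log, limit):
    """Forecast by frequency: the most accessed keys so far are expected to be accessed again."""
    return [key for key, _ in Counter(access_log).most_common(limit)]


def migrate_predictively(store, access_log, limit, current_version):
    """Migrate only the entities the forecast marks as hot; return the migrated keys."""
    migrated = []
    for key in predict_hot_keys(access_log, limit):
        doc = store[key]
        if doc["version"] < current_version:
            # Stand-in for applying the pending evolution operations.
            doc["version"] = current_version
            migrated.append(key)
    return migrated
```

Entities outside the forecast stay in their old version and are migrated lazily if they are accessed after all. A wrong forecast therefore costs an extra migration without breaking correctness.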

slide-26
SLIDE 26

Tradeoffs in Choosing a Data Migration Strategy


slide-27
SLIDE 27

MigCast: Choosing a Suitable Data Migration Strategy


This work is supported by DFG 385808805

slide-28
SLIDE 28

Continuous Database Evolution

  • Forward Engineering

– Schema Creation

  • Reverse Engineering

– Schema Overview
– Data Exploration

  • Advanced Reverse Engineering

– Schema Version Extraction

  • Data Migration

– Suitable Data Migration Strategies

slide-29
SLIDE 29

Continuous Database Evolution – Tools (Selection)

  • Forward Engineering

– Schema Creation

  • Reverse Engineering

– Schema Overview
– Data Exploration

  • Advanced Reverse Engineering

– Schema Version Extraction

  • Data Migration

– Suitable Data Migration Strategies

[Figure: tool matrix with the columns Multi Data Store Tools, Single Data Store Tools, and Research Prototypes; of the entries, only NoSQL DEP (twice) survives in the text]

slide-30
SLIDE 30

Further Reading

  • M. Fowler: Schemaless Data Structures, 2013, https://martinfowler.com/articles/schemaless/
  • P. Sadalage, M. Fowler: Evolutionary Database Design, 2016, https://martinfowler.com/articles/evodb.html
  • U. Störl, D. Müller, M. Klettke, S. Scherzinger: Enabling Efficient Agile Software Development of NoSQL-backed Applications, BTW 2017, https://dl.gi.de/bitstream/handle/20.500.12116/667/paper49.pdf
  • M. Klettke, H. Awolin, U. Störl, D. Müller, S. Scherzinger: Uncovering the Evolution History of Data Lakes, SCDM 2017, https://ieeexplore.ieee.org/document/8258204
  • A. H. Chillón, S. F. Morales, D. Sevilla, J. G. Molina: Exploring the Visualization of Schemas for Aggregate-Oriented NoSQL Databases, ER 2017, http://ceur-ws.org/Vol-1979/paper-11.pdf
  • D. Sevilla: Discovery and Visualization of NoSQL Database Schemas, 2018, https://modeling-languages.com/discovery-and-visualization-of-nosql-database-schemas/

slide-31
SLIDE 31

Feedback


uta.stoerl@h-da.de