Continuous Database Evolution
Prof. Dr. Uta Störl
Darmstadt University of Applied Sciences
February 2019
[Figure: Schema Management, consisting of Schema Evolution and Data Migration, between Application version n and Application version n + …]
Motivation
to daily!)
Schema-flexible NoSQL databases
– State of the art: within the application code
Optional schema management for NoSQL database systems is necessary!
NoSQL DBMS without native schema support
Couchbase, CouchDB, Neo4j, …
NoSQL DBMS with optional schema support
MongoDB, OrientDB, ArangoDB, …
NoSQL DBMS with mandatory schema
Cassandra, …
[Figure: Application Version n, Application Version n + 1, Schema Version n, Schema Version n+1]
Forward Engineering
– Schema Creation
Reverse Engineering
– Schema Overview
– Data Exploration
Create
MongoDB, Ottoman for Couchbase, …)
Multi Data Store Tools
Hackolade
– Support for MongoDB, Couchbase, Elasticsearch, HBase, Cassandra & DataStax, DynamoDB, Cosmos DB, Avro, and Hive
– Forward and reverse engineering (available in Professional edition only)
– https://hackolade.com/
erwin DM NoSQL
– Support for MongoDB and Couchbase
– Forward and reverse engineering (available in Professional edition only)
– https://erwin.com/products/erwin-dm-nosql/
Single Data Store Tools (MongoDB)
MongoDB Compass
– Reverse engineering (available free of charge)
– https://www.mongodb.com/products/compass
Studio 3T
– Reverse engineering (available in Pro edition only)
– https://studio3t.com/
Research Prototypes (Multi Data Store Tools)
NoSQL Data Engineering (University of Murcia, Spain)
– Support for MongoDB and CouchDB
– Reverse engineering
– https://www.researchgate.net/project/NoSQL-Data-Engineering
– Source: https://github.com/catedrasaes-umu/NoSQLDataEngineering/
Darwin (Darmstadt University of Applied Sciences, University of Rostock, OTH Regensburg, Germany)
– Support for MongoDB and Couchbase
– Reverse engineering
– https://fbi.h-da.de/personen/uta-stoerl/dfg-projekt-nosql-schema-evolution/
– https://www.researchgate.net/project/Darwin-Schema-Management-in-NoSQL-Databases
– Define schemas and validate data (Forward Engineering)
– Extract a schema overview and explore data (Reverse Engineering)
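The two directions can be illustrated with a minimal sketch in plain Python. It is not how any of the tools listed above work internally; the schema `SPECIES_SCHEMA` and both function names are hypothetical and only chosen for illustration.

```python
# Hypothetical declared schema: property name -> expected Python type.
SPECIES_SCHEMA = {"id": int, "name": str}


def validate(entity, schema):
    """Forward engineering: list the violations of a declared schema."""
    errors = []
    for prop, expected in schema.items():
        if prop not in entity:
            errors.append(f"missing property: {prop}")
        elif not isinstance(entity[prop], expected):
            errors.append(f"wrong type for {prop}: {type(entity[prop]).__name__}")
    return errors


def schema_overview(entities):
    """Reverse engineering: collect every property and the types seen for it."""
    overview = {}
    for entity in entities:
        for prop, value in entity.items():
            overview.setdefault(prop, set()).add(type(value).__name__)
    return overview


if __name__ == "__main__":
    print(validate({"id": "123"}, SPECIES_SCHEMA))
    print(schema_overview([{"id": 123, "name": "Mya arenaria"}, {"id": 124}]))
```

Validation checks stored data against a schema that was defined first, whereas the overview derives the schema from whatever data is already stored.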
application releases, for example) in the NoSQL database?
Two main tasks
– Schema evolution management
– Data migration
– Error-prone and expensive
– Easy to realize for simple evolution operations like add, delete, and rename (e.g. @AlsoLoad)
– More expensive for complex operations like split, merge, copy, and move (coding @PostLoad methods)
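To make the difference tangible, here is a rough Python sketch of evolution handled inside the application code. It only imitates the idea behind the annotations named above and is not the API of any object mapper; the function names and the split of `name` into `genus` and `epithet` are assumptions made up for this example.

```python
def also_load(entity, current, legacy):
    """Simple operation (rename): accept a value stored under the legacy name."""
    if current not in entity and legacy in entity:
        entity[current] = entity.pop(legacy)
    return entity


def post_load_split_name(entity):
    """Complex operation (split), coded by hand: 'name' -> 'genus' + 'epithet'."""
    if "name" in entity:
        genus, _, epithet = entity.pop("name").partition(" ")
        entity["genus"], entity["epithet"] = genus, epithet
    return entity


def load(doc):
    """Bring a stored document into the shape the current application expects."""
    entity = dict(doc)  # work on a copy, the stored document stays untouched
    entity = also_load(entity, current="WoRMS", legacy="category")
    return post_load_split_name(entity)


if __name__ == "__main__":
    print(load({"id": 124, "name": "Abra prismatica", "category": 141436}))
```

The rename needs one declarative line per property, while the split needs custom code that has to be written, tested, and kept for as long as legacy entities exist.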
[Figure: Application Version n, Application Version n + 1, Schema Version n, Schema Version n+1, Evolution Operations]
Forward Engineering
– Schema Creation
Reverse Engineering
– Schema Overview
– Data Exploration
Advanced Reverse Engineering
– Schema Version Extraction
entity type Species
{"id": 123, "name": "Mya arenaria", "ts": 1}
{"id": 124, "name": "Abra prismatica", "ts": 3, "category": 141436}
{"id": 125, "name": "Abra alba", "ts": 5, "WoRMS": 141433}
{"id": 126, "name": "Abra aequalis", "ts": 7, "WoRMS": 293683}
entity type Protocols
{"id": 900, "time": "2017-07-21", "location": {"x":19.863281, "y":58.487952, "z":-1400}, "spec_id": 123, "ts": 2}
{"id": 901, "time": "2017-07-21", "location": {"x":19.863285, "y":58.487952, "z":-1400}, "spec_id": 123, "ts": 4}
{"id": 902, "time": "2017-07-23", "location": {"x":19.863281, "y":58.487961, "z":-1350}, "spec_id": 125, "ts": 6}
{"id": 903, "time": "2017-07-24", "location": {"x":19.863285, "y":58.487952, "z":-1400}, "spec_id": 126, "ts": 8, "WoRMS": 293683}
WoRMS: World Register of Marine Species
[Figure: schema graph extracted from the entities above; the numbers in brackets are the ts values of the entities in which each element occurs]
Species [1,3,5,7]
    id [1,3,5,7], type: number
    name [1,3,5,7], type: string
    ts [1,3,5,7], type: number
    category [3], type: number
    WoRMS [5,7], type: number
Protocols [2,4,6,8]
    id [2,4,6,8], type: number
    time [2,4,6,8], type: string
    location [2,4,6,8], type: object
        x [2,4,6,8], type: number
        y [2,4,6,8], type: number
        z [2,4,6,8], type: number
    spec_id [2,4,6,8], type: number
    ts [2,4,6,8], type: number
    WoRMS [8], type: number
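The timestamp annotations of such a schema graph can be reproduced with a short Python sketch. It is a simplified illustration under the assumption that every entity carries a `ts` property, not the extraction algorithm of any of the tools mentioned; `json_type` and `property_occurrences` are hypothetical helper names.

```python
import json
from collections import defaultdict

# The Species entities from the example, one JSON document per line.
SPECIES = [json.loads(line) for line in """
{"id": 123, "name": "Mya arenaria", "ts": 1}
{"id": 124, "name": "Abra prismatica", "ts": 3, "category": 141436}
{"id": 125, "name": "Abra alba", "ts": 5, "WoRMS": 141433}
{"id": 126, "name": "Abra aequalis", "ts": 7, "WoRMS": 293683}
""".strip().splitlines()]


def json_type(value):
    """Map a Python value to the JSON type name used in the schema graph."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    return "null"


def property_occurrences(entities):
    """Map (property path, type) to the ts values of the entities containing it."""
    occurrences = defaultdict(list)

    def visit(obj, path, ts):
        for prop, value in obj.items():
            full_path = f"{path}.{prop}" if path else prop
            occurrences[(full_path, json_type(value))].append(ts)
            if isinstance(value, dict):  # descend into nested objects
                visit(value, full_path, ts)

    for entity in sorted(entities, key=lambda e: e["ts"]):
        visit(entity, "", entity["ts"])
    return dict(occurrences)


if __name__ == "__main__":
    for (path, type_name), ts_values in property_occurrences(SPECIES).items():
        print(f"{path} {ts_values}, type: {type_name}")
```

A property whose timestamp list is shorter than that of its entity type (such as `category` or `WoRMS`) marks a point where the schema changed.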
– add integer Species.category
– rename Species.category to WoRMS
– delete Species.category
– add Species.WoRMS
– add integer Protocols.WoRMS
– copy Species.WoRMS to Protocols.WoRMS where Species.id = Protocols.spec_id
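What these operations do to the stored entities can be sketched in a few lines of Python. The sketch works on in-memory lists of dictionaries and is only an illustration; the function signatures are assumptions and do not correspond to a particular evolution language or tool.

```python
def add(entities, prop, default=None):
    """add <prop>: give every entity the property, with a default value."""
    for entity in entities:
        entity.setdefault(prop, default)


def rename(entities, old, new):
    """rename <old> to <new>: keep the value, change the property name."""
    for entity in entities:
        if old in entity:
            entity[new] = entity.pop(old)


def delete(entities, prop):
    """delete <prop>: drop the property and its value."""
    for entity in entities:
        entity.pop(prop, None)


def copy(source, target, prop, source_key, target_key):
    """copy source.prop to target.prop where source.source_key = target.target_key."""
    lookup = {s[source_key]: s[prop] for s in source if prop in s and source_key in s}
    for entity in target:
        if entity.get(target_key) in lookup:
            entity[prop] = lookup[entity[target_key]]


if __name__ == "__main__":
    species = [{"id": 126, "name": "Abra aequalis", "category": 293683}]
    protocols = [{"id": 903, "spec_id": 126}]
    rename(species, "category", "WoRMS")
    copy(species, protocols, "WoRMS", source_key="id", target_key="spec_id")
    print(species, protocols)
```

Note that `rename` preserves the stored values, whereas `delete` followed by `add` loses them, and that `copy` cannot be executed without a join condition.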
– alternative schema evolution operations
– specifying join conditions for move or copy operations
– Automated choice in case of ambiguities
– Suggestion of meaningful join conditions
– Algorithm for deriving inclusion dependencies from NoSQL datasets proposed in [Klettke et al. 2017]
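How an inclusion dependency can suggest a join condition is shown in the following naive Python sketch. It is explicitly not the algorithm from [Klettke et al. 2017]; it simply tests, for every pair of properties, whether all values of one are contained in the other. The function name and the reduced example data (ids and WoRMS values only) are choices made for this illustration.

```python
SPECIES = [
    {"id": 123},
    {"id": 124, "category": 141436},
    {"id": 125, "WoRMS": 141433},
    {"id": 126, "WoRMS": 293683},
]
PROTOCOLS = [
    {"id": 900, "spec_id": 123},
    {"id": 901, "spec_id": 123},
    {"id": 902, "spec_id": 125},
    {"id": 903, "spec_id": 126, "WoRMS": 293683},
]


def inclusion_dependencies(dependent, referenced):
    """Return (dep_prop, ref_prop) pairs whose dependent values all occur in referenced."""

    def value_sets(entities):
        columns = {}
        for entity in entities:
            for prop, value in entity.items():
                # Only scalar values are compared in this sketch.
                if isinstance(value, (int, str)) and not isinstance(value, bool):
                    columns.setdefault(prop, set()).add(value)
        return columns

    dep_columns, ref_columns = value_sets(dependent), value_sets(referenced)
    return sorted(
        (dep, ref)
        for dep, dep_values in dep_columns.items()
        for ref, ref_values in ref_columns.items()
        if dep_values <= ref_values
    )


if __name__ == "__main__":
    for dep, ref in inclusion_dependencies(PROTOCOLS, SPECIES):
        print(f"Protocols.{dep} is included in Species.{ref}")
```

On the example data, the pair `spec_id` / `id` is found, which corresponds to the join condition Species.id = Protocols.spec_id used by the copy operation.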
Two main tasks
– Schema evolution management
– Data migration as a safe process based on schema evolution management
Eager Migration
entities are migrated
Advantages:
+ all entities are in the current version
+ low latency (when entities are accessed)
Disadvantages:
- even entities that are not in use are migrated
- high number (and costs) of migration operations
Lazy Migration
Advantages:
+ no unnecessary data migration operations
+ composition of operations is possible
Disadvantages:
- entities in the NoSQL database in different versions
- increased latency
[Figure: Schema Evolution from schema Sv to schema Sv+1, followed by Data Migration]
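The trade-off between the two strategies can be sketched in Python as follows. This is a minimal illustration on an in-memory dictionary standing in for the database; the `version` property, `CURRENT_VERSION`, and the single rename used as the schema change are assumptions for the example.

```python
CURRENT_VERSION = 2  # hypothetical schema version number


def migrate(entity):
    """Bring one entity to the current version (here: rename category to WoRMS).

    Returns True if a migration (write) operation was necessary.
    """
    if entity.get("version", 1) >= CURRENT_VERSION:
        return False
    if "category" in entity:
        entity["WoRMS"] = entity.pop("category")
    entity["version"] = CURRENT_VERSION
    return True


def eager_migration(store):
    """Migrate every entity right after the schema change; return the write count."""
    return sum(migrate(entity) for entity in store.values())


def lazy_read(store, key):
    """Migrate an entity only at the moment it is accessed."""
    entity = store[key]
    migrate(entity)  # this extra step is the source of the increased latency
    return entity


if __name__ == "__main__":
    store = {124: {"id": 124, "category": 141436}, 123: {"id": 123}}
    print(lazy_read(store, 124))   # migrated on access
    print(store[123])              # never accessed, still in the old version
    print(eager_migration(store))  # migrates whatever is left
```

With eager migration the write count equals the number of outdated entities, including those that are never read again; with lazy migration the cost is paid per access and old versions remain in the database.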
– update operations even for cold data (that is not in use)
– payable operations, monetary costs for all data migrations
Optimize Data Migration
Hybrid / Proactive Migration Approaches
Incremental Migration
Advantage:
+ composition of operations is possible
Disadvantage:
- even entities that are not in use are migrated
Predictive Migration
in near future (based on heuristics)
Advantages:
+ decreased average latency
+ reduced number of migration operations
Disadvantage:
- additional migration operations in case of wrong predictions
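A possible shape of predictive migration is sketched below in Python. The heuristic used here (the k most frequently accessed entities are assumed to be accessed again soon) is only one conceivable choice and is not taken from the slides; `predict_hot`, the `version` property, and the assumption that all other entities are left to lazy migration are likewise made up for this illustration.

```python
from collections import Counter

CURRENT_VERSION = 2  # hypothetical schema version number


def migrate(entity):
    """Bring one entity to the current version; True if a write was needed."""
    if entity.get("version", 1) >= CURRENT_VERSION:
        return False
    entity["version"] = CURRENT_VERSION
    return True


def predict_hot(access_log, k):
    """Assumed heuristic: the k most frequently accessed keys will be used again."""
    return [key for key, _ in Counter(access_log).most_common(k)]


def predictive_migration(store, access_log, k):
    """Migrate only the predicted hot entities ahead of time; return their keys."""
    migrated = []
    for key in predict_hot(access_log, k):
        if key in store and migrate(store[key]):
            migrated.append(key)
    return migrated


if __name__ == "__main__":
    store = {123: {"id": 123}, 125: {"id": 125}, 126: {"id": 126}}
    log = [123, 125, 123, 126, 125, 123]
    print(predictive_migration(store, log, k=2))
```

A correct prediction removes the migration latency from the next access; a wrong prediction costs a migration operation for an entity that is not read afterwards.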
This work is supported by DFG 385808805
– Schema Creation
– Schema Overview
– Data Exploration
– Schema Version Extraction
– Suitable Data Migration Strategies
https://martinfowler.com/articles/evodb.html
NoSQL-backed Applications, BTW, 2017, https://dl.gi.de/bitstream/handle/20.500.12116/667/paper49.pdf
Aggregate-Oriented NoSQL Databases. ER 2017, http://ceur-ws.org/Vol-1979/paper-11.pdf
https://modeling-languages.com/discovery-and-visualization-of-nosql-database-schemas/
uta.stoerl@h-da.de