continuous database evolution
play

Continuous Database Evolution Prof. Dr. Uta Strl Darmstadt - PowerPoint PPT Presentation

Continuous Database Evolution Prof. Dr. Uta Strl Darmstadt University of Applied Sciences Application version n Application version n + Schema Management Schema Management Schema Evolution Data Migration February 2019 Motivation


  1. Continuous Database Evolution Prof. Dr. Uta Störl Darmstadt University of Applied Sciences Application version n Application version n + … Schema Management Schema Management Schema Evolution Data Migration February 2019

  2. Motivation • Agile software development with frequent schema changes (weekly up to daily!)  Schema-flexible NoSQL databases  Application version n Application version n + … • However, how to migrate variational data in the productive database? – State of the art: Within the application code   Optional schema management for NoSQL database systems necessary! Uta Störl, Darmstadt University of Applied Sciences 2

  3. Remark: NoSQL Database Are Schema-Free – A ren’t They? NoSQL DBMS NoSQL DBMS NoSQL DBMS without native with optional with mandatory schema support schema support schema Couchbase, CouchDB, Neo4J, … MongoDB, OrientDB, ArangoDB , … Cassandra, … Uta Störl, Darmstadt University of Applied Sciences 3

  4. Schema Management for NoSQL Databases Application Application Version n + 1 Version n • Forward Engineering Forward Engineering – Schema Creation Schema Schema Version n Version n+1 • Reverse Engineering – Schema Overview – Data Exploration Uta Störl, Darmstadt University of Applied Sciences 4

  5. Forward Engineering: Schema Creation Create • JSON Schema • Proprietary schema formats (e.g. Mongoose for MongoDB, Ottoman for Couchbase, …) Uta Störl, Darmstadt University of Applied Sciences 5

  6. Schema Management for NoSQL Databases Application Application Version n + 1 Version n • Forward Engineering Forward Engineering – Schema Creation Schema Schema Version n Version n+1 • Reverse Engineering – Schema Overview – Data Exploration Uta Störl, Darmstadt University of Applied Sciences 6

  7. Reverse Engineering: Schema Overview Uta Störl, Darmstadt University of Applied Sciences 7

  8. Reverse Engineering: Data Exploration Uta Störl, Darmstadt University of Applied Sciences 8

  9. Tools for NoSQL Schema Management (Selection) Multi Data Store Tools • Hackolade – Support for MongoDB, Couchbase, Elasticsearch, HBase, Cassandra & Datastax , DynamoDB, Cosmos DB, Avro, Hive, and Hbase – Forward- and Reverse-Engineering (available in Professional edition only) – https://hackolade.com/ • erwin DM NoSQL – Support for MongoDB and Couchbase – Forward- and Reverse-Engineering (available in Professional edition only) – https://erwin.com/products/erwin-dm-nosql/ Uta Störl, Darmstadt University of Applied Sciences 9

  10. Tools for NoSQL Schema Management (Selection) Single Data Store Tools (MongoDB) • MongoDB Compass – (Reverse-Engineering) available free of charge – https://www.mongodb.com/products/compass • Studio 3T – (Reverse-Engineering) (available in Pro edition only) – https://studio3t.com/ Uta Störl, Darmstadt University of Applied Sciences 10

  11. Tools for NoSQL Schema Management (Selection) Research Prototypes (Multi Data Store Tools) • NoSQL Data Engineering Project NoSQL DEP – Support for MongoDB and CouchDB – Reverse Engineering – University of Murcia, Spain: https://www.researchgate.net/project/NoSQL-Data-Engineering – Source: https://github.com/catedrasaes-umu/NoSQLDataEngineering/ • Darwin: Schema Management for NoSQL Database Systems – Support for MongoDB and Couchbase – Reverse Engineering – Darmstadt University of Applied Sciences, University of Rostock, OTH Regensburg, Germany: https://fbi.h-da.de/personen/uta-stoerl/dfg-projekt-nosql-schema-evolution/ – https://www.researchgate.net/project/Darwin-Schema-Management-in-NoSQL- Databases Uta Störl, Darmstadt University of Applied Sciences 11

  12. NoSQL Schema Management: So Far, So Good • We are able to – Define schemas and validate data (Forward Engineering) – Extract a schema overview and explore data (Reverse Engineering) • However, what about the heterogeneous data (due to different application releases, for example) in the NoSQL database? ??? Uta Störl, Darmstadt University of Applied Sciences 12

  13. Continuous Database Evolution • (Optional) schema management for NoSQL databases Application version n Application version n + … Schema Management Schema Management Schema Evolution Data Migration  Two main tasks – Schema evolution management – Data migration Uta Störl, Darmstadt University of Applied Sciences 13

  14. Approaches to Realize Data Migration • Custom-coded Migration Scripts – Error-prone and expensive  • Using Object-NoSQL-Mapper Annotations – Easy to realize for simple evolution operations like add, delete, and rename (e.g. @AlsoLoad) – More expensive for complex operations like split, merge, copy, and move (coding @PostLoad methods) Uta Störl, Darmstadt University of Applied Sciences 14

  15. Approaches to Realize Data Migration Application Application Version n + 1 Version n • Forward Engineering Forward Engineering – Schema Creation Schema Schema Evolution Operations Version n Version n+1 • Reverse Engineering – Schema Overview – Data Exploration  Advanced Reverse Engineering – Schema Version Extraction Uta Störl, Darmstadt University of Applied Sciences 15

  16. Example from Marine Biology • JSON datasets for Species classification of the Baltic Sea and observation Protocols entity type Species {"id": 123, {"id": 124, {"id": 125, {"id": 126, "name": "Mya arenaria", "name": "Abra prismatica", "name": "Abra alba", "name": "Abra aequalis", "ts": 1} "ts": 3, "ts": 5, "ts": 7, " category ": 141436} " WoRMS ": 141433} "WoRMS": 293683} entity type Protocols {"id": 900, {"id": 901, {"id": 902, {"id": 903, "time": "2017-07-21", "time": "2017-07-21", "time": "2017-07-23", "time": "2017-07-24", "location": "location": "location": "location": {"x":19.863281, {"x":19.863285, {"x":19.863281, {"x":19.863285, "y":58.487952, "y":58.487952, "y":58.487961, "y":58.487952, "z":-1400}, "z":-1400}, "z":-1350}, "z":-1400}, "spec_id": 123, "spec_id": 123, "spec_id": 125, "spec_id": 126, "ts": 2}, "ts": 4} "ts": 6} "ts": 8, " WoRMS ": 293683} WoRMS: World Register of Marine Species Uta Störl, Darmstadt University of Applied Sciences 16

  17. Short Excursion: Schema Version Extraction Step 1 - Building the Schema Version Graphs Species [1,3,5,7] id name ts category WoRMS [1,3,5,7] [1,3,5,7] [1,3,5,7] [3] [5,7] type: number type: string type: number type: number type: number Protocols [2,4,6,8] WoRMS id ts time location spec_id [8] [2,4,6,8] [2,4,6,8] [2,4,6,8] [2,4,6,8] [2,4,6,8] type: number type: number type: number type: number type: object type: number x y z [2,4,6,8] [2,4,6,8] [2,4,6,8] type: number type: number type: number Uta Störl, Darmstadt University of Applied Sciences 17

  18. Short Excursion: Schema Version Extraction Step 2 - Deriving Schema Evolution Operations Species 2 [1,3,5,7] add integer Species.category id name ts category WoRMS 3 [1,3,5,7] [1,3,5,7] [1,3,5,7] [3] [5,7] rename Species.category to WoRMS type: number type: string type: number type: number type: number 2 3 3 4 or delete Species.category Protocols [2,4,6,8] add Species.WoRMS 4 WoRMS id ts time location spec_id [8] [2,4,6,8] [2,4,6,8] add integer Protocols.WoRMS [2,4,6,8] [2,4,6,8] [2,4,6,8] type: number type: number type: number type: number type: object type: number or 4 x y z copy Species.WoRMS to Protocols.WoRMS [2,4,6,8] [2,4,6,8] [2,4,6,8] type: number type: number type: number where Species.id = Protocols.spec_id Uta Störl, Darmstadt University of Applied Sciences 18

  19. Short Excursion: Schema Version Extraction Step 3 - Resolving Ambiguities • Interactively resolving ambiguous schema evolution operations: – alternative schema evolution operations – specifying join conditions for move or copy operations • Open questions – Automated choice in case of ambiguities – Suggestion of meaningful join conditions • Approach to a solution – Algorithm for deriving inclusion dependencies from NoSQL datasets proposed in [Klettke et al. 2017] Uta Störl, Darmstadt University of Applied Sciences 19

  20. Advanced Reverse Engineering: Schema Version Extraction Uta Störl, Darmstadt University of Applied Sciences 20

  21. Advanced Reverse Engineering: Schema Version Extraction Uta Störl, Darmstadt University of Applied Sciences 21

  22. Continuous Database Evolution • (Optional) schema management for NoSQL databases Application version n Application version n + … Schema Management Schema Management Schema Evolution Data Migration  Two main tasks – Schema evolution management – Data migration as safe process based on schema evolution management Uta Störl, Darmstadt University of Applied Sciences 22

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend