near real time data warehousing with multi stage trickle
play

Near Real-time Data Warehousing with Multi-stage Trickle & Flip - PowerPoint PPT Presentation

This work has been supported by ESF project No. 2009/ 0216/ 1DP/ 1.1.1.2.0/ 09/ API A/ VI AA/ 044 Near Real-time Data Warehousing with Multi-stage Trickle & Flip J nis Zuters , University of Latvia, BIR 2011, October 8, 2011 Data


  1. This work has been supported by ESF project No. 2009/ 0216/ 1DP/ 1.1.1.2.0/ 09/ API A/ VI AA/ 044 Near Real-time Data Warehousing with Multi-stage Trickle & Flip J ā nis Zuters , University of Latvia, BIR 2011, October 8, 2011

  2. Data Flow In a Data Warehouse Near Real-time Data Data Source Warehouse OLAP ETL Data Loading CDC

  3. Near Real-time Refreshment Data Data Source Warehouse ETL Microbatch ETL OLAP Frequent data loading (e.g., hourly) Data loading conflicts

  4. The ‘Trickle & Flip’ Methodology Data Staging Source Tables ETL OLAP 1h copy flip Copy of Data Staging Warehouse Tables

  5. ‘Trickle & Flip’: The Algorithm ALGORITHM trickle_and_flip_refresh ( R ) DW – data warehouse D1 , D2 – staging partitions with the same data format as DW R – refreshment rate (e.g., 1 hour) D1 is being fed from the source BEGIN Do Every R % e.g., every hour Copy D1 to D2 % DW should not be locked by querying Flip D2 and DW

  6. ‘Trickle & Flip’: Demonstration Data Source 1h ETL flip Staging Copy of Data Tables Staging Warehouse Tables • 3 complete copies of data!

  7. ‘Trickle & Flip’ With Real-time Partition Data Staging Source OLAP Tables Static Data ETL 1h copy flip • 3 complete Real-time Partition copies of Copy of real-time Data Warehouse Staging data! Tables

  8. Why ‘Trickle & Flip’ is Better Than “Simple” Near Real-time? • Data warehouse suffers from data loading to a significantly less extent (flipping is very fast) flip Copy of Real-time Staging Partition Tables

  9. What Are The Issues of the “Pure” ‘Trickle & Flip’ Approach? • Frequent • If refreshment cycle times copying of become very frequent: large amount querying over real-time data of data becomes unconvenient OLAP copy flip Real-time Partition

  10. Assumptions to Add More Stages to ‘Trickle & Flip’ • Adding data to a smaller table (i.e., with less data) is faster; • Updating last changes to a table is faster than making full copy of the last version

  11. Multi-stage ‘Trickle & Flip’ Data Staging Staging Source Tables 2 Tables 1 add? add? OLAP add add 5 min. 1 hour ETL Static Staging Real-time Data Real-time Tables 0 Partition 1 Partition 2 Data Warehouse

  12. Multi-stage ‘Trickle & Flip’: The Algorithm (For a Single Stage) ALGORITHM multiple_trickle_and_flip_refresh ( R1, M, H`, H ) H – real-time partition for the current hour M , H` – staging partitions R1 –refreshment rate (e.g., 5 minutes) M is being continuously fed from the source BEGIN Do Every R1 % e.g., every 5 minutes Add M to H` Empty M If H is available % not locked by querying Add H` to H Empty H`

  13. Multi-stage ‘Trickle & Flip’: Demonstration Data Staging Staging Source Tables 1 Tables 2 OLAP ETL Static Staging Data Tables 0 Real-time Real-time Partition 1 Partition 2

  14. The Summary of Multi-stage ‘Trickle & Flip’ • Total amount of data copying is reduced; • Collisions between data loading and querying activities have been reduced; • More advanced querying system is required to fully benefit from the approach

  15. Thank You! Data Data Data Staging Staging Staging Staging Staging Staging Source Source Source Tables 2 Tables 2 Tables 2 Tables 1 Tables 1 Tables 1 add? add? add? add? add? add? OLAP OLAP OLAP add add add add add add 5 min. 5 min. 5 min. 1 hour 1 hour 1 hour ETL ETL ETL Static Static Staging Staging Real-time Real-time Real-time Data Data Real-time Real-time Real-time Tables 0 Tables 0 Partition 1 Partition 1 Partition 1 Partition 2 Partition 2 Partition 2 Data Warehouse Data Warehouse J ā nis Zuters , University of Latvia

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend