Near Real-time Data Warehousing with Multi-stage Trickle & Flip - - PowerPoint PPT Presentation

near real time data warehousing with multi stage trickle
SMART_READER_LITE
LIVE PREVIEW

Near Real-time Data Warehousing with Multi-stage Trickle & Flip - - PowerPoint PPT Presentation

This work has been supported by ESF project No. 2009/ 0216/ 1DP/ 1.1.1.2.0/ 09/ API A/ VI AA/ 044 Near Real-time Data Warehousing with Multi-stage Trickle & Flip J nis Zuters , University of Latvia, BIR 2011, October 8, 2011 Data


slide-1
SLIDE 1

Near Real-time Data Warehousing with Multi-stage Trickle & Flip

Jānis Zuters, University of Latvia, BIR 2011, October 8, 2011

This work has been supported by ESF project No. 2009/ 0216/ 1DP/ 1.1.1.2.0/ 09/ API A/ VI AA/ 044

slide-2
SLIDE 2

Data Flow In a Data Warehouse

Data Source Data Warehouse ETL OLAP CDC Data Loading Near Real-time

slide-3
SLIDE 3

Near Real-time Refreshment

Data Source Data Warehouse ETL Microbatch ETL Data loading conflicts OLAP Frequent data loading (e.g., hourly)

slide-4
SLIDE 4

The ‘Trickle & Flip’ Methodology

Data Source ETL Data Warehouse OLAP Staging Tables Copy of Staging Tables flip copy 1h

slide-5
SLIDE 5

‘Trickle & Flip’: The Algorithm

ALGORITHM trickle_and_flip_refresh (R) DW – data warehouse D1, D2 – staging partitions with the same data format as DW R – refreshment rate (e.g., 1 hour) D1 is being fed from the source BEGIN Do Every R % e.g., every hour Copy D1 to D2 % DW should not be locked by querying Flip D2 and DW

slide-6
SLIDE 6

‘Trickle & Flip’: Demonstration

  • 3 complete copies of data!

Data Source ETL Staging Tables Copy of Staging Tables Data Warehouse 1h flip

slide-7
SLIDE 7

‘Trickle & Flip’ With Real-time Partition

Data Source ETL Data Warehouse OLAP Staging Tables Copy of Staging Tables flip copy 1h Real-time Partition Static Data

  • 3 complete

copies of real-time data!

slide-8
SLIDE 8

Why ‘Trickle & Flip’ is Better Than “Simple” Near Real-time?

  • Data warehouse suffers from data loading

to a significantly less extent (flipping is very fast)

Copy of Staging Tables flip Real-time Partition

slide-9
SLIDE 9

What Are The Issues of the “Pure” ‘Trickle & Flip’ Approach?

  • Frequent

copying of large amount

  • f data
  • If refreshment cycle times

become very frequent: querying over real-time data becomes unconvenient

copy OLAP flip Real-time Partition

slide-10
SLIDE 10

Assumptions to Add More Stages to ‘Trickle & Flip’

  • Adding data to a smaller table (i.e., with

less data) is faster;

  • Updating last changes to a table is faster

than making full copy of the last version

slide-11
SLIDE 11

Multi-stage ‘Trickle & Flip’

Data Source OLAP Data Warehouse Staging Tables 0

ETL

Real-time Partition 1 add? Staging Tables 1 add 5 min. Real-time Partition 2 add? Staging Tables 2 add 1 hour Static Data

slide-12
SLIDE 12

Multi-stage ‘Trickle & Flip’: The Algorithm (For a Single Stage)

ALGORITHM multiple_trickle_and_flip_refresh (R1, M, H`, H) H – real-time partition for the current hour M, H` – staging partitions R1 –refreshment rate (e.g., 5 minutes) M is being continuously fed from the source BEGIN Do Every R1 % e.g., every 5 minutes Add M to H` Empty M If H is available % not locked by querying Add H` to H Empty H`

slide-13
SLIDE 13

Multi-stage ‘Trickle & Flip’: Demonstration

Data Source

ETL

Staging Tables 0 Real-time Partition 1 Real-time Partition 2 Staging Tables 1 Staging Tables 2 Static Data OLAP

slide-14
SLIDE 14

The Summary of Multi-stage ‘Trickle & Flip’

  • Total amount of data copying is reduced;
  • Collisions between data loading and

querying activities have been reduced;

  • More advanced querying system is

required to fully benefit from the approach

slide-15
SLIDE 15

Thank You!

Data Source Data Warehouse OLAP Staging Tables 0

ETL

Real-time Partition 1 add? Staging Tables 1 add 5 min. Real-time Partition 2 add? Staging Tables 2 add 1 hour Static Data Data Source Data Source Data Warehouse OLAP OLAP Staging Tables 0

ETL ETL

Real-time Partition 1 add? Real-time Partition 1 add? Staging Tables 1 add 5 min. Staging Tables 1 add 5 min. Real-time Partition 2 add? Real-time Partition 2 add? Staging Tables 2 add 1 hour Staging Tables 2 add 1 hour Static Data

Jānis Zuters, University of Latvia