MaSM: Efficient Online Updates in Data Warehouses
Manos Athanassoulis¹, Shimin Chen², Anastasia Ailamaki¹, Phillip Gibbons², Radu Stoica¹
¹EPFL, ²Intel Labs
Freshness vs Performance
Data warehouse workload: read-only queries (scans)
[Figure: normalized execution time of TPC-H queries (on average) for four cases: query only, query w/ updates, query only + updates only, and ideal. Annotations: freshness, performance.]
[Stonebraker et al. ’05] [Heman et al. ’10]
– Apply updates online
– Apply updates as differential updates
  x Large memory overhead
  x Trade-off: migration overhead for memory footprint
[Figure: normalized migration overhead (1 to 1000) vs. in-memory buffer size (16MB, 128MB, 1GB, 8GB). Series: cache updates in memory; ideal.]
Update Approach          Freshness   Performance   Low mem overhead
Batched                  ✗           ✓             ✓
In place                 ✓           ✗             ✓
In-memory differential   ✓           ✓             ✗
[O’Neil et al. ’96]
SSD
Key   Value   Type
5     V5’     Mod
19    V19’    Mod
1     V1’     Mod
9     N/A     Del
125   V125    Ins
5     V5’’    Mod

Key   Value
1     V1
2     V2
3     V3
4     V4
5     V5
6     V6
7     V7
8     V8
9     V9
Key   Value   Type
1     V1’     Mod
5     V5’’    Mod
9     N/A     Del
19    V19’    Mod
125   V125    Ins
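The key-sorted table above can be derived from the arrival-order update list by sorting on the key and keeping only the most recent update per key. The following is a minimal sketch of that step, not the authors' implementation. The names `Update` and `build_sorted_run` are hypothetical, and the last-write-wins rule for duplicate keys is an assumption that matches the example (the two updates to key 5 collapse into V5’’).

```python
from typing import NamedTuple, Optional


class Update(NamedTuple):
    key: int
    value: Optional[str]  # None stands for "N/A" (deletes carry no value)
    kind: str             # "Mod", "Del", or "Ins"


def build_sorted_run(buffer):
    """Sort buffered updates by key, keeping only the latest update per key.

    Assumption: a later update to the same key simply replaces the earlier one.
    """
    latest = {}
    for upd in buffer:          # buffer is in arrival order
        latest[upd.key] = upd   # later entries overwrite earlier ones
    return [latest[k] for k in sorted(latest)]


# The arrival-order updates from the slide.
buffer = [
    Update(5, "V5'", "Mod"),
    Update(19, "V19'", "Mod"),
    Update(1, "V1'", "Mod"),
    Update(9, None, "Del"),
    Update(125, "V125", "Ins"),
    Update(5, "V5''", "Mod"),
]
run = build_sorted_run(buffer)
for upd in run:
    print(upd.key, upd.value, upd.kind)
```

Sorting once, when the buffer is turned into a run, means a later scan can consume the run sequentially in key order.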
[Diagram: main memory; SSD (e.g., GBs); disks holding the main data (e.g., TBs). An incoming query runs a "Merge data & updates" operator that combines a Table Range Scan with the output of a "Merge updates" operator, which in turn combines several Run Scans and a Mem Scan.]
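The operator tree in the diagram can be illustrated with two small generators. This is a sketch under stated assumptions, not the system's actual code: the function names `merge_updates` and `merge_data_updates` are hypothetical, every input is assumed to be sorted by key, update sources are passed oldest first, and the newest update for a key is assumed to win.

```python
import heapq


def _tag(source, age):
    # Attach the source's age so that, for equal keys, older updates sort first.
    for key, value, kind in source:
        yield key, age, value, kind


def merge_updates(*sources):
    """Merge key-sorted update streams (oldest source first); newest update per key wins."""
    tagged = [_tag(src, age) for age, src in enumerate(sources)]
    current = None
    for key, _age, value, kind in heapq.merge(*tagged):
        if current is not None and current[0] != key:
            yield current
        current = (key, value, kind)  # same key: the newer source overwrites
    if current is not None:
        yield current


def merge_data_updates(table_scan, updates):
    """Apply a key-sorted update stream to a key-sorted table range scan."""
    upd_iter = iter(updates)
    upd = next(upd_iter, None)
    for key, value in table_scan:
        # Emit inserted rows whose keys fall before the current table row.
        while upd is not None and upd[0] < key:
            if upd[2] == "Ins":
                yield upd[0], upd[1]
            upd = next(upd_iter, None)
        if upd is not None and upd[0] == key:
            if upd[2] != "Del":       # a delete suppresses the row
                yield key, upd[1]     # a modification replaces the value
            upd = next(upd_iter, None)
        else:
            yield key, value
    while upd is not None:            # inserts past the end of the scanned rows
        if upd[2] == "Ins":
            yield upd[0], upd[1]
        upd = next(upd_iter, None)


table = [(k, f"V{k}") for k in range(1, 10)]                          # Table Range Scan
run_on_ssd = [(1, "V1'", "Mod"), (5, "V5'", "Mod"), (9, None, "Del")]  # Run Scan
in_memory = [(5, "V5''", "Mod"), (125, "V125", "Ins")]                 # Mem Scan
result = list(merge_data_updates(table, merge_updates(run_on_ssd, in_memory)))
print(result)
```

Because every input is consumed in key order, the query sees fresh data while reading the main data, the runs, and the memory buffer sequentially.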
[Diagram detail: the SSD (e.g., GBs) holds 1-pass runs and 2-pass runs.]
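The slides only name the two kinds of runs. One plausible reading, assumed here, is that runs written directly from the memory buffer (1-pass) are merged into fewer, larger runs (2-pass) once too many accumulate, so that a query needs fewer concurrent run scans. The sketch below illustrates that reading only; `merge_runs`, `add_run`, and the `max_one_pass` threshold are hypothetical.

```python
import heapq


def merge_runs(runs):
    """Merge key-sorted runs (oldest first) into one run; newest update per key wins."""
    tagged = [
        [(key, age, value, kind) for key, value, kind in run]
        for age, run in enumerate(runs)
    ]
    merged = []
    for key, _age, value, kind in heapq.merge(*tagged):
        if merged and merged[-1][0] == key:
            merged[-1] = (key, value, kind)  # a newer run overrides an older one
        else:
            merged.append((key, value, kind))
    return merged


def add_run(one_pass, two_pass, new_run, max_one_pass=2):
    """Store a new 1-pass run; if too many exist, merge them into a 2-pass run."""
    one_pass.append(new_run)
    if len(one_pass) > max_one_pass:
        two_pass.append(merge_runs(one_pass))
        one_pass.clear()


one_pass, two_pass = [], []
add_run(one_pass, two_pass, [(1, "a", "Mod"), (5, "b", "Mod")])
add_run(one_pass, two_pass, [(5, "c", "Mod"), (9, None, "Del")])
add_run(one_pass, two_pass, [(2, "d", "Ins")])  # third run triggers the merge
print(two_pass)
```

Under this reading each update is written to the SSD at most twice, once in a 1-pass run and once in a 2-pass run, and both writes are sequential.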
100GB main data, 4GB flash for cached updates, 16MB memory
[Figure: normalized time (1 to 4) vs. range size (4KB, 100KB, 1MB, 10MB, 100MB, 1GB, 10GB, 100GB). Series: in-place updates; MaSM w/ coarse-grain index; MaSM w/ fine-grain index.]
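The slides do not describe how the coarse-grain and fine-grain indexes are built. As an illustration only, the sketch below assumes a sparse index that stores the first key of every fixed-size block of a sorted run, and counts how many run entries a range scan touches at two granularities. The functions `build_index` and `scan_range` and the block sizes 64 and 4 are hypothetical.

```python
from bisect import bisect_right


def build_index(run_keys, granularity):
    """Record the first key of every `granularity`-entry block of a sorted run."""
    return [run_keys[i] for i in range(0, len(run_keys), granularity)]


def scan_range(run_keys, index, granularity, lo, hi):
    """Return (keys in [lo, hi], number of run entries read)."""
    # Start at the last block whose first key is <= lo.
    block = max(bisect_right(index, lo) - 1, 0)
    pos = block * granularity
    read, out = 0, []
    while pos < len(run_keys) and run_keys[pos] <= hi:
        read += 1
        if run_keys[pos] >= lo:
            out.append(run_keys[pos])
        pos += 1
    return out, read


run_keys = list(range(0, 1000, 2))   # a sorted run with 500 unique keys
coarse = build_index(run_keys, 64)   # one index entry per 64 run entries
fine = build_index(run_keys, 4)      # one index entry per 4 run entries

coarse_keys, coarse_read = scan_range(run_keys, coarse, 64, 500, 510)
fine_keys, fine_read = scan_range(run_keys, fine, 4, 500, 510)
print(coarse_read, fine_read)
```

A finer index costs more space but lets a small range scan skip more of each run, which matters most at the small range sizes on the left of the figure.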
[Figure: execution time (s), axis from 500 to 2000, for TPC-H queries q1 to q16, q18, q19, q21, and q22. Series: query w/o updates; query w/ in-place updates; query w/ MaSM updates. One off-scale bar is labeled 3537s.]
[Figure: update rate (upd/s), axis from 2000 to 14000, for in-place updates and for MaSM with a 2GB, 4GB, and 8GB SSD.]
– Limited number of writes per update
– No random writes on SSD
[Figure: normalized execution time of TPC-H queries (on average): query only, query w/ updates, query only + updates only, ideal, and MaSM.]
Update Approach          Freshness   Performance   Low mem overhead
Batched                  ✗           ✓             ✓
In place                 ✓           ✗             ✓
In-memory differential   ✓           ✓             ✗
MaSM and SSD             ✓           ✓             ✓