SLIDE 1

Squall: Fine-Grained Live Reconfiguration for Partitioned Main Memory Databases

AARON J. ELMORE, VAIBHAV ARORA, REBECCA TAFT, ANDY PAVLO, DIVY AGRAWAL, AMR EL ABBADI

SLIDE 2

Higher OLTP Throughput

Demand for high-throughput transactional (OLTP) systems, driven largely by web-based services

  • Cost per GB for RAM is dropping.
  • Network memory is faster than local disk.

Let’s use Main-Memory

SLIDE 3

Scaling-out via Partitioning

As data grows in scale, data partitioning lets the system manage that growth by scaling out.

SLIDE 4

Approaches for main-memory DBMS*

  • Highly concurrent, latch-free data structures – Hekaton, Silo
  • Partitioned data with single-threaded executors – H-Store, VoltDB

*Excuse the generalization

SLIDE 5

Client Application

[Diagram: the client application invokes a stored procedure by name, passing its input parameters]

Slide Credits: Andy Pavlo

SLIDE 6

The Problem: Workload Skew

High skew increases latency by 10X and decreases throughput by 4X. Partitioned shared-nothing systems are especially susceptible.

SLIDE 7

The Problem: Workload Skew

Possible solutions: provision resources for peak load (very expensive and brittle!)

[Chart: demand vs. provisioned capacity over time, with unused resources shaded]

SLIDE 8

The Problem: Workload Skew

Possible solutions: limit load on the system (poor performance!)

[Chart: demand capped by limited resources over time]

SLIDE 9

Need Elasticity

SLIDE 10

The Promise of Elasticity

[Chart: capacity elastically tracking demand over time, minimizing unused resources]

Slide Credits: Berkeley RAD Lab

SLIDE 11

What we need…

Enable the system to elastically scale in or out to dynamically adapt to changes in load

Reconfiguration

  • Change the partition plan
  • Add nodes
  • Remove nodes

SLIDE 12

Problem Statement

Need to migrate tuples between partitions to reflect the updated partition plan. Would like to do this without bringing the system offline:

  • Live Reconfiguration

Example partition plans (partition → warehouse ID ranges, e.g., before and after reconfiguration):

Plan A:
  Partition 1: [0, 2)
  Partition 2: [2, 4)
  Partition 3: [4, 6)

Plan B:
  Partition 1: [0, 1)
  Partition 2: [2, 3)
  Partition 3: [1, 2), [3, 6)
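For concreteness, such a plan can be thought of as a range-to-partition mapping. The sketch below is a minimal illustration using hypothetical class and method names, not the actual H-Store/Squall API:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of a range-based partition plan (hypothetical types,
// not the actual Squall/H-Store classes).
public class PartitionPlan {
    // A half-open key range [low, high) owned by one partition.
    static class Range {
        final long low, high;
        final int partitionId;
        Range(long low, long high, int partitionId) {
            this.low = low; this.high = high; this.partitionId = partitionId;
        }
        boolean contains(long key) { return key >= low && key < high; }
    }

    private final List<Range> ranges = new ArrayList<>();

    public void addRange(long low, long high, int partitionId) {
        ranges.add(new Range(low, high, partitionId));
    }

    // Map a partitioning key (e.g., W_ID) to the partition that owns it.
    public int partitionFor(long key) {
        for (Range r : ranges) {
            if (r.contains(key)) return r.partitionId;
        }
        throw new IllegalArgumentException("No partition owns key " + key);
    }

    public static void main(String[] args) {
        // Plan B from the slide: P1=[0,1), P2=[2,3), P3=[1,2) and [3,6).
        PartitionPlan plan = new PartitionPlan();
        plan.addRange(0, 1, 1);
        plan.addRange(2, 3, 2);
        plan.addRange(1, 2, 3);
        plan.addRange(3, 6, 3);
        System.out.println(plan.partitionFor(4)); // prints 3
    }
}
```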

SLIDE 13

E-Store

Normal operation runs with high-level monitoring; when a load imbalance is detected, the system moves through:

  • Tuple-level monitoring (E-Monitor) – produces hot tuples and partition-level access counts
  • Tuple placement planning (E-Planner) – produces a new partition plan
  • Online reconfiguration (Squall) – once the reconfiguration is complete, return to normal operation
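A rough sketch of this control loop is shown below. The interfaces and method names are hypothetical; the real E-Monitor, E-Planner, and Squall are separate components rather than one controller class:

```java
// Rough sketch of the E-Store control loop (hypothetical interfaces).
interface Monitor {
    boolean loadImbalanceDetected();          // cheap, high-level monitoring
    HotTupleReport collectTupleStats();       // E-Monitor: hot tuples + access counts
}
interface Planner {
    NewPlan plan(HotTupleReport stats);       // E-Planner: new tuple placement
}
interface Reconfigurator {
    void reconfigure(NewPlan newPlan);        // Squall: online data movement
}
class HotTupleReport { /* hot tuples and partition-level access counts */ }
class NewPlan { /* key range -> partition mapping */ }

class EStoreController {
    private final Monitor monitor;
    private final Planner planner;
    private final Reconfigurator squall;

    EStoreController(Monitor m, Planner p, Reconfigurator s) {
        this.monitor = m; this.planner = p; this.squall = s;
    }

    void runOnce() {
        // Normal operation: only high-level monitoring runs.
        if (!monitor.loadImbalanceDetected()) return;
        // Imbalance detected: switch to tuple-level monitoring.
        HotTupleReport stats = monitor.collectTupleStats();
        // Plan a new tuple placement, then reconfigure online.
        NewPlan newPlan = planner.plan(stats);
        squall.reconfigure(newPlan);
        // Reconfiguration complete: back to normal operation.
    }
}
```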

SLIDE 14

Existing Live Migration Solutions Are Not Suitable

Existing approaches are predicated on disk-based architectures with traditional concurrency control and recovery:

  • Zephyr: relies on concurrency control (2PL) and disk pages.
  • ProRea: relies on concurrency control (SI and OCC) and disk pages.
  • Albatross: relies on replication and shared disk storage; also puts strain on the source.

SLIDE 15

Not Your Parents’ Migration

Single-threaded execution model

  • Each partition is either doing work or migrating data

More than a single source and destination (and the destination is not cold)

  • Want lightweight coordination

Presence of distributed transactions and replication

SLIDE 16

Squall

Given a plan from E-Planner, Squall physically moves the data while the system is live.

  • Pull-based mechanism – the destination pulls data from the source.
  • Conforms to the H-Store single-threaded execution model: while data is moving, transactions are blocked, but only on the partitions moving that data.

To avoid performance degradation, Squall moves small chunks of data at a time, interleaved with regular transaction execution.
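The sketch below illustrates this interleaving on a single-threaded partition executor. The types and the batch size are hypothetical, not the actual H-Store executor loop:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Sketch of a single-threaded partition executor interleaving regular
// transactions with small migration chunks (hypothetical types).
class PartitionExecutor {
    interface Txn { void execute(); }
    interface MigrationWork { boolean pullNextChunk(); } // returns false when done

    private final Queue<Txn> txnQueue = new ArrayDeque<>();
    private MigrationWork migration; // null when no reconfiguration is active

    void submit(Txn txn) { txnQueue.add(txn); }
    void startReconfiguration(MigrationWork work) { this.migration = work; }

    // One iteration of the executor loop: run a batch of transactions, then
    // at most one small chunk of migration work, so neither starves the other.
    void step() {
        for (int i = 0; i < 10 && !txnQueue.isEmpty(); i++) {
            txnQueue.poll().execute();
        }
        if (migration != null && !migration.pullNextChunk()) {
            migration = null; // reconfiguration finished on this partition
        }
    }
}
```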

SLIDE 17

Reconfiguration

[Diagram: a reconfiguration (new plan, leader ID) is initiated across four partitions holding warehouses 1–10 partitioned by warehouse ID, while a client continues to submit transactions; each partition tracks its incoming and outgoing ranges (e.g., Incoming: 2, Outgoing: 5), and destinations issue pulls such as "Pull W_ID=2" and "Pull W_ID>5" against their sources]

Squall Steps

  • 1. Initialization and identification of migrating data
  • 2. Live reactive pulls for required data (sketched below)
  • 3. Periodic lazy/async pulls for large chunks
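The reactive pull of step 2 can be pictured as follows: if a transaction at the destination touches a key in a migrating range that has not arrived yet, the executor blocks that transaction and pulls exactly the missing data from the source. This is a sketch with hypothetical types, not the actual Squall classes:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a live reactive pull at the destination partition.
class ReactivePuller {
    interface SourcePartition { byte[] extractTuple(long key); }

    private final Map<Long, byte[]> localTuples = new HashMap<>();
    private final SourcePartition source;

    ReactivePuller(SourcePartition source) { this.source = source; }

    // Called before a transaction accesses a key in a migrating range.
    byte[] ensureLocal(long key) {
        byte[] tuple = localTuples.get(key);
        if (tuple == null) {
            // Reactive pull: fetch only the required key, on demand.
            tuple = source.extractTuple(key);
            localTuples.put(key, tuple);
        }
        return tuple;
    }
}
```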
SLIDE 18

Chunk Data for Asynchronous Pulls

SLIDE 19

Why Chunk?

The amount of data to move is unknown when the table is not partitioned by its clustered index (e.g., Customers partitioned by W_ID in TPC-C). Time spent extracting data is time not spent on transactions.

SLIDE 20

Async Pulls

Periodically pull chunks of cold data. These pulls are answered lazily:

  • Start at lower priority than transactions; priority increases with time.
  • Execution is interwoven with extracting and sending data (dirty the range!).
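One way to picture the "lower priority that ages upward" behavior at the source is the sketch below; the scheduling policy and names are hypothetical, not the actual Squall implementation:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of lazy async-pull scheduling at the source: pull requests start
// below transactions in priority and age upward, so they are answered when
// the partition is idle or once they have waited long enough.
class AsyncPullScheduler {
    static class PullRequest {
        final String range;       // e.g., a chunk of "W_ID in [5,6)"
        final long enqueuedAtMs;
        PullRequest(String range, long now) { this.range = range; this.enqueuedAtMs = now; }
    }

    private final Deque<PullRequest> pending = new ArrayDeque<>();
    private final long maxWaitMs; // after this, the pull outranks new transactions

    AsyncPullScheduler(long maxWaitMs) { this.maxWaitMs = maxWaitMs; }

    void enqueue(String range) {
        pending.add(new PullRequest(range, System.currentTimeMillis()));
    }

    // Decide whether the source partition should answer a pull next.
    PullRequest nextPullIfDue(boolean txnQueueEmpty) {
        PullRequest head = pending.peek();
        if (head == null) return null;
        long waited = System.currentTimeMillis() - head.enqueuedAtMs;
        // Answer the pull when idle, or when it has waited long enough.
        if (txnQueueEmpty || waited >= maxWaitMs) {
            return pending.poll();
        }
        return null; // keep running transactions; the pull keeps aging
    }
}
```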

SLIDE 21

Chunking Async Pulls

[Diagram: an async pull request is sent from the destination to the source, and the requested data moves from the source to the destination in chunks]

SLIDE 22

Keys to Performance

  • Properly size reconfiguration granules and space them apart.
  • Split large reconfigurations to limit demands on a single partition.
  • Redirect or pull only if needed.
  • Tune what gets pulled; sometimes pull a little extra.
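These knobs could be collected into a small configuration object, as in the sketch below. The parameter names and defaults are illustrative only; the 8 MB chunk size appears later in the slides as an experimental setting, and the other values are assumptions:

```java
// Illustrative tuning knobs for a Squall-style reconfiguration
// (hypothetical names and defaults, not the actual configuration).
class SquallTuning {
    long chunkSizeBytes = 8L * 1024 * 1024; // size of each reconfiguration granule
    long asyncPullDelayMs = 200;            // spacing between async pull requests
    int  maxSubPlans = 5;                   // cap on splitting into sub-plans
    boolean splitBySourceDestPair = true;   // limit demand on any one partition
    boolean prefetchNearbyRanges = true;    // "sometimes pull a little extra"
}
```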

SLIDE 23

Optimization: Splitting Reconfigurations

1. Split by pairs of source and destination - avoids contention on a single partition (see the sketch after this list)

  • Example: if partition 1 is migrating W_ID 2 and 3 to partitions 3 and 7, execute this as two reconfigurations.

2. Split large objects and migrate one piece at a time
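The pair-based split of point 1 can be sketched as grouping a plan's moves by their (source, destination) pair, so no single partition serves every migration at once. Types and the reading of the example are hypothetical, not the actual Squall planner:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: split one reconfiguration into sub-plans keyed by (source, destination).
class PlanSplitter {
    static class Move {
        final int sourcePartition, destPartition;
        final String range; // e.g., "W_ID = 2"
        Move(int src, int dst, String range) {
            this.sourcePartition = src; this.destPartition = dst; this.range = range;
        }
    }

    // Group all moves that share the same source/destination pair.
    static List<List<Move>> splitByPair(List<Move> moves) {
        Map<String, List<Move>> byPair = new HashMap<>();
        for (Move m : moves) {
            String key = m.sourcePartition + "->" + m.destPartition;
            byPair.computeIfAbsent(key, k -> new ArrayList<>()).add(m);
        }
        return new ArrayList<>(byPair.values());
    }

    public static void main(String[] args) {
        // One reading of the slide's example: partition 1 migrates W_ID 2 to
        // partition 3 and W_ID 3 to partition 7 -> two separate reconfigurations.
        List<Move> moves = List.of(
            new Move(1, 3, "W_ID = 2"),
            new Move(1, 7, "W_ID = 3"));
        System.out.println(splitByPair(moves).size()); // prints 2
    }
}
```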

SLIDE 24

Evaluation

Workloads:
  • YCSB
  • TPC-C

Baselines:
  • Stop & Copy
  • Purely Reactive – only demand-based pulling
  • Zephyr+ – Purely Reactive plus asynchronous chunking with pull prefetching (semantically equivalent to Zephyr)

SLIDE 25

YCSB Latency

[Charts: YCSB latency during (1) cluster consolidation from 4 to 3 nodes and (2) a 10% pairwise data shuffle]

SLIDE 26

Results Highlight

[Chart: TPC-C load balancing of hotspot warehouses]

SLIDE 27

All about trade-offs

Trading off time to complete the migration against performance degradation. Future work will consider automating this trade-off based on service-level objectives.

SLIDE 28

I Fell Asleep… What Happened?

A partitioned, single-threaded, main-memory environment is susceptible to hotspots. Elastic data management is a solution: Squall provides a mechanism for executing fine-grained live reconfiguration.

Questions?

SLIDE 29

Tuning Optimizations

SLIDE 30

Sizing Chunks

Static analysis is used to set chunk sizes; future work will set sizing and scheduling dynamically. [Chart: impact of chunk size on a 10% reconfiguration during a YCSB workload]

SLIDE 31

Spacing Async Pulls

The delay at the destination between new async pull requests. [Chart: impact of this delay on a 10% reconfiguration during a YCSB workload with an 8 MB chunk size]

SLIDE 32

Effect of Splitting into Sub-Plans

A cap is set on the number of sub-plan splits; plans are split on source/destination pairs, with the ability to decompose migrating objects.