Staring into the Abyss: An Evaluation of Concurrency Control with One Thousand Cores - PowerPoint PPT Presentation

SLIDE 1

Staring into the Abyss: An Evaluation of Concurrency Control with One Thousand Cores

Xiangyao Yu¹, George Bezerra¹, Andrew Pavlo², Srinivas Devadas¹, Michael Stonebraker¹

¹ CSAIL, Massachusetts Institute of Technology

² Dept. of Computer Science, Carnegie Mellon University

Published in VLDB 2014

Presenter: Vaibhav Jain

SLIDE 2

Motivation (1)

  • The era of single-core CPU speed-up is over.
  • The number of cores on a chip is increasing exponentially.
    • Computation power now increases through thread-level parallelism.
    • 1000-core chips are near…

Xeon Phi (up to 61 cores), Tilera (up to 100 cores)

SLIDE 3

Motivation (2)

  • Is the DBMS ready to scale?
    • Most DBMSs still focus on single-threaded performance.
    • Existing work on multi-core systems focuses on small core counts.

SLIDE 4

Objective

  • Evaluate transaction processing at 1000 cores.
  • Focus on one scalability challenge: concurrency control.
  • Discuss the bottlenecks and the improvements needed.

SLIDE 5

Implementation

  • Concurrency Control Schemes
  • DBMS Test Bed

SLIDE 6

Concurrency Control Schemes

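| CC Scheme | Family | Description |
|---|---|---|
| DL_DETECT | Two-Phase Locking (2PL) | 2PL with deadlock detection |
| NO_WAIT | Two-Phase Locking (2PL) | 2PL with non-waiting deadlock prevention |
| WAIT_DIE | Two-Phase Locking (2PL) | 2PL with wait-and-die deadlock prevention |
| TIMESTAMP | Timestamp Ordering (T/O) | Basic T/O algorithm |
| MVCC | Timestamp Ordering (T/O) | Multi-version T/O |
| OCC | Timestamp Ordering (T/O) | Optimistic concurrency control |
| HSTORE | Partitioning | T/O with partition-level locking |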

SLIDE 7

Two-Phase Locking (1)

SLIDE 8

Two-Phase Locking (2)

  • On a lock conflict:
    • DL_DETECT (deadlock detection): always wait.
    • NO_WAIT (deadlock prevention): always abort.
    • WAIT_DIE (deadlock prevention): wait if the requester is older than the lock holder, otherwise abort.
  • Example systems: Ingres, Informix, IBM DB2, MS SQL Server, MySQL (InnoDB)
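
The three lock-conflict rules above can be condensed into one decision function. This is a minimal sketch, not the paper's code: it assumes a conflict involves a single lock holder and that a smaller timestamp means an older transaction; `Decision` and `on_conflict` are hypothetical names.

```python
# Hypothetical sketch: names are illustrative, not from the paper's test bed.
from enum import Enum


class Decision(Enum):
    WAIT = "wait"
    ABORT = "abort"


def on_conflict(scheme: str, requester_ts: int, holder_ts: int) -> Decision:
    """Decide what a requester does when its lock request conflicts with a holder.

    Convention assumed here: smaller timestamp = older transaction.
    """
    if scheme == "DL_DETECT":
        # Always wait; a separate deadlock detector breaks any cycle later.
        return Decision.WAIT
    if scheme == "NO_WAIT":
        # Never wait, so no wait-for cycle (deadlock) can ever form.
        return Decision.ABORT
    if scheme == "WAIT_DIE":
        # Older requesters wait for younger holders; younger requesters "die".
        # Waits only point from old to young, so no cycle can form.
        return Decision.WAIT if requester_ts < holder_ts else Decision.ABORT
    raise ValueError(f"unknown scheme: {scheme}")


# Transaction 5 (older) and transaction 9 (younger) contend for one lock.
print(on_conflict("WAIT_DIE", requester_ts=5, holder_ts=9))  # Decision.WAIT
print(on_conflict("WAIT_DIE", requester_ts=9, holder_ts=5))  # Decision.ABORT
print(on_conflict("NO_WAIT", requester_ts=5, holder_ts=9))   # Decision.ABORT
```

Both prevention schemes trade extra aborts for never having to search for cycles.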

SLIDE 9

Concurrency Control Schemes

SLIDE 10

Timestamp Ordering (T/O) (1)

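Each transaction has a unique timestamp that fixes its position in the serial order.

  • 1. TIMESTAMP (Basic Timestamp Ordering)
    • A read or write is rejected if the transaction's timestamp is older than the tuple's last write; a write is also rejected if it is older than the tuple's last read.
  • 2. MVCC (Multi-Version Concurrency Control)
    • Every write creates a new timestamped version.
    • For a read, the DBMS decides which version it accesses: the newest version whose timestamp is not later than the reader's.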
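
A minimal sketch of the TIMESTAMP and MVCC rules on a single tuple, assuming integer timestamps and an `Abort` exception that signals a restart (all names are hypothetical). The MVCC write path is deliberately simplified: it omits the check a full multi-version T/O scheme performs to reject a write that arrives after a newer transaction has already read an older version.

```python
# Hypothetical sketch: names are illustrative, not from the paper's test bed.
from dataclasses import dataclass, field


class Abort(Exception):
    """Raised when a request violates timestamp order; the transaction must restart."""


@dataclass
class TOTuple:
    """Basic T/O tuple: one value plus the timestamps of its latest read and write."""
    value: object = None
    read_ts: int = 0
    write_ts: int = 0

    def read(self, ts: int):
        # Reject a read of a value already overwritten by a newer transaction.
        if ts < self.write_ts:
            raise Abort(f"read by {ts} after write by {self.write_ts}")
        self.read_ts = max(self.read_ts, ts)
        return self.value

    def write(self, ts: int, value) -> None:
        # Reject a write that a newer transaction has already read or overwritten.
        if ts < self.write_ts or ts < self.read_ts:
            raise Abort(f"write by {ts} conflicts (read_ts={self.read_ts}, write_ts={self.write_ts})")
        self.value, self.write_ts = value, ts


@dataclass
class MVTuple:
    """Multi-version tuple: (write_ts, value) pairs kept in timestamp order."""
    versions: list = field(default_factory=list)

    def write(self, ts: int, value) -> None:
        # Every write creates a new timestamped version instead of overwriting.
        # Simplified: no check against later readers of older versions.
        self.versions.append((ts, value))
        self.versions.sort(key=lambda v: v[0])

    def read(self, ts: int):
        # Return the newest version written at or before the reader's timestamp.
        visible = [value for wts, value in self.versions if wts <= ts]
        if not visible:
            raise Abort(f"no version visible to {ts}")
        return visible[-1]


t = TOTuple(value="a")
t.write(ts=10, value="b")
print(t.read(ts=12))   # b
try:
    t.read(ts=8)       # transaction 8 is older than the last writer (10)
except Abort as exc:
    print("abort:", exc)

mv = MVTuple()
mv.write(ts=10, value="v10")
mv.write(ts=20, value="v20")
print(mv.read(ts=15))  # v10
print(mv.read(ts=25))  # v20
```

The contrast is the point: the late reader (timestamp 8) aborts under basic T/O, whereas under MVCC an older reader can still be served an older version.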

SLIDE 11

Timestamp Ordering (T/O) (2)

  • 3. OCC (Optimistic Concurrency Control)
    • Each transaction works in a private workspace.
    • At commit time, if its read set overlaps the write set of any concurrent transaction, it is aborted and restarted.
    • Advantage: short contention period, since conflicts are checked only at commit.

Example systems: Oracle, Postgres, MySQL (InnoDB), SAP HANA, MemSQL, MS Hekaton
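
The OCC read, validate, and write phases can be sketched as follows. This sketch assumes backward validation (a committing transaction is checked against the write sets of transactions that committed after it started) and a single global validation lock for simplicity. Such a lock is exactly the kind of critical-path mutex the test bed tries to avoid, so treat it as an illustration only. `OCCDatabase` and `Txn` are hypothetical names.

```python
# Hypothetical sketch: names are illustrative, not from the paper's test bed.
import threading


class Abort(Exception):
    """Raised when validation fails; the transaction must restart."""


class OCCDatabase:
    """Minimal OCC with backward validation over a shared key-value store."""

    def __init__(self):
        self.data = {}
        self.committed = []      # (commit_number, write_set); never pruned here
        self.commit_counter = 0
        self.validate_lock = threading.Lock()  # serializes validation + write phase

    def begin(self):
        return Txn(self, start=self.commit_counter)


class Txn:
    def __init__(self, db, start):
        self.db, self.start = db, start
        self.read_set, self.workspace = set(), {}

    def read(self, key):
        # Read phase: see our own pending write if any, else the shared value.
        self.read_set.add(key)
        return self.workspace.get(key, self.db.data.get(key))

    def write(self, key, value):
        # Writes stay in the private workspace until commit.
        self.workspace[key] = value

    def commit(self):
        db = self.db
        with db.validate_lock:
            # Validation: fail if a transaction that committed after we started
            # wrote anything we read.
            for number, write_set in db.committed:
                if number > self.start and write_set & self.read_set:
                    raise Abort(f"read set overlaps commit #{number}")
            # Write phase: install the workspace and record our write set.
            db.data.update(self.workspace)
            db.commit_counter += 1
            db.committed.append((db.commit_counter, set(self.workspace)))


db = OCCDatabase()
t1, t2 = db.begin(), db.begin()
t1.write("x", 1)
t2.read("x")
t1.commit()            # t1 validates first and commits
try:
    t2.commit()        # t2 read "x", which t1 overwrote after t2 started
except Abort as exc:
    print("t2 aborted:", exc)
print(db.data)         # {'x': 1}
```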

SLIDE 12

Concurrency Control Schemes

SLIDE 13

H-Store

  • The database is divided into disjoint memory subsets called partitions.
  • Each partition is protected by a lock.
  • A transaction acquires the locks of all partitions it needs to access.
  • The DBMS assigns the transaction a timestamp and adds its lock requests to the lock queues of those partitions.
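
A sketch of H-Store-style partition locking, assuming one timestamp-ordered lock queue per partition and that a transaction may run only once it heads the queue of every partition it needs. `HStoreScheduler` and its methods are hypothetical; the sketch ignores transaction execution and any delay the real scheme may use while waiting for earlier-timestamped requests.

```python
# Hypothetical sketch: names are illustrative, not from the paper's test bed.
import heapq
import itertools


class HStoreScheduler:
    """Partition-level locking: one timestamp-ordered lock queue per partition."""

    def __init__(self, num_partitions):
        self.queues = [[] for _ in range(num_partitions)]  # min-heaps of (ts, txn)
        self.clock = itertools.count(1)                    # timestamp source
        self.needs = {}                                    # txn -> (ts, partition ids)

    def submit(self, txn, partition_ids):
        ts = next(self.clock)            # assign a timestamp first...
        self.needs[txn] = (ts, sorted(partition_ids))
        for pid in partition_ids:        # ...then queue a lock request at each partition
            heapq.heappush(self.queues[pid], (ts, txn))
        return ts

    def runnable(self, txn):
        # The transaction holds every partition lock it needs exactly when it
        # heads the lock queue of each of those partitions.
        ts, pids = self.needs[txn]
        return all(self.queues[p][0] == (ts, txn) for p in pids)

    def finish(self, txn):
        # Release: drop the request from every queue it was in.
        ts, pids = self.needs.pop(txn)
        for p in pids:
            self.queues[p].remove((ts, txn))
            heapq.heapify(self.queues[p])


s = HStoreScheduler(num_partitions=3)
s.submit("A", [0])       # single-partition transaction
s.submit("B", [0, 1])    # multi-partition: queued behind A on partition 0
s.submit("C", [2])
print(s.runnable("A"), s.runnable("B"), s.runnable("C"))  # True False True
s.finish("A")
print(s.runnable("B"))   # True
```

Transaction B hints at why multi-partition transactions hurt this scheme: while it waits for partition 0, it already sits at the head of partition 1's queue, so no one else can use partition 1.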

SLIDE 14

DBMS Test Bed (1)

Graphite: a CPU simulator that scales up to 1024 cores.

  • Application threads are mapped to simulated core threads.
  • Simulated threads are mapped to multiple processes on host machines.

SLIDE 15

DBMS Test Bed (2)

  • Implemented a lightweight, pthread-based DBMS.
  • Allows swapping between different concurrency control schemes.
  • Ensures there are no bottlenecks other than concurrency control.
  • Reports transaction statistics.

SLIDE 16

General Optimizations

  • 1. Memory allocation: a custom malloc with a resizable memory pool for each thread.
  • 2. Lock table: per-tuple locks instead of a centralized lock table.
  • 3. Mutexes: avoid mutexes on the critical path, such as:
    • for 2PL, the centralized deadlock detector;
    • for T/O, the allocation of unique timestamps.
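
Python has no `malloc` to replace, so the following only mimics the per-thread memory pool idea: each thread owns a free list of fixed-size buffers that grows by doubling when empty, so allocation never takes a shared lock. `ThreadLocalPool` and its parameters are hypothetical.

```python
# Hypothetical sketch: names are illustrative, not from the paper's test bed.
import threading


class ThreadLocalPool:
    """Per-thread, resizable pool of fixed-size buffers."""

    def __init__(self, block_size, initial_blocks=4):
        self.block_size = block_size
        self.initial_blocks = initial_blocks
        self._local = threading.local()  # each thread sees its own free list

    def _free_list(self):
        free = getattr(self._local, "free", None)
        if free is None:
            free = self._local.free = []
            self._local.capacity = 0
        return free

    def _grow(self, free):
        # Grow by initial_blocks the first time, then by the current capacity
        # (i.e. double it).
        extra = max(self.initial_blocks, self._local.capacity)
        free.extend(bytearray(self.block_size) for _ in range(extra))
        self._local.capacity += extra

    def alloc(self):
        free = self._free_list()
        if not free:
            self._grow(free)
        return free.pop()  # no lock needed: the list is private to this thread

    def free(self, buf):
        # Return the block to the calling thread's pool for reuse.
        self._free_list().append(buf)

    def capacity(self):
        self._free_list()
        return self._local.capacity


pool = ThreadLocalPool(block_size=64, initial_blocks=2)
a, b, c = pool.alloc(), pool.alloc(), pool.alloc()
print(pool.capacity())  # 4
pool.free(a)
print(pool.alloc() is a)  # True
```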

SLIDE 17

Scalable 2PL

  • 1. Deadlock detection
    • Make the deadlock detector lock-free by having each thread keep a local, partial wait-for graph.
    • A thread searches for cycles in this partial wait-for graph.
  • 2. Lock thrashing
    • Holding locks until commit makes transactions wait on one another, which becomes the bottleneck as the number of concurrent transactions grows.
    • Timeout threshold: abort a transaction if its wait time exceeds a timeout.
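
A sketch of the two 2PL mitigations above, assuming each transaction's thread publishes its own wait-for set as an immutable snapshot and a detecting thread only reads other transactions' entries. In this model a stale read can only delay detection or cause an unnecessary abort. `LocalWaitForGraph` and `should_abort` are hypothetical names, and the timeout value is an arbitrary parameter.

```python
# Hypothetical sketch: names are illustrative, not from the paper's test bed.
import time


class LocalWaitForGraph:
    """Wait-for graph split into per-transaction entries.

    Each transaction's thread writes only its own entry; detection only reads
    other entries, so this model needs no global detector mutex.
    """

    def __init__(self):
        self.waits_for = {}  # txn id -> frozenset of txn ids it waits on

    def set_waiting(self, txn, holders):
        # Written only by txn's own thread: publish a fresh immutable snapshot.
        self.waits_for[txn] = frozenset(holders)

    def clear(self, txn):
        self.waits_for.pop(txn, None)

    def in_cycle(self, txn):
        """DFS from txn over the (possibly stale) partial graph."""
        stack, seen = list(self.waits_for.get(txn, ())), set()
        while stack:
            t = stack.pop()
            if t == txn:
                return True  # came back to ourselves: deadlock
            if t not in seen:
                seen.add(t)
                stack.extend(self.waits_for.get(t, ()))
        return False


def should_abort(graph, txn, wait_started, timeout_s, now=None):
    """Abort if txn is in a wait-for cycle or has waited longer than the timeout."""
    now = time.monotonic() if now is None else now
    return graph.in_cycle(txn) or (now - wait_started) > timeout_s


g = LocalWaitForGraph()
g.set_waiting(1, {2})
g.set_waiting(2, {3})
print(g.in_cycle(1))  # False
g.set_waiting(3, {1})
print(g.in_cycle(1))  # True
g.clear(3)
print(should_abort(g, 1, wait_started=0.0, timeout_s=0.5, now=1.0))  # True (timed out)
```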

SLIDE 18

Scalable T/O

  • 1. Timestamp allocation
    • a) Batched atomic addition: the manager returns multiple timestamps per request.
    • b) CPU clocks: read the core's logical clock and concatenate it with the thread id; requires synchronized clocks.
    • c) Hardware counters: a counter physically located at the center of the CPU.
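
A sketch of the first two timestamp allocation schemes, assuming a `threading.Lock` stands in for a hardware atomic fetch-and-add and that `core_clock` is a per-core clock reading supplied by the caller. Hardware counters are not modeled. All names are hypothetical.

```python
# Hypothetical sketch: names are illustrative, not from the paper's test bed.
import threading


class BatchedTimestampManager:
    """Central counter that hands out a batch of consecutive timestamps per request."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self._next = 1
        self._lock = threading.Lock()  # stands in for an atomic fetch-and-add

    def fetch_batch(self):
        with self._lock:
            start = self._next
            self._next += self.batch_size
        return range(start, start + self.batch_size)


class WorkerTimestamps:
    """Per-thread cache: contacts the manager only when its batch runs out."""

    def __init__(self, manager):
        self.manager = manager
        self._batch = iter(())

    def next_ts(self):
        ts = next(self._batch, None)
        if ts is None:
            self._batch = iter(self.manager.fetch_batch())
            ts = next(self._batch)
        return ts


def clock_timestamp(core_clock, thread_id, thread_bits=10):
    """CPU-clock scheme: concatenate a per-core clock reading with the thread id.

    The low bits make timestamps from different threads distinct; comparing
    timestamps across cores is only meaningful if the clocks are synchronized.
    """
    if not 0 <= thread_id < (1 << thread_bits):
        raise ValueError("thread_id does not fit in thread_bits")
    return (core_clock << thread_bits) | thread_id


mgr = BatchedTimestampManager(batch_size=4)
a, b = WorkerTimestamps(mgr), WorkerTimestamps(mgr)
print([a.next_ts() for _ in range(3)], b.next_ts())  # [1, 2, 3] 5
print(clock_timestamp(core_clock=1000, thread_id=7))  # 1024007
```

Batching cuts the number of trips to the shared counter by a factor of `batch_size`; the clock scheme removes the shared counter entirely at the cost of requiring synchronized clocks.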

SLIDE 19

Evaluation

Read-Only Workload

SLIDE 20

Read-Only Workload

  • 2PL schemes scale well on read-only benchmarks.

SLIDE 21

Read-Only Workload

  • 2PL schemes scale well on read-only benchmarks.
  • Timestamp allocation limits scalability.

SLIDE 22

Read-Only Workload

  • 2PL schemes scale well on read-only benchmarks.
  • Timestamp allocation limits scalability.
  • Memory copying hurts performance.

SLIDE 23

Write-Intensive Workload (Medium Contention)

  • NO_WAIT and WAIT_DIE scale better than the other schemes.
  • DL_DETECT is inhibited by lock thrashing.

SLIDE 24

Write-Intensive Workload (High Contention)

  • Scaling stops at a small core count (64).

SLIDE 25

Write-Intensive Workload (High Contention)

  • Scaling stops at a small core count (64).
  • NO_WAIT performs well but falls off due to thrashing.

SLIDE 26

Write-Intensive Workload (High Contention)

  • Scaling stops at a small core count (64).
  • NO_WAIT performs well but falls off due to thrashing.
  • OCC wins at 1000 cores because one transaction always commits.

SLIDE 27

More Analysis

  • 1. Short transactions => low lock contention; longer transactions => timestamp allocation is not a bottleneck.
  • 2. More read transactions => better throughput.
  • 3. Multi-partition transactions => the H-Store scheme performs poorly; partitioned workloads => H-Store is the best algorithm.

SLIDE 28

Bottlenecks Summary

Table: bottlenecks of each concurrency control scheme. Columns: Waiting (Thrashing), High Abort Rate, Timestamp Allocation, Multi-partition. Rows: DL_DETECT, NO_WAIT, WAIT_DIE, TIMESTAMP, MULTIVERSION, OCC, HSTORE.

SLIDE 29

Summary

All algorithms fail to scale as the core count increases.

  • Thrashing limits the scalability of 2PL algorithms.
  • Timestamp allocation limits the scalability of T/O algorithms.

SLIDE 30

Project Ideas

  • New concurrency control approaches to tackle the scalability problem.
  • Hardware solutions to DBMS bottlenecks that cannot be solved in software.
  • Hybrid approach: switch between schemes depending on the workload.

SLIDE 31

Questions

SLIDE 32

Thrashing

Figure: transactions A, B, C, D and tuples u, v, x, y, z, with edges marked Locking or Waiting.