CS 839: Design the Next-Generation Database Lecture 4: Multicore


SLIDE 1

CS 839: Design the Next-Generation Database
Lecture 4: Multicore (Part I)

Xiangyao Yu
1/30/2020

SLIDE 2

Announcements

  • Email me if you are not in HotCRP: https://wisc-cs839-ngdb20.hotcrp.com
  • New deadline for submitting paper reviews: before the lecture starts
  • This course is on the PhD breadth requirement list
  • Please talk to me to discuss project ideas

SLIDE 3

Discussion Highlights

Transactions on column-store

  • Pros: Compression, good for read workload, good for sequential writes
  • Cons: More I/O for row selection/update/insert

Data format for HTAP?

  • Hot data in row format, convert cold data to column format in background
  • Different formats in replicas

Small processor near disk

  • Compression/decompression, encryption, filtering, sorting, hashing, hot data
  • Coalesce random accesses
  • Fast indexing


SLIDE 4

Today’s Paper

SLIDE 5

Story Behind the Paper

Lesson learned: Talk to people about your research

SLIDE 6

Many-core systems have arrived

  • The era of single-core CPU speed-up is over
  • Number of cores on a chip is increasing exponentially
      – 1000-core chips are a near…
  • DBMSs are not ready
      – Most DBMSs still focus on single-threaded performance
      – Existing works on multi-cores focus on small core counts

Many-core chips: Xeon Phi (up to 61 cores), Tilera (up to 100 cores)

SLIDE 7

Many-core systems have arrived

SLIDE 8

Databases on 1000-core systems

  • DBMS on future computer architectures
  • Will DBMSs scale to this level of parallelism?
      – What are the main bottlenecks to scalability?
      – What improvements will be needed from the software and hardware perspectives?

All classic concurrency control algorithms fail to scale to 1000 cores.

SLIDE 9

1000-Core DBMS

  • Online transaction processing (OLTP)
  • Concurrency control is a key limiting factor for scalability
  • New database: DBx1000
      – Supports all seven classic concurrency control algorithms
      – Used to study the fundamental bottlenecks
      – https://github.com/yxymit/DBx1000
  • Graphite multi-core simulator

SLIDE 10

Simulated Hardware

  • CPU: 1024 in-order cores
  • Cache: 32KB L1, 512KB L2
  • Network: 2D-mesh

[Figure: 32 × 32 grid of tiles; each tile has a core, L1$, L2$, and a switch (SW)]

SLIDE 11

Graphite Simulator [1]

[1] J. Miller et al. Graphite: A Distributed Parallel Simulator for Multicores. HPCA’10

SLIDE 12

Concurrency Control Schemes

CC Scheme    Category                   Description
DL_DETECT    Two-Phase Locking (2PL)    2PL with deadlock detection
NO_WAIT      Two-Phase Locking (2PL)    2PL with non-waiting deadlock prevention
WAIT_DIE     Two-Phase Locking (2PL)    2PL with wait-and-die deadlock prevention
TIMESTAMP    Timestamp Ordering (T/O)   Basic T/O algorithm
MVCC         Timestamp Ordering (T/O)   Multi-version T/O
OCC          Timestamp Ordering (T/O)   Optimistic concurrency control
HSTORE       Partitioning               T/O with partition-level locking

SLIDE 13

2PL – DL_DETECT

  • Wait-for graph: T1 <---- T2 when T2 waits for a lock held by T1
  • Periodically detect cycles in the graph and abort the transaction that holds the fewest locks
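
The detection step described on this slide can be sketched as a depth-first search over the wait-for graph. This is a minimal illustration, not DBx1000's implementation: the class `WaitForGraph`, the `locks_held` bookkeeping, and the tie-break by transaction name are all assumptions made for the example.

```python
from collections import defaultdict


class WaitForGraph:
    """Toy wait-for graph: an edge waiter -> holder means `waiter`
    is blocked on a lock currently held by `holder`."""

    def __init__(self):
        self.edges = defaultdict(set)  # waiter -> set of holders
        self.locks_held = {}           # txn -> lock count (hypothetical bookkeeping)

    def add_wait(self, waiter, holder):
        self.edges[waiter].add(holder)

    def find_cycle(self):
        """Return one cycle as a list of transactions, or None if there is none."""
        WHITE, GRAY, BLACK = 0, 1, 2
        color = defaultdict(int)  # every transaction starts WHITE
        stack = []

        def dfs(txn):
            color[txn] = GRAY
            stack.append(txn)
            for nxt in self.edges.get(txn, ()):
                if color[nxt] == GRAY:
                    # Back edge: everything from nxt to the top of the stack is a cycle.
                    return stack[stack.index(nxt):]
                if color[nxt] == WHITE:
                    cycle = dfs(nxt)
                    if cycle:
                        return cycle
            stack.pop()
            color[txn] = BLACK
            return None

        for txn in list(self.edges):
            if color[txn] == WHITE:
                cycle = dfs(txn)
                if cycle:
                    return cycle
        return None

    def pick_victim(self):
        """Choose the transaction in the cycle that holds the fewest locks."""
        cycle = self.find_cycle()
        if cycle is None:
            return None
        # Ties are broken by name; that tie-break is an assumption of this sketch.
        return min(cycle, key=lambda t: (self.locks_held.get(t, 0), t))
```

With T2 waiting on T1 and T1 waiting on T2, `pick_victim()` returns whichever of the two has the smaller entry in `locks_held`.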

SLIDE 14

2PL – NO_WAIT, WAIT_DIE

NO_WAIT: A transaction cannot wait for another transaction. Whenever two transactions conflict, the requesting transaction aborts.

WAIT_DIE: A transaction T1 waits for another transaction T2 only if T1 has higher priority than T2 (e.g., T1 starts execution before T2); otherwise T1 aborts.

Pros over NO_WAIT

  • Guaranteed forward progress (i.e., no starvation)
  • Fewer aborts

Cons over NO_WAIT

  • Locking logic is more complex
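
The two conflict policies can be written as small decision functions. This is a sketch under assumptions: the `Txn` type and the use of a start timestamp as the priority (smaller means older, hence higher priority) follow the slide's example but are not taken from any particular system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    name: str
    start_ts: int  # smaller = started earlier = higher priority (assumed, per the slide's example)


def no_wait(requester: Txn, holder: Txn) -> str:
    """NO_WAIT: on any conflict the requesting transaction aborts."""
    return "abort"


def wait_die(requester: Txn, holder: Txn) -> str:
    """WAIT_DIE: an older requester waits; a younger requester aborts ("dies")."""
    return "wait" if requester.start_ts < holder.start_ts else "abort"
```

Because waiting is only ever allowed from an older transaction to a younger one, the wait-for graph cannot contain a cycle, which is why WAIT_DIE needs no deadlock detector.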

SLIDE 15

Timestamp Ordering – Basic

Each transaction is assigned a unique timestamp indicating the serial order

Timestamp order axis: wts=10, rts=20
Read from T (T.ts = 15)

SLIDE 16

Timestamp Ordering – Basic

Each transaction is assigned a unique timestamp indicating the serial order

Timestamp order axis: wts=10, rts=20
Read from T (T.ts = 5)

SLIDE 17

Timestamp Ordering – Basic

Each transaction is assigned a unique timestamp indicating the serial order

Timestamp order axis: wts=10, rts=20
Read from T (T.ts = 25)

SLIDE 18

Timestamp Ordering – Basic

Each transaction is assigned a unique timestamp indicating the serial order

Timestamp order axis: wts=10, rts=25
Read from T (T.ts = 25)

SLIDE 19

Timestamp Ordering – Basic

Each transaction is assigned a unique timestamp indicating the serial order

Timestamp order axis: wts=10, rts=20
Write from T (T.ts = 15)

SLIDE 20

Timestamp Ordering – Basic

Each transaction is assigned a unique timestamp indicating the serial order

Timestamp order axis: wts=10, rts=20
Write from T (T.ts = 5)

SLIDE 21

Timestamp Ordering – Basic

Each transaction is assigned a unique timestamp indicating the serial order

Timestamp order axis: wts=10, rts=20
Write from T (T.ts = 25)

SLIDE 22

Timestamp Ordering – Basic

Each transaction is assigned a unique timestamp indicating the serial order

Timestamp order axis: rts=wts=25
Write from T (T.ts = 25)
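
The read and write cases walked through on slides 15 to 22 can be replayed with a short sketch of the textbook basic T/O rules. The names `Record`, `to_read`, `to_write`, and `Abort` are hypothetical, and the accept/abort outcomes are an assumption about what the slide figures show; only the timestamp values come from the slides.

```python
class Abort(Exception):
    """Raised when an access would violate timestamp order."""


class Record:
    def __init__(self, value, wts, rts):
        self.value = value
        self.wts = wts  # timestamp of the last write
        self.rts = rts  # largest timestamp of any read


def to_read(rec, ts):
    # A reader older than the current version cannot see it: abort.
    if ts < rec.wts:
        raise Abort(f"read ts={ts} < wts={rec.wts}")
    rec.rts = max(rec.rts, ts)
    return rec.value


def to_write(rec, ts, value):
    # A writer older than a committed read or write arrives too late: abort.
    if ts < rec.rts or ts < rec.wts:
        raise Abort(f"write ts={ts} < rts={rec.rts} or wts={rec.wts}")
    rec.value = value
    rec.wts = ts
    rec.rts = ts  # slide 22 shows rts=wts=25 after the write, so both are set here
```

Starting from wts=10, rts=20: a read at ts=15 succeeds and leaves rts at 20, a read at ts=5 aborts, and a read at ts=25 succeeds and raises rts to 25. A write at ts=15 or ts=5 aborts, while a write at ts=25 succeeds and leaves rts=wts=25.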

SLIDE 23

Timestamp Ordering – MVCC

MVCC: Multi-Version Concurrency Control

Timestamp order axis: wts=10, rts=20
Read from T (T.ts = 5)

SLIDE 24

Timestamp Ordering – MVCC

MVCC: Multi-Version Concurrency Control

Timestamp order axis: wts=10, rts=20
Read from T (T.ts = 5)

A transaction can read previous versions
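
A minimal sketch of the version lookup that lets the ts=5 read succeed: the reader is served the newest version whose write timestamp is not later than its own timestamp. The class `VersionedRecord` and the older version at wts=1 are hypothetical; write-side checks and per-version read timestamps are omitted.

```python
class VersionedRecord:
    """Keeps every committed version instead of overwriting in place."""

    def __init__(self):
        self.versions = []  # list of (wts, value)

    def install(self, wts, value):
        self.versions.append((wts, value))

    def read(self, ts):
        """Return the value of the newest version with wts <= ts, or None."""
        visible = [(wts, value) for wts, value in self.versions if wts <= ts]
        if not visible:
            return None
        return max(visible, key=lambda version: version[0])[1]
```

With versions installed at wts=1 and wts=10, a read at ts=5 returns the wts=1 value rather than aborting, and a read at ts=15 returns the wts=10 value.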

SLIDE 25

Timestamp Ordering

Pros:

  • Timestamp order is the serialization order
  • Logic for locking is simplified
  • In MVCC, read-only and read-write transactions do not conflict

Cons:

  • Timestamp allocation is a bottleneck

SLIDE 26

Pessimistic/Optimistic vs. 2PL/TO

[Figure: pessimistic vs. optimistic schemes; labels: Timestamp Ordering, MVCC]

SLIDE 27

Partition-Level Locking – H-store

  • Pro: Only one lock per partition
  • Con: Performance degrades for multi-partition transactions
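
The trade-off can be made concrete with a small sketch in which each partition has exactly one lock and a transaction must hold the lock of every partition it touches. The class `PartitionedStore`, the modulo partitioning of integer keys, and the sorted lock-acquisition order are assumptions of this example; the sorted order is a simplification standing in for the timestamp ordering listed for HSTORE in the schemes table.

```python
import threading
from contextlib import ExitStack


class PartitionedStore:
    def __init__(self, num_partitions):
        self.locks = [threading.Lock() for _ in range(num_partitions)]
        self.data = [dict() for _ in range(num_partitions)]

    def partition_of(self, key):
        # Integer keys are assumed; modulo partitioning is illustrative only.
        return key % len(self.locks)

    def partitions_for(self, keys):
        return sorted({self.partition_of(k) for k in keys})

    def put(self, key, value):
        self.data[self.partition_of(key)][key] = value

    def get(self, key):
        return self.data[self.partition_of(key)].get(key)

    def run(self, keys, body):
        """Lock every partition touched by `keys`, run body(self), then unlock."""
        with ExitStack() as stack:
            # Acquiring in sorted order avoids deadlock between transactions.
            for p in self.partitions_for(keys):
                stack.enter_context(self.locks[p])
            return body(self)
```

A transaction over keys 1, 5, and 9 in a four-partition store takes a single lock, while one over keys 1 and 2 takes two locks and blocks all other work on both partitions for its duration.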

SLIDE 28

Partition-Level Locking – H-store

[Figure: single-partition vs. multi-partition transactions; x-axis: % of multi-partition transactions]

SLIDE 29

Evaluation – Experimental Setup

Yahoo! Cloud Serving Benchmark (YCSB)

  • 20 million tuples
  • Each tuple is 1KB (total database is ~20GB)

Each transaction reads/modifies 16 random tuples following a skewed pattern

Serializable isolation level
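
A workload of this shape can be approximated with a rank-weighted sampler, shown here as a rough sketch rather than the benchmark's own generator. The functions `make_zipf_sampler` and `make_txn`, the skew parameter `theta`, the write ratio, and the scaled-down table size used in the usage note are all assumptions chosen for illustration.

```python
import random
from itertools import accumulate


def make_zipf_sampler(num_tuples, theta, seed=0):
    """Return sample(k): k tuple ids drawn with probability proportional to 1 / rank**theta."""
    rng = random.Random(seed)
    cum_weights = list(
        accumulate(1.0 / (rank ** theta) for rank in range(1, num_tuples + 1))
    )
    keys = range(num_tuples)

    def sample(k):
        # Sampling is with replacement, so a transaction may touch a hot tuple twice.
        return rng.choices(keys, cum_weights=cum_weights, k=k)

    return sample


def make_txn(sample, rng, ops=16, write_ratio=0.5):
    """Build one transaction as a list of ("R" or "W", tuple_id) operations."""
    return [("W" if rng.random() < write_ratio else "R", key) for key in sample(ops)]
```

For example, `make_txn(make_zipf_sampler(1000, 0.8), random.Random(1))` yields 16 operations over a 1000-tuple table; a larger `theta` concentrates accesses on fewer tuples, and `theta=0` makes them uniform.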

SLIDE 30

Evaluation – Read-only

  • 2PL schemes are scalable for read-only benchmarks

SLIDE 31

Evaluation – Read-only

  • 2PL schemes are scalable for read-only benchmarks
  • Timestamp allocation limits scalability

SLIDE 32

Evaluation – Read-only

  • 2PL schemes are scalable for read-only benchmarks
  • Timestamp allocation limits scalability
  • Memory copy hurts performance

SLIDE 33

Evaluation – Medium Contention

Write : Read = 50% : 50%

  • DL_DETECT does not scale due to deadlocks and thrashing

SLIDE 34

Evaluation – High Contention

  • Scaling stops at small core count

SLIDE 35

Evaluation – High Contention

  • Scaling stops at small core count
  • NO_WAIT has good performance until 1000 cores

SLIDE 36

Evaluation – High Contention

  • Scaling stops at small core count
  • NO_WAIT has good performance until 1000 cores
  • OCC wins at 1000 cores

SLIDE 37

Scalability Bottlenecks

Bottlenecks: Waiting (Thrashing), High Abort Rate, Timestamp Allocation, Multi-partition

Concurrency control schemes: DL_DETECT, NO_WAIT, WAIT_DIE, TIMESTAMP, MULTIVERSION, OCC, HSTORE

SLIDE 38

Solutions to Timestamp Allocation

  • Mutex-based allocation

SLIDE 39

Solutions to Timestamp Allocation

  • Mutex-based allocation
  • Atomic instruction

SLIDE 40

Solutions to Timestamp Allocation

  • Mutex-based allocation
  • Atomic instruction
  • Batch allocation

SLIDE 41

Solutions to Timestamp Allocation

  • Mutex-based allocation
  • Atomic instruction
  • Batch allocation
  • Hardware counter (~1000 million ts/s)

SLIDE 42

Solutions to Timestamp Allocation

  • Mutex-based allocation
  • Atomic instruction
  • Batch allocation
  • Hardware counter (~1000 million ts/s)
  • Distributed clock (perfect scalability)
      – All clocks must be synchronized
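
The first and third options can be contrasted in a few lines. This is a sketch under assumptions: `SharedCounter` and `BatchAllocator` are hypothetical names, the `acquisitions` field exists only to count trips to the shared lock, and the atomic-instruction, hardware-counter, and distributed-clock options are not modelled.

```python
import threading


class SharedCounter:
    """Mutex-based allocation: every request takes the one global lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next = 0
        self.acquisitions = 0  # number of times the shared lock was taken

    def reserve(self, n=1):
        """Reserve n consecutive timestamps and return the first one."""
        with self._lock:
            self.acquisitions += 1
            start = self._next
            self._next += n
            return start


class BatchAllocator:
    """Batch allocation: one per worker; visits the shared counter once per batch."""

    def __init__(self, counter, batch_size):
        self.counter = counter
        self.batch_size = batch_size
        self._cur = self._end = 0  # empty local range forces a reserve on first use

    def allocate(self):
        if self._cur == self._end:
            self._cur = self.counter.reserve(self.batch_size)
            self._end = self._cur + self.batch_size
        ts = self._cur
        self._cur += 1
        return ts
```

Two workers drawing 16 timestamps each with a batch size of 8 take the shared lock 4 times instead of 32. The timestamps stay unique, but their order across workers no longer reflects the order in which they were handed out.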

SLIDE 43

1000-core – Q/A

  • Why 1000?
  • Workload realistic?
  • Simulator (Graphite) realistic?
  • Distributed transactions?
      – Harding, R., Van Aken, D., Pavlo, A. and Stonebraker, M. An Evaluation of Distributed Concurrency Control. VLDB 2017
      – Similar conclusions
  • Abyss removed?

SLIDE 44

Summary

  • Core counts will keep increasing
  • Conventional concurrency control protocols do not scale
      – Lock thrashing
      – Timestamp allocation
  • Need software-hardware co-design (software-only solutions can go a long way)

SLIDE 45

Group Discussion

  • What are the pros and cons of timestamp ordering over two-phase locking? Can you think of other examples of using timestamps in other fields of CS?
  • What are the main pros and cons of a multi-version concurrency control (MVCC) protocol? How is MVCC related to HTAP (hybrid transactional/analytical processing)?
  • Can you think of any hardware changes to a multicore CPU that can improve the performance/scalability of concurrency control?

SLIDE 46

Before Next Lecture

Submit discussion summary to https://wisc-cs839-ngdb20.hotcrp.com

  • Deadline: Friday 11:59pm

Submit review for

  • Speedy Transactions in Multicore In-Memory Databases
  • [optional] TicToc: Time Traveling Optimistic Concurrency Control
  • [optional] Hekaton: SQL Server's Memory-Optimized OLTP Engine