
CS 839: Design the Next-Generation Database
Lecture 4: Multicore (Part I)
Xiangyao Yu
1/30/2020

Announcements
• Email me if you are not in HotCRP: https://wisc-cs839-ngdb20.hotcrp.com
• New deadline for submitting paper reviews: before the lecture starts
• This course is on the PhD breadth requirement list
• Please talk to me to discuss project ideas

Discussion Highlights
Transactions on a column store
• Pros: compression; good for read workloads; good for sequential writes
• Cons: more I/O for row selection/update/insert
Data format for HTAP?
• Hot data in row format; convert cold data to column format in the background
• Different formats in different replicas
Small processor near disk
• Compression/decompression, encryption, filtering, sorting, hashing, hot data
• Coalesce random accesses
• Fast indexing

Today's Paper
[Figure: Staring into the Abyss: An Evaluation of Concurrency Control with One Thousand Cores]

Story Behind the Paper
Lesson learned: talk to people about your research

Many-core systems have arrived
• The era of single-core CPU speed-up is over
• The number of cores on a chip is increasing exponentially
  - 1000-core chips are a near…
• DBMSs are not ready
  - Most DBMSs still focus on single-threaded performance
  - Existing work on multicores focuses on small core counts
[Figures: Xeon Phi (up to 61 cores); Tilera (up to 100 cores)]


Databases on 1000-core systems
• DBMSs on future computer architectures
• Will DBMSs scale to this level of parallelism?
  - What are the main bottlenecks to scalability?
  - What improvements will be needed from the software and hardware perspectives?
• Finding: all classic concurrency control algorithms fail to scale to 1000 cores.

1000-Core DBMS
• Online Transaction Processing (OLTP)
• Concurrency control is a key factor limiting scalability
• A new database: DBx1000
  - Supports all seven classic concurrency control algorithms
  - Studies the fundamental bottlenecks
  - https://github.com/yxymit/DBx1000
• Graphite multicore simulator

Simulated Hardware
• CPU: 1024 in-order cores
• Cache: 32KB L1, 512KB L2
• Network: 2D mesh
[Figure: a 32 × 32 grid of tiles; each tile contains a core, L1$ and L2$ caches, and a network switch (SW)]

Graphite Simulator [1]
[1] J. Miller et al. Graphite: A Distributed Parallel Simulator for Multicores. HPCA '10.

Concurrency Control Schemes
Two-Phase Locking (2PL)
• DL_DETECT: 2PL with deadlock detection
• NO_WAIT: 2PL with non-waiting deadlock prevention
• WAIT_DIE: 2PL with wait-die deadlock prevention
Timestamp Ordering (T/O)
• TIMESTAMP: basic T/O algorithm
• MVCC: multi-version T/O
• OCC: optimistic concurrency control
Partitioning
• HSTORE: T/O with partition-level locking

2PL – DL_DETECT
• Wait-for graph: an edge T2 → T1 when T2 waits for a lock held by T1
• Periodically detect cycles in the graph and abort the transaction that holds the fewest locks
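The DL_DETECT steps above can be sketched in Python. This is a minimal, single-threaded sketch. It assumes the wait-for graph is kept as an adjacency map from each waiting transaction to the lock holders it waits on. The names (`WaitForGraph`, `add_wait`, `find_cycle`, `choose_victim`) are hypothetical and are not taken from DBx1000.

```python
from collections import defaultdict


class WaitForGraph:
    """Hypothetical wait-for graph for DL_DETECT (not DBx1000's code)."""

    def __init__(self):
        self.waits_for = defaultdict(set)   # waiter -> txns holding locks it needs
        self.locks_held = defaultdict(int)  # txn -> number of locks it holds

    def add_wait(self, waiter, holder):
        # Edge waiter -> holder: waiter waits for a lock held by holder.
        self.waits_for[waiter].add(holder)

    def find_cycle(self):
        """Depth-first search; returns the txns on one cycle, or None."""
        state = {}  # txn -> "active" (on the current DFS path) or "done"
        path = []

        def visit(t):
            state[t] = "active"
            path.append(t)
            for nxt in self.waits_for.get(t, ()):
                if state.get(nxt) == "active":
                    return path[path.index(nxt):]   # back edge closes a cycle
                if nxt not in state:
                    cycle = visit(nxt)
                    if cycle:
                        return cycle
            path.pop()
            state[t] = "done"
            return None

        for t in list(self.waits_for):
            if t not in state:
                cycle = visit(t)
                if cycle:
                    return cycle
        return None

    def choose_victim(self):
        """Pick the txn on a deadlock cycle that holds the fewest locks."""
        cycle = self.find_cycle()
        if cycle is None:
            return None
        return min(cycle, key=lambda t: self.locks_held[t])
```

In a real engine, the detector would run periodically on a background thread while workers update the graph concurrently. That synchronization is left out here.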

2PL – NO_WAIT, WAIT_DIE
NO_WAIT: A transaction never waits for another transaction. Whenever two transactions conflict, the requesting transaction aborts.
WAIT_DIE: A transaction T1 waits for another transaction T2 only if T1 has higher priority than T2 (e.g., T1 started execution before T2); otherwise, T1 aborts.
Pros over NO_WAIT
• Guaranteed forward progress (i.e., no starvation)
• Fewer aborts
Cons over NO_WAIT
• Locking logic is more complex
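This sketch expresses the two deadlock-prevention policies as one decision function. It assumes a lock manager calls the function whenever a lock request arrives. The names (`Decision`, `on_lock_request`) are hypothetical. "Higher priority" is taken to mean "smaller start timestamp", following the example above.

```python
from enum import Enum
from typing import Sequence


class Decision(Enum):
    GRANT = "grant"
    WAIT = "wait"
    ABORT = "abort"


def on_lock_request(scheme: str, requester_ts: int,
                    conflicting_holder_ts: Sequence[int]) -> Decision:
    """Decide how a lock request is handled (hypothetical lock-manager hook).

    conflicting_holder_ts: start timestamps of the txns holding the lock in a
    conflicting mode (empty if there is no conflict). A smaller timestamp
    means an older txn with higher priority.
    """
    if not conflicting_holder_ts:
        return Decision.GRANT
    if scheme == "NO_WAIT":
        # Never wait: the requester aborts on any conflict.
        return Decision.ABORT
    if scheme == "WAIT_DIE":
        # An older requester waits; a younger requester "dies" (aborts).
        if all(requester_ts < h for h in conflicting_holder_ts):
            return Decision.WAIT
        return Decision.ABORT
    raise ValueError(f"unknown scheme: {scheme}")
```

Under WAIT_DIE, waits only ever go from an older transaction to a younger one, so the wait-for graph cannot form a cycle. An aborted transaction is usually restarted with its original timestamp. It therefore eventually becomes the oldest transaction and can no longer be made to die, which is where the no-starvation guarantee comes from.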

Timestamp Ordering – Basic
Each transaction is assigned a unique timestamp indicating the serial order. Each tuple records wts (the timestamp of its last writer) and rts (the largest timestamp of any reader).
Example: a tuple with wts=10, rts=20
• Read from T (T.ts = 15): allowed because T.ts > wts; rts stays at 20
• Write from T (T.ts = 15): T aborts because a transaction with a later timestamp (20) has already read the tuple
• Write from T (T.ts = 25): allowed; afterwards rts = wts = 25
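The basic T/O rules behind the example above are sketched below in Python under simplifying assumptions:
• a single version per tuple
• no buffering of uncommitted writes
• no Thomas write rule
• no latching of the tuple's own fields

`TOTuple`, `TOAbort`, `to_read`, and `to_write` are hypothetical names.

```python
class TOAbort(Exception):
    """Raised when an operation would violate timestamp order."""


class TOTuple:
    """Hypothetical single-version tuple for basic T/O."""

    def __init__(self, value, wts, rts):
        self.value = value
        self.wts = wts  # timestamp of the txn that last wrote the tuple
        self.rts = rts  # largest timestamp of any txn that read the tuple


def to_read(tup, ts):
    # A reader must not see a value written "after" it in timestamp order.
    if ts < tup.wts:
        raise TOAbort(f"read at ts={ts} but wts={tup.wts}")
    tup.rts = max(tup.rts, ts)
    return tup.value


def to_write(tup, ts, value):
    # A writer must not overwrite a value already read or written by a later txn.
    if ts < tup.rts or ts < tup.wts:
        raise TOAbort(f"write at ts={ts} but rts={tup.rts}, wts={tup.wts}")
    tup.value = value
    tup.wts = ts
    tup.rts = max(tup.rts, ts)  # ts >= rts here, so rts = wts = ts afterwards
```

Replaying the three cases on a tuple with wts=10 and rts=20 gives these results:
• a read at 15 returns the value and leaves rts at 20
• a write at 15 raises `TOAbort`
• a write at 25 succeeds and leaves rts = wts = 25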

Timestamp Ordering – MVCC
MVCC: Multi-Version Concurrency Control
Example: the latest version of a tuple has wts=10, rts=20; a read arrives from T (T.ts = 5)
• Under basic T/O, T would abort because T.ts < wts
• Under MVCC, a transaction can read previous versions, so T reads the older version that was current at timestamp 5 instead of aborting
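Below is a minimal multi-version timestamp ordering (MVTO) sketch. It assumes each tuple keeps its versions sorted by write timestamp and that each version tracks the largest timestamp that read it. A reader picks the newest version with wts ≤ its timestamp, so the T.ts = 5 reader from the example gets an older version instead of aborting. `MVTuple` and `MVCCAbort` are hypothetical names. Garbage collection of old versions and commit handling are left out.

```python
import bisect


class MVCCAbort(Exception):
    """Raised when a write would invalidate a read by a later transaction."""


class MVTuple:
    """Hypothetical multi-version tuple; versions sorted by write timestamp."""

    def __init__(self, value, wts=0):
        self.wts = [wts]       # sorted write timestamps, one per version
        self.rts = [wts]       # largest reader timestamp, one per version
        self.values = [value]

    def _visible(self, ts):
        # Index of the newest version with wts <= ts, or -1 if none exists.
        return bisect.bisect_right(self.wts, ts) - 1

    def read(self, ts):
        i = self._visible(ts)
        if i < 0:
            return None
        self.rts[i] = max(self.rts[i], ts)
        return self.values[i]

    def write(self, ts, value):
        i = self._visible(ts)
        # If a later txn already read the version this write would follow,
        # that reader should have seen this write: abort instead.
        if i >= 0 and self.rts[i] > ts:
            raise MVCCAbort(f"version read at ts={self.rts[i]} > {ts}")
        self.wts.insert(i + 1, ts)
        self.rts.insert(i + 1, ts)
        self.values.insert(i + 1, value)
```

Writes still abort when they would invalidate a later read. Reads never abort, which is why read-only transactions do not conflict with read-write transactions under MVCC.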

Timestamp Ordering
Pros:
• Timestamp order is the serialization order
• Locking logic is simplified
• In MVCC, read-only and read-write transactions do not conflict
Cons:
• Timestamp allocation is a bottleneck

Pessimistic/Optimistic vs. 2PL/T/O
[Figure: schemes classified as pessimistic vs. optimistic and as 2PL vs. timestamp ordering, including MVCC]

Partition-Level Locking – H-Store
• Pro: only one lock per partition
• Con: performance degrades for multi-partition transactions
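This sketch shows partition-level locking. It assumes integer keys hash-partitioned by modulo and one `threading.Lock` per partition. All names are hypothetical. H-Store itself queues partition-lock requests in timestamp order. Acquiring locks in ascending partition-id order, as done here, is a simpler, swapped-in way to avoid deadlock.

```python
import threading


class PartitionedStore:
    """Hypothetical partitioned store with exactly one lock per partition."""

    def __init__(self, num_partitions):
        self.locks = [threading.Lock() for _ in range(num_partitions)]
        self.data = [dict() for _ in range(num_partitions)]

    def partition_of(self, key):
        # Assumption: integer keys, partitioned by modulo.
        return key % len(self.locks)

    def partitions_for(self, keys):
        return sorted({self.partition_of(k) for k in keys})

    def get(self, key, default=0):
        return self.data[self.partition_of(key)].get(key, default)

    def put(self, key, value):
        self.data[self.partition_of(key)][key] = value

    def run_txn(self, keys, body):
        """Lock every partition the txn touches, run body, then unlock.
        No per-tuple locks are taken inside body."""
        parts = self.partitions_for(keys)
        for p in parts:  # ascending order avoids deadlock (swapped-in choice)
            self.locks[p].acquire()
        try:
            return body(self)
        finally:
            for p in reversed(parts):
                self.locks[p].release()
```

A transaction whose keys fall in one partition takes a single lock. A multi-partition transaction holds several partition locks for its whole duration, which blocks every other transaction on those partitions. This is why performance degrades as the share of multi-partition transactions grows.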

Partition-Level Locking – H-Store
[Figure: single-partition vs. multi-partition transactions; performance vs. % of multi-partition transactions]

Evaluation – Experimental Setup
Yahoo! Cloud Serving Benchmark (YCSB)
• 20 million tuples
• Each tuple is 1KB (total database is ~20GB)
• Each transaction reads/modifies 16 random tuples following a skewed access pattern
• Serializable isolation level
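The skewed access pattern can be approximated with a Zipfian key sampler. This sketch rests on several assumptions:
• the key space is scaled down from 20 million tuples
• the skew parameter `theta` and its value are hypothetical
• each transaction draws 16 distinct keys
• the sampler uses `random.Random.choices` with cumulative weights rather than YCSB's own Zipfian generator

```python
import itertools
import random


def make_zipf_sampler(n, theta, seed=0):
    """Hypothetical Zipfian key sampler over keys 0..n-1 (key 0 is hottest).

    P(key = i) is proportional to 1 / (i + 1) ** theta; larger theta = more skew.
    """
    rng = random.Random(seed)
    cum = list(itertools.accumulate(1.0 / (i ** theta) for i in range(1, n + 1)))
    keys = range(n)

    def sample():
        return rng.choices(keys, cum_weights=cum, k=1)[0]

    return sample


def make_txn_keys(sample, accesses=16):
    """Draw `accesses` distinct keys for one transaction.

    Requires the key space to contain at least `accesses` keys.
    """
    chosen = set()
    while len(chosen) < accesses:
        chosen.add(sample())
    return sorted(chosen)
```

With a higher `theta`, more transactions pick the same few hot keys, which is what drives up contention in the medium- and high-contention experiments.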

Evaluation – Read-Only
• 2PL schemes are scalable for read-only benchmarks
• Timestamp allocation limits scalability
• Memory copy hurts performance

Evaluation – Medium Contention
• Write : Read = 50% : 50%
• DL_DETECT does not scale due to deadlocks and thrashing

Evaluation – High Contention
• Scaling stops at a small core count
• NO_WAIT performs well until 1000 cores
• OCC wins at 1000 cores

Scalability Bottlenecks
Bottlenecks: Waiting (Thrashing), High Abort Rate, Timestamp Allocation, Multi-partition
Concurrency control schemes: DL_DETECT, NO_WAIT, WAIT_DIE, TIMESTAMP, MULTIVERSION, OCC, HSTORE
[Table: which bottlenecks apply to each concurrency control scheme]

Solutions to Timestamp Allocation
• Mutex-based allocation
• Atomic instruction
• Batch allocation
• Hardware counter (~1000 million ts/s)
• Distributed clock (perfect scalability), but all clocks must be synchronized
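Three of the allocation strategies listed above are sketched below in Python. The class names are hypothetical.
• Pure Python has no atomic fetch-and-add, so `SharedCounter` uses a mutex. It covers the mutex-based option and stands in for the atomic-instruction option.
• The hardware counter is not modeled.
• `DistributedClock` assumes per-core clocks are synchronized; a single process-wide monotonic clock plays that role here.

```python
import threading
import time


class SharedCounter:
    """Mutex-based allocation: one lock-protected global counter."""

    def __init__(self, start=1):
        self._lock = threading.Lock()
        self._value = start

    def fetch_add(self, n=1):
        # Stands in for an atomic fetch-and-add, which pure Python lacks.
        with self._lock:
            old = self._value
            self._value += n
            return old

    def next_ts(self):
        return self.fetch_add(1)


class BatchAllocator:
    """Batch allocation: each thread reserves `batch` timestamps at once."""

    def __init__(self, counter, batch=16):
        self._counter = counter
        self._batch = batch
        self._local = threading.local()  # per-thread [next, end) range

    def next_ts(self):
        loc = self._local
        if getattr(loc, "next", None) is None or loc.next == loc.end:
            # One shared-counter access per batch instead of per timestamp.
            loc.next = self._counter.fetch_add(self._batch)
            loc.end = loc.next + self._batch
        ts = loc.next
        loc.next += 1
        return ts


class DistributedClock:
    """Distributed clock: local clock reading + thread id, no shared state.

    Assumes clocks are synchronized across cores; here one process-wide
    monotonic clock stands in for that assumption.
    """

    def __init__(self):
        self._local = threading.local()

    def next_ts(self, thread_id):
        loc = self._local
        # Strictly increasing per thread, even if the clock has not advanced.
        loc.last = max(getattr(loc, "last", 0) + 1, time.monotonic_ns())
        return (loc.last, thread_id)  # thread id breaks ties across threads
```

Batch allocation touches the shared counter once per `batch` timestamps instead of once per transaction. The cost is that timestamps handed out by different threads no longer follow the order in which transactions started. The distributed clock needs no shared state at all, but its order is meaningful only if the clocks really are synchronized.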

1000-core – Q/A
• Why 1000?
• Is the workload realistic?
• Is the simulator (Graphite) realistic?
• Distributed transactions?
  - Harding, R., Van Aken, D., Pavlo, A. and Stonebraker, M. An evaluation of distributed concurrency control. VLDB 2017
  - Similar conclusions
• Abyss removed?

Summary
• Core counts will keep increasing
• Conventional concurrency control protocols do not scale
  - Lock thrashing
  - Timestamp allocation
• Need software-hardware co-design (though software-only solutions can go a long way)

Group Discussion
• What are the pros and cons of timestamp ordering over two-phase locking?
• Can you think of other examples of using timestamps in other fields of CS?
• What are the main pros and cons of a multi-version concurrency control (MVCC) protocol? How is MVCC related to HTAP (hybrid transactional/analytical processing)?
• Can you think of any hardware changes to a multicore CPU that could improve the performance/scalability of concurrency control?

Before Next Lecture
• Submit discussion summary to https://wisc-cs839-ngdb20.hotcrp.com
  - Deadline: Friday 11:59pm
• Submit review for
  - Speedy Transactions in Multicore In-Memory Databases
  - [optional] TicToc: Time Traveling Optimistic Concurrency Control
  - [optional] Hekaton: SQL Server's Memory-Optimized OLTP Engine
