

SLIDE 1

Durable Transactional Memory Can Scale With TimeStone

  • R. Madhava Krishnan, Jaeho Kim*, Ajit Mathew, Xinwei Fu, Anthony Demeri, Changwoo Min, Sudarsun Kannan+

SLIDE 2

Executive Summary

➢ TimeStone is a highly scalable Durable Transactional Memory (DTM)
○ Goals: high scalability, high performance, and low write amplification
○ Techniques: hybrid DRAM-NVMM logging and MVCC

➢ A novel hybrid DRAM-NVMM logging approach for high performance and low write amplification

➢ TimeStone adopts the Multi-Version Concurrency Control (MVCC) model for high scalability and support for multiple isolation levels

➢ Scales up to 112 cores and has write amplification <= 1

SLIDE 3

Talk Outline

➢ Motivation
➢ Overview
➢ Design
➢ Evaluation

SLIDE 4

Non-Volatile Main Memory (NVMM)

➢ NVMM has arrived!

➢ Storage-like characteristics
○ Data persistence
○ Large capacity

➢ Memory-like performance
○ ~100x faster than SSDs
○ Offers byte-addressability

SLIDE 5

Durable Transactional Memory (DTM)

➢ DTMs are software frameworks supporting ACID properties
➢ DTMs make NVMM programming easier and relieve the burden on NVMM application developers
➢ But there are serious problems that need immediate attention:
○ Poor scalability
○ High write amplification (up to 6x)

SLIDE 6

Review of Existing DTMs

➢ State-of-the-art DTMs focus on reducing the crash-consistency cost
○ DudeTM [ASPLOS-17]
○ Romulus [SPAA-18]

➢ To reduce the crash-consistency overhead:
○ DudeTM keeps logging operations out of the critical path
○ Romulus maintains a backup heap to eliminate logging operations

➢ Existing DTMs incur high write amplification in the course of reducing the crash-consistency cost

SLIDE 7

Review of Existing DTMs


➢ What is Write Amplification (WA)?

○ The additional bytes written to NVMM for every byte the user requests to write (formalized below)
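
In symbols (our gloss on the slide's definition, not from the original deck):

    WA = (total bytes written to NVMM) / (bytes of user data requested to be written)

For example, a 100-byte user update that also writes a 100-byte log entry and 400 bytes of metadata to NVMM has WA = 600/100 = 6x.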

➢ Why is it a serious problem?

○ NVMM has low write endurance
○ Additional writes generate unnecessary traffic to the NVMM
○ Hence, critical-path latency increases and performance drops

➢ None of the existing DTMs considers many-core scalability

SLIDE 8

Existing DTMs Are Not Scalable

[Figure: throughput of Romulus, DudeTM, and PMDK as the core count increases; performance saturates beyond 16 cores.]

None of the DTMs scales beyond 16 cores!

Scalability is indispensable!

SLIDE 9

The Reasons for Poor Scalability

1. Low RW Parallelism

➢ Poor scalability of the underlying STM
○ e.g., DudeTM [ASPLOS-17]

➢ Support for only a single writer
○ e.g., Romulus [SPAA-18], PMDK [Intel]

SLIDE 10

The Reasons for Poor Scalability

2. High Write Amplification

DTM System    Write Amplification (WA)
Libpmemobj    70x
Romulus       2x
DudeTM        4-6x
KaminoTx      2x
Mnemosyne     4-7x

➢ Additional bytes written to NVMM come from:
○ Crash-consistency overhead
○ Metadata overhead

➢ High WA in the critical path
○ Impacts the system throughput

SLIDE 11

So What Do We Need Now?

➢ A scalable and high-performance DTM
➢ Low write amplification

Our Solution:

TimeStone


SLIDE 12

Talk Outline

➢ Motivation
➢ Overview
➢ Design
➢ Evaluation

SLIDE 13

Two Main Goals of TimeStone


1) Achieve High Scalability and Performance

2) Reduce Write Amplification significantly

SLIDE 14

Goal 1 - To Achieve High Scalability

➢ TimeStone adopts Multi-Version Concurrency Control (MVCC)

➢ Supports non-blocking reads and concurrent disjoint writes
➢ MVCC provides better RW parallelism
➢ Let's illustrate how MVCC works!

SLIDE 15

Illustration - MVCC Programming Model

CASE 1: Concurrent Readers

[Figure: a list of nodes A-D traversed concurrently by Reader-1 through Reader-4.]

TimeStone supports non-blocking reads.

SLIDE 16

Illustration - MVCC Programming Model

CASE 2: Concurrent Writers

[Figure: the same list updated concurrently by Writer-1, Writer-2, and Writer-3.]

➢ Disjoint writers proceed concurrently: TimeStone supports disjoint writes
➢ For conflicting writers, one succeeds and the others abort

SLIDE 17

Goal 1 - To Achieve High Scalability

➢ MVCC provides better RW parallelism
➢ But that alone is not enough for better scalability!
➢ Recall the two reasons for poor scalability:
○ Low RW parallelism ⇒ solved by adopting MVCC
○ High write amplification
➢ MVCC itself can incur very high write amplification

SLIDE 18

Goal 1 - To Achieve High Scalability

➢ MVCC for better RW parallelism
➢ We further optimize MVCC for NVMM to achieve better scalability:
○ Reduce write amplification
○ Asynchronous garbage collection (see the paper)

SLIDE 19

Goal 2 - Low Write Amplification

➢ TOC logging is a multi-layered hybrid DRAM-NVMM logging scheme:

○ Transient version log in DRAM (Tlog)
■ Leverages the faster DRAM for better write coalescing

○ Operational log in NVMM (Olog)
■ Guarantees immediate durability

○ Checkpoint log in NVMM (Clog)
■ Guarantees correct recovery

➢ TOC logging is the key to achieving low write amplification (a sketch of the three layers follows)
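
As a rough mental model, the three layers could be laid out as below. This is a minimal C sketch; the struct and field names are our assumptions, not TimeStone's actual definitions (the 4 MB size comes from the backup slides).

```c
/* Minimal sketch of the three TOC logging layers. Struct and field names
 * are illustrative assumptions, not TimeStone's real definitions. */
#include <stdint.h>
#include <stddef.h>

#define LOG_SIZE (4u * 1024 * 1024)   /* every log is finite: 4 MB (backup slides) */

struct tlog {                         /* Transient version log, lives in DRAM      */
    uint8_t buf[LOG_SIZE];            /* holds version objects; hot updates to the */
    size_t  head, tail;               /* same object coalesce here                 */
};

struct olog {                         /* Operational log, lives in NVMM            */
    uint8_t buf[LOG_SIZE];            /* logical operations appended at commit,    */
    size_t  head, tail;               /* giving immediate durability               */
};

struct clog {                         /* Checkpoint log, lives in NVMM             */
    uint8_t buf[LOG_SIZE];            /* version objects checkpointed from the     */
    size_t  head, tail;               /* Tlog; replayed for correct recovery       */
};
```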


SLIDE 20

Reducing Write Amplification in TimeStone

[Figure: update_node(A, V1), update_node(A, V2), and update_node(A, V3) flow from the application through the Tlog (DRAM) into the Olog and Clog (NVMM); successive writes are coalesced in the Tlog and successive checkpoints are coalesced in the Clog. Tlog: "Tlog is 70% filled, I need to free up some space! Let me trigger checkpointing." Clog: "Clog is 70% filled, I need to free up some space! Let me trigger writeback."]

➢ Olog for low crash-consistency overhead: immediate durability at low cost
➢ Log coalescing for low metadata overhead

SLIDE 21

Talk Outline

➢ Motivation
➢ Overview
➢ Design
➢ Evaluation

SLIDE 22

Object Structure In TimeStone: Master Object

➢ TimeStone is an object-based DTM
➢ The user-defined persistent structure is called the master object
➢ For example, a simple linked list

[Figure: master objects A, B, C, and D residing in NVMM.]

SLIDE 23

Object Structure in TimeStone: Version Object

➢ The different versions of a master object are called version objects

[Figure: master objects A-D in NVMM; version objects A1-D1 and A2-D2 in DRAM, linked into a per-object version chain.]

SLIDE 24

Writes in TimeStone

[Figure: Update(B, B1): a version object B1 is created in the Tlog (DRAM); the operation is appended to the Olog (NVMM), the durability point; the wrt-clk is assigned, the linearization point; and B1 is linked to master object B.]

Any number of writers can simultaneously work on disjoint master objects (a C sketch of this write path follows).
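
A minimal C sketch of this write path under the MVCC model described above. Every name here (timestone_update, tlog_alloc, olog_append, read_hw_clock) is a hypothetical stand-in, not TimeStone's real API.

```c
/* Illustrative sketch of Update(B, new value); not TimeStone's actual code. */
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

struct version { uint64_t wrt_clk; struct version *next; unsigned char data[64]; };
struct master  { struct version *latest; /* plus p-lock, header fields, ... */ };

/* Assumed runtime primitives (prototypes only): */
bool            try_lock(struct master *m);
void            unlock(struct master *m);
struct version *tlog_alloc(size_t len);   /* allocate a version object in the Tlog (DRAM)  */
void            olog_append(struct master *m, const void *val, size_t len); /* + p-barrier */
uint64_t        read_hw_clock(void);      /* hardware clock, e.g. RDTSCP                   */

bool timestone_update(struct master *B, const void *val, size_t len) {
    if (!try_lock(B))                      /* conflicting writer: abort the transaction    */
        return false;
    struct version *v = tlog_alloc(len);   /* (1) new version object in the Tlog           */
    memcpy(v->data, val, len);             /*     (assumes len <= sizeof v->data)          */
    olog_append(B, val, len);              /* (2) Olog append: the durability point        */
    v->wrt_clk = read_hw_clock();          /* (3) assign wrt-clk: the linearization point  */
    v->next = B->latest;                   /* (4) link the new version into B's chain      */
    B->latest = v;
    unlock(B);
    return true;
}
```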

SLIDE 25

Dereferencing - Finding the Right Version

[Figure: master object B with version chain B4 (wrt-clk=70) → B3 (wrt-clk=50) → B2 (wrt-clk=40); three readers, each with local-clk=55, traverse the chain comparing each wrt-clk against their local-clk.]

Which version object should a reader dereference? The first version object with wrt-clk <= local-clk; B3 in this example, since 50 <= 55.

Any number of readers can simultaneously traverse the version chain without being blocked (a C sketch follows).
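
The dereference rule above maps to a short chain walk. This sketch reuses the illustrative struct master and struct version from the write-path sketch and is not TimeStone's actual code.

```c
/* Return the newest version of m visible at the reader's snapshot, i.e. the
 * first version in the chain with wrt-clk <= local-clk. Purely illustrative. */
const void *timestone_deref(const struct master *m, uint64_t local_clk) {
    for (const struct version *v = m->latest; v != NULL; v = v->next)
        if (v->wrt_clk <= local_clk)
            return v->data;            /* e.g. B3 for local-clk = 55 above      */
    return NULL;                       /* no visible DRAM version; presumably   */
                                       /* fall back to the NVMM copy            */
}
```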

SLIDE 26

Other Interesting Features in TimeStone

➢ Mixed isolation support
➢ Asynchronous, time-based garbage collection
➢ More details on the design are in the paper

SLIDE 27

Talk Outline

➢ Motivation
➢ Overview
➢ Design
➢ Evaluation

SLIDE 28

Evaluation Questions

➢ What is the write amplification of TimeStone?
➢ Is log coalescing beneficial?
➢ Does TimeStone scale?
➢ What is the impact on a real-world workload?

SLIDE 29

Evaluation Settings

➢ Real NVMM server (Intel DCPMEM)
○ 1 TB NVMM and 337 GB DRAM
○ 2.5 GHz, 112-core Intel Cascade Lake processor

➢ Benchmarks
○ Microbenchmarks: list, hash table, BST
○ Application benchmarks: KyotoCabinet and YCSB

➢ Workloads
○ Different update ratios, access patterns, and data set sizes

➢ Compared against state-of-the-art DTM systems

SLIDE 30

Write Amplification for Write-intensive (80% Update) Hash Table

[Figure: write amplification of each DTM for a write-intensive (80% update) hash table.]

➢ The write amplification of PMDK is 70x even for the 2% update case
➢ The write amplification of TimeStone is always <= 1

SLIDE 31

Write Coalescing in TOC Logging

[Figure: fraction of writes reaching each TOC layer: 100% → 16% → 7% → 0.01%.]

➢ Only 7% of writes are checkpointed from the Tlog; the rest are coalesced in the Tlog
➢ Only 0.01% of writes are written back to the master objects; the rest are coalesced in the Tlog and Clog

SLIDE 32

Scalability for Read-Mostly Hash Table (2% Update)

[Figure: throughput vs. core count for a read-mostly (2% update) hash table.]

➢ TimeStone scales linearly
➢ TimeStone is 70x faster than Romulus

SLIDE 33

Scalability for Write-Intensive Hash Table (80% Update)

[Figure: throughput vs. core count for a write-intensive (80% update) hash table.]

➢ TimeStone still scales linearly
➢ TimeStone performs 100x faster than DudeTM
➢ With MVCC, TimeStone supports better RW parallelism than existing DTMs, and hence it scales better
➢ Low write amplification keeps TimeStone's critical path short, which ultimately yields better performance and scalability

SLIDE 34

Real-World Application - KyotoCabinet

[Figure: KyotoCabinet throughput vs. core count for three configurations: vanilla KyotoCabinet on DRAM, vanilla KyotoCabinet on NVMM without crash consistency, and TimeStone-enabled KyotoCabinet.]

➢ TimeStone-enabled KyotoCabinet scales well while additionally offering crash consistency
➢ It performs up to 3x better while additionally supporting crash consistency

SLIDE 35

Discussion

➢ Durable Transactional Memory Systems

○ Romulus[SPAA-18], DudeTM[ASPLOS-17], PMDK, Mnemosyne[ASPLOS-11]

➢ Inspired by in-memory databases

○ Ermia[SIGMOD-16], Cicada[SIGMOD-17]

➢ Also draws on non-linearizable synchronization algorithms

○ RCU[OLS-02], RLU[SOSP-15], MV-RLU[ASPLOS-19]

➢ Future work

○ Provide memory safety and reliability in TimeStone

○ Extend TimeStone to support distributed transactions

SLIDE 36

Conclusion

➢ Current DTMs:
○ Do not scale beyond 16 cores
○ Suffer high write amplification

➢ TimeStone:
○ Adopts and optimizes MVCC for better multi-core scalability
○ Proposes TOC logging to reduce write amplification

➢ Scales up to 112 cores
➢ Has write amplification <= 1
➢ Performs up to 100x better than state-of-the-art DTMs

SLIDE 37

BACKUP SLIDES

  • R. Madhava Krishnan

Advisor: Dr. Changwoo Min


SLIDE 38


Thank You!

SLIDE 39

Problems In The Existing DTMs

High Storage Overhead

DTM System    Storage Overhead
Libpmemobj    Minimal
Romulus       Very High
DudeTM        Very High
KaminoTx      Very High
Mnemosyne     Minimal

(Very High: 2x the size of NVMM)

➢ DudeTM
○ Requires DRAM == NVMM

➢ Romulus, KaminoTx
○ Only half of the available NVMM is usable

➢ This curtails the cost-effectiveness of NVMM

SLIDE 40

Minimal Storage Overhead in TimeStone

➢ Additional storage is required only for the logs
➢ All logs in TimeStone are finite (4 MB)
➢ Asynchronous, time-based garbage collection mechanism
○ Does not become a scalability bottleneck
○ Does not block writers
○ Enables better log write coalescing

SLIDE 41

Design of TimeStone

➢ TimeStone follows the MVCC programming model
➢ Object organization in TimeStone
➢ How writes are handled in TimeStone
➢ How reads (object dereferencing) are handled

SLIDE 42

Object Structure in TimeStone: Control Header

➢ Headers hold the metadata of the master objects
➢ A header is the entry point to the version chain

[Figure: master objects A-D in NVMM; headers A-D in DRAM, each pointing to version chain A-D.]

SLIDE 43

[Figure: the TOC pipeline of SLIDE 20 again: update_node(A, V1..V3) coalesced in the Tlog (DRAM), appended to the Olog, and checkpointed into the Clog (NVMM), now annotated with "Olog replay upon rebooting". Tlog: "Tlog is 70% filled, I need to free up some space! Let me trigger checkpointing." Clog: "Clog is 70% filled, I need to free up some space! Let me trigger writeback."]

Key idea

➢ Coalesce the log writes
➢ Write back or checkpoint only the latest updates

Looks good, but what happens if there is a power failure before the Tlog checkpoints its updates? The Olog is replayed upon rebooting.

SLIDE 44

Implementation

➢ Core library written in C, about 7,000 LOC
➢ An additional C++ wrapper hides the concurrency control and crash consistency
➢ NVMM-friendly design pattern (sketched below):
○ Logging writes are one sequential write + a p-barrier
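
A sketch of that pattern on x86 using CLWB followed by SFENCE as the p-barrier. nvlog_append is a hypothetical helper; the real implementation may use different persistence instructions.

```c
/* "One sequential write + p-barrier" logging pattern, sketched for x86.
 * Requires a CLWB-capable CPU and compiling with -mclwb. */
#include <immintrin.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define CACHELINE 64

static void nvlog_append(uint8_t *log_tail, const void *src, size_t len) {
    memcpy(log_tail, src, len);                    /* one sequential write        */
    for (size_t off = 0; off < len; off += CACHELINE)
        _mm_clwb(log_tail + off);                  /* write back each cache line  */
    _mm_sfence();                                  /* p-barrier: order and drain  */
}
```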

SLIDE 45

Mixed Isolation in TimeStone

➢ TimeStone supports different isolation levels on the same instance of a data structure
➢ By default it provides snapshot isolation (SI)
➢ Stricter isolation levels are supported through read-set validation at commit time (sketched below)
➢ The read set and write set are tracked when a transaction runs at a stricter isolation level
➢ Upon read-set validation failure, the transaction is aborted and its updates are not made visible
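
Commit-time read-set validation could look like the sketch below; the read_entry bookkeeping and the latest_wrt_clk helper are assumptions, not TimeStone's actual structures.

```c
/* Illustrative commit-time read-set validation for stricter isolation. */
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

struct master;                                      /* opaque; defined elsewhere */
uint64_t latest_wrt_clk(const struct master *m);    /* assumed runtime helper    */

struct read_entry { const struct master *obj; uint64_t seen_wrt_clk; };

/* Succeed only if nothing this transaction read has been overwritten since
 * it was read; otherwise the transaction must abort. */
static bool validate_read_set(const struct read_entry *rs, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (latest_wrt_clk(rs[i].obj) != rs[i].seen_wrt_clk)
            return false;
    return true;
}
```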

SLIDE 46

How Does TimeStone Guarantee ACID?

➢ Atomicity

○ Upon transaction commit, updates become atomically visible
○ Upon abort, the copy never makes it into the version chain

➢ Consistency

○ Both link and data consistency, since a complete copy of the object is made

➢ Isolation

○ Reader isolation using time as the synchronization primitive
○ Writer isolation using try_lock

➢ Durability

○ Immediately durable after commit, using the Olog


SLIDE 47

Recovery Design in TimeStone

➢ Tightly coupled with our logging design
➢ All logs are completely reclaimed and destroyed upon safe termination
➢ On startup, TimeStone checks whether the nvlog heap is consistent
➢ If not, it triggers recovery
➢ Recovery is essentially a two-step process (sketched below):
○ Replay the Clog to put the master objects into a consistent state
○ Replay the Olog to reach the latest point before the crash occurred
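
In C terms, the two steps could look like this sketch; all function names are hypothetical.

```c
/* Illustrative two-step recovery; not TimeStone's actual code. */
#include <stdbool.h>

bool nvlog_is_consistent(void);   /* assumed: was TimeStone shut down safely?   */
void clog_replay(void);           /* assumed: write back checkpointed versions  */
void olog_replay(void);           /* assumed: re-execute logged operations      */

void timestone_recover(void) {
    if (nvlog_is_consistent())    /* safe termination: logs were fully reclaimed */
        return;
    clog_replay();                /* step 1: bring every master object into a    */
                                  /*         consistent checkpointed state       */
    olog_replay();                /* step 2: replay operations up to the latest  */
                                  /*         committed point before the crash    */
}
```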

SLIDE 48

Recovery Design in TimeStone

➢ Olog replay is executed in start-ts order, and commits are applied in commit-ts order
➢ The start-ts order ensures a view similar to that of the live transactions
➢ The commit-ts order brings the application to the last consistent state observed
➢ Using the Olog reduces the NVMM footprint
➢ We achieve a deterministic, no-loss recovery

SLIDE 49

Scalable Garbage Collection

➢ Memory is finite!
➢ Writers are blocked if the log resources are full
➢ A non-scalable garbage collection scheme would directly hurt write throughput
➢ We propose an asynchronous, concurrent garbage collection scheme
➢ Each thread is responsible for reclaiming its own logs
➢ Reclamation is done according to grace-period semantics
➢ Cross-log coordination is established without any centralized lookup or dependency tracking
➢ We just use timestamps

SLIDE 50

➢ The Tlog and Clog are reclaimed in two different modes (see the sketch below):
○ Writeback mode (when log utilization > 75%)
○ Best-effort mode (when log utilization is between 30% and 75%)

➢ A thread checks for reclamation at the transaction boundary
➢ In writeback mode, the latest copy object is written back
○ All other versions belonging to the same master are ignored

➢ In best-effort mode, objects are reclaimed until the first writeback is required
○ Stopping at the first writeback allows updates to coalesce

➢ Olog entries can be discarded after the Tlog writeback
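
The threshold logic above reduces to a small decision function; the sketch below uses assumed names and the 30%/75% split from the bullets.

```c
/* Illustrative reclamation-mode selection; not TimeStone's actual code. */
enum reclaim_mode { RECLAIM_NONE, RECLAIM_BEST_EFFORT, RECLAIM_WRITEBACK };

/* Checked at the transaction boundary; util is the log's fill fraction (0..1). */
static enum reclaim_mode pick_reclaim_mode(double util) {
    if (util > 0.75) return RECLAIM_WRITEBACK;    /* write back the latest copy  */
    if (util > 0.30) return RECLAIM_BEST_EFFORT;  /* reclaim until the first     */
                                                  /* writeback would be needed   */
    return RECLAIM_NONE;
}
```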

SLIDE 51

[Figure: per-thread transient version log (DRAM) and per-thread operation and checkpoint logs (NVMM) during transaction TX1 on a list TS-list, with add_node(TS-list, A'') and add_node(TS-list, A''') producing versions A', A'', A''': (1) TX1 updates Node 2; (2) TX1 commits and is durable from here; (3) the transient version log is reclaimed by checkpointing; (4) the checkpoint log is reclaimed by writeback.]

SLIDE 52

Object Structure in TimeStone

[Figure: a master object in NVMM points to its control header via p-control; the control header (DRAM) holds np-master, np-latest, p-lock, and p-copy; each copy object carries wrt-clk, p-next, p-control, prev-wrt-clk, and next-wrt-clk. Pointers prefixed np-* target NVMM (*np); pointers prefixed p-* target DRAM (*p).]
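
Reading the figure as C structs gives roughly the following sketch; field types and exact semantics are our assumptions.

```c
/* Illustrative rendering of the slide's object layout; not real TimeStone code. */
#include <stdint.h>
#include <pthread.h>

struct ctrl_hdr;

struct copy_obj {                   /* one version of a master object            */
    uint64_t         wrt_clk;       /* commit timestamp of this version          */
    uint64_t         prev_wrt_clk;  /* clocks of the neighboring versions        */
    uint64_t         next_wrt_clk;
    struct copy_obj *p_next;        /* next (older) version in the chain         */
    struct ctrl_hdr *p_control;     /* back-pointer to the control header        */
    /* object payload follows */
};

struct ctrl_hdr {                   /* volatile control header, lives in DRAM    */
    void            *np_master;     /* np-* = pointer into NVMM: master object   */
    struct copy_obj *np_latest;     /* newest durable version                    */
    pthread_mutex_t  p_lock;        /* p-* = volatile: writer try-lock           */
    struct copy_obj *p_copy;        /* head of the volatile version chain        */
};
```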

SLIDE 53

Principles Behind the Logging Design

➢ Per-thread logs to eliminate scalability bottlenecks
➢ The longer an object stays in the log, the better the chance of absorbing redundant writes
➢ No two logs hold the same copy object at any given instant
➢ Effective use of the QP clock boundary to decide the reclamation/writeback candidate
➢ On-the-fly construction of control headers in DRAM for all non-volatile logs
➢ NVMM-friendly access-pattern design for the nvlogs

SLIDE 54

MVCC Transactional Model

➢ MVCC: the optimal design choice to achieve all the features in one system
➢ Problems with MVCC:

○ High version-chain traversal cost
○ Global timestamp-allocation bottleneck

➢ We employ a concurrent, asynchronous garbage collection scheme to curb the version-lookup cost
➢ We use the hardware clock (RDTSCP on x86) for timestamp allocation (sketched below)
➢ A reader/writer traverses the version chain to find the right version to dereference
➢ The right copy is identified by timestamp lookup
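
Timestamp allocation with RDTSCP avoids a contended global counter. A minimal sketch follows; a real system must additionally deal with cross-socket TSC skew.

```c
/* Illustrative hardware-clock timestamp read on x86-64. */
#include <stdint.h>
#include <x86intrin.h>

static inline uint64_t read_hw_clock(void) {
    unsigned int aux;          /* receives IA32_TSC_AUX (encodes the core id)  */
    return __rdtscp(&aux);     /* waits for prior instructions, then reads TSC */
}
```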

SLIDE 55

Dereferencing - Finding the Right Version

[Figure: dereferencing across the checkpoint boundary: master object B (header B) has copy objects B4 (wrt-clk=70), B3 (wrt-clk=50), and B2 (wrt-clk=40) in DRAM, and copy object B1 checkpointed in the Clog (NVMM, between head and tail); thread-1 reads with local-ts=45, while thread-2 reads with local-ts=35 and must cross the checkpoint boundary into the Clog.]
