CS 839: Design the Next-Generation Database
Lecture 24: HTAP
Xiangyao Yu
4/16/2020

Announcements
Vote on the topic of the last lecture
Option 1: Streaming
• [required] Discretized Streams: Fault-Tolerant Streaming Computation at Scale
• [optional] Apache Flink™: Stream and Batch Processing in a Single Engine
• [optional] The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing
Option 2: Time series
• [required] Gorilla: A Fast, Scalable, In-Memory Time Series Database
• [optional] Time Series Management Systems: A Survey

Discussion Highlights
FaaS vs. BaaS for databases
• BaaS advantages: simplifies communication and state sharing, caching
• BaaS disadvantages: potentially lower CPU and memory utilization
• FaaS advantages: fine-granularity pricing model, auto-scaling
• FaaS disadvantages: overhead of inter-function coordination, functions have limited resources and execution time, communication through S3, inherently designed for small functions
What can BaaS (e.g., Snowflake) borrow from FaaS?
• Auto-scaling: dynamic resource allocation and fine-grained pricing
Benefits and limiting factors of running OLTP on serverless computing?
• Benefits: elastic scaling based on demand; transactions are inherently short-lived
• Limiting factors: S3 has no read-after-write consistency; concurrency control is hard due to lack of communication

Today’s Paper
ICDE 2011

HTAP: Hybrid Transactional/Analytical Processing
Hybrid transactional/analytical processing (HTAP) is a term coined by Gartner Inc. in 2014:
Hybrid transactional/analytical processing (HTAP) is an emerging application architecture that "breaks the wall" between transaction processing and analytics. It enables more informed and "in business real time" decision making.
Key advantage: reducing time to insight

OLTP vs. OLAP (Slide from L2)
[Figure: Transactions, OLTP database (update intensive), OLAP database (read intensive, rare updates)]
• Takes hours for conventional databases
• Takes seconds for hybrid transactional/analytical processing (HTAP) systems

HTAP Design Options [1]
Single System for OLTP and OLAP
• Using Separate Data Organization for OLTP and OLAP
• Same Data Organization for both OLTP and OLAP
Separate OLTP and OLAP Systems
• Decoupling the Storage for OLTP and OLAP
• Using the Same Storage for OLTP and OLAP
(Slide label: Hyper, placed among the single-system options)
[1] Özcan, Fatma, Yuanyuan Tian, and Pinar Tözün. "Hybrid transactional/analytical processing: A survey." SIGMOD, 2017.

Background: Through the Looking Glass [2]
[2] Harizopoulos, S., Abadi, D. J., Madden, S., & Stonebraker, M. OLTP through the looking glass, and what we found there. SIGMOD 2008

Background: H-Store [3]
Single-partition transactions are executed sequentially
Multi-partition transactions lock entire partitions
Supports short, stored-procedure transactions
[3] Kallman, R., et al. H-store: a high-performance, distributed main memory transaction processing system. VLDB 2008

Background: VoltDB
H-Store was commercialized as VoltDB
VoltDB has some cool features
• Active-active replication (deterministic execution)
• Command logging

Hyper
Execute analytical queries without blocking transactions

Virtual Memory Snapshots
Create a consistent database snapshot for OLAP queries to read
Transactions run with copy-on-write to avoid polluting the snapshots

Fork()
Linux Programmer's Manual: fork() creates a new process by duplicating the calling process. The new process is referred to as the child process. The calling process is referred to as the parent process.
Does not copy all the memory pages
Does copy the parent’s page table (all pages set to read-only mode)
Copy-on-write (COW)
• If any page is modified by either the parent or the child process, a new page is created for the corresponding process
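The snapshot semantics described above can be observed with a small sketch. It assumes a Unix-like system where Python's os.fork() is available; the function name snapshot_read and the toy dictionary "database" are hypothetical and are not how Hyper is implemented. The child stands in for an OLAP query and the parent for the OLTP process. The parent updates the data after the fork, and the child still reads the values as of the fork.

```python
import os

def snapshot_read(db, mutate):
    """Fork a reader that sums db as of the fork; the parent runs mutate(db) first.

    Hypothetical helper for illustration only.
    """
    r_go, w_go = os.pipe()    # parent -> child: "the update is done"
    r_res, w_res = os.pipe()  # child -> parent: value seen in the snapshot
    pid = os.fork()
    if pid == 0:
        # Child ("OLAP"): wait until the parent has finished its update,
        # then read from its own copy-on-write view of memory.
        os.close(w_go)
        os.close(r_res)
        os.read(r_go, 1)
        os.write(w_res, str(sum(db.values())).encode())
        os._exit(0)
    # Parent ("OLTP"): modify the data after the fork.
    os.close(r_go)
    os.close(w_res)
    mutate(db)
    os.write(w_go, b"x")
    seen = int(os.read(r_res, 64).decode())
    os.waitpid(pid, 0)
    os.close(w_go)
    os.close(r_res)
    return seen

db = {"a": 10, "b": 20}
seen = snapshot_read(db, lambda d: d.update(a=1000))
print(seen, db["a"])
```

The child reports the sum of the pre-update values even though the parent has already overwritten one of them, which is the isolation property the snapshot provides to OLAP queries.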

Cost of Fork()
Cost of fork() is proportional to the page table size, which depends on
• Database size
• Page size
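A back-of-the-envelope sketch of this dependency follows. It assumes 8-byte leaf page-table entries (as on x86-64) and ignores the upper levels of the page table; the function name and the 100 GiB database size are illustrative assumptions, not figures from the paper.

```python
def page_table_bytes(db_bytes, page_bytes=4096, pte_bytes=8):
    """Approximate bytes of leaf page-table entries needed to map db_bytes."""
    pages = -(-db_bytes // page_bytes)  # ceiling division
    return pages * pte_bytes

GiB = 2**30
small_pages = page_table_bytes(100 * GiB)                      # 4 KiB pages
huge_pages = page_table_bytes(100 * GiB, page_bytes=2 * 2**20)  # 2 MiB pages
print(small_pages // 2**20, "MiB with 4 KiB pages")
print(huge_pages // 2**10, "KiB with 2 MiB pages")
```

Under these assumptions the amount of page-table data that fork() must copy grows linearly with the database size and shrinks by the same factor as the page size grows.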

Fork-Based Virtual Snapshots
[Figure: two panels, each showing the OLTP process and the OLAP process with their page tables. Labels: Page, Page’; reference counts ref=1, ref=2, ref=1.]

Multiple OLAP Sessions
OLAP Session: Group of OLAP queries that access the same snapshot
[Figure, shown as five animation steps]
• Step 1: OLTP process. A ref=1, B ref=1
• Step 2: OLTP process, Snapshot 1. A ref=2, B ref=2
• Step 3: OLTP process, Snapshot 1. A’ ref=1, A ref=1, B ref=2
• Step 4: OLTP process, Snapshot 1, Snapshot 2. A’ ref=2, A ref=1, B ref=3
• Step 5: OLTP process, Snapshot 1, Snapshot 2. A’ ref=2, A ref=1, B ref=2
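The reference counts in the first four steps can be reproduced with a toy model. This is a sketch of the bookkeeping only, with a hypothetical Memory class; the operating system tracks this per physical page, and the sketch does not model snapshot termination.

```python
class Memory:
    """Toy model of page reference counts under fork() and copy-on-write."""

    def __init__(self, pages):
        self.ref = {p: 1 for p in pages}               # page version -> ref count
        self.tables = {"OLTP": {p: p for p in pages}}  # process -> {page: version}

    def fork(self, parent, child):
        # The child copies the parent's page table; every mapped page gains a reference.
        self.tables[child] = dict(self.tables[parent])
        for version in self.tables[child].values():
            self.ref[version] += 1

    def write(self, proc, page):
        old = self.tables[proc][page]
        if self.ref[old] == 1:
            return  # sole owner: update in place, no copy needed
        new = old + "'"
        while new in self.ref:  # pick an unused name for the private copy
            new += "'"
        self.ref[old] -= 1
        self.ref[new] = 1
        self.tables[proc][page] = new

m = Memory(["A", "B"])            # step 1: A ref=1, B ref=1
m.fork("OLTP", "Snapshot 1")      # step 2: A ref=2, B ref=2
m.write("OLTP", "A")              # step 3: A' ref=1, A ref=1, B ref=2
m.fork("OLTP", "Snapshot 2")      # step 4: A' ref=2, A ref=1, B ref=3
print(m.ref)
```

Each snapshot shares unmodified pages with the OLTP process, so only pages written after a fork cost extra memory.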

Multi-Threaded OLTP Processing
Single-partition transaction
• Sequential execution within partition
• Different partitions run in parallel
Multi-partition transaction
• System-wide sequential execution
When to fork()?
• Option 1: Fork after quiescing all threads
• Option 2: Fork in the middle of a transaction and then undo the transaction’s changes
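Option 2 can be sketched as follows. This is an illustration under stated assumptions: copy.deepcopy stands in for fork(), the database is a plain dictionary, and the Txn class and snapshot_mid_txn function are hypothetical names. The snapshot is taken while a transaction is in flight, and the transaction's before-images are then applied to the snapshot so that it reflects only committed state.

```python
import copy

_MISSING = object()  # marks a key that did not exist before the transaction

class Txn:
    """In-flight transaction that records before-images in an undo log."""

    def __init__(self, db):
        self.db = db
        self.undo = []  # list of (key, before-image), in write order

    def write(self, key, value):
        self.undo.append((key, self.db.get(key, _MISSING)))
        self.db[key] = value

def snapshot_mid_txn(db, txn):
    """Copy db (stand-in for fork), then roll back txn's changes in the copy."""
    snap = copy.deepcopy(db)
    for key, before in reversed(txn.undo):
        if before is _MISSING:
            snap.pop(key, None)
        else:
            snap[key] = before
    return snap

db = {"x": 1, "y": 2}
txn = Txn(db)
txn.write("x", 10)
txn.write("z", 5)
snap = snapshot_mid_txn(db, txn)
print(snap, db)
```

The OLTP side keeps running the transaction on its own copy, while the snapshot sees the database as it was before the transaction started.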

Logging and Checkpointing
Logging
• Logical redo logging
Checkpointing
• Based on the same VM snapshot mechanism
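A minimal sketch of how logical redo logging and checkpointing fit together, assuming deterministic stored procedures that are replayed in their original order. The Engine class and the deposit and transfer procedures are hypothetical, and copy.deepcopy stands in for the VM snapshot used to write a checkpoint.

```python
import copy

def deposit(db, account, amount):
    db[account] = db.get(account, 0) + amount

def transfer(db, src, dst, amount):
    db[src] -= amount
    db[dst] = db.get(dst, 0) + amount

PROCEDURES = {"deposit": deposit, "transfer": transfer}

class Engine:
    def __init__(self):
        self.db = {}
        self.log = []              # logical redo log: (procedure name, arguments)
        self.checkpoint = ({}, 0)  # (copy of state, log position at copy time)

    def execute(self, name, *args):
        self.log.append((name, args))     # log the call, not the changed bytes
        PROCEDURES[name](self.db, *args)

    def take_checkpoint(self):
        self.checkpoint = (copy.deepcopy(self.db), len(self.log))

    def recover(self):
        # Rebuild state: load the checkpoint, then replay the log tail.
        state, position = self.checkpoint
        db = copy.deepcopy(state)
        for name, args in self.log[position:]:
            PROCEDURES[name](db, *args)
        return db

e = Engine()
e.execute("deposit", "alice", 100)
e.execute("deposit", "bob", 50)
e.take_checkpoint()
e.execute("transfer", "alice", "bob", 30)
print(e.recover())
```

Because only the procedure name and arguments are logged, the log stays small, but recovery is correct only if replay reproduces the original execution order.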

Evaluation – Performance Comparison
[Figure: results for Config 1, Config 2, and Config 3]

Evaluation – Memory Consumption

Hyper Today?

HTAP – Q/A
State of the art in HTAP?
Overhead of Hyper?
Does row format have the same performance as column format for OLTP?
Is it really necessary to do real-time analytical work?
What if data does not fit in memory? (Anti-caching)
Why not use shared memory and concurrency control?
Why is logical logging a problem in conventional systems?
Evaluation is weak
Analytical data no longer fits in memory in 2020

Topic of the Last Lecture
Option 1: Streaming
• [required] Discretized Streams: Fault-Tolerant Streaming Computation at Scale
• [optional] Apache Flink™: Stream and Batch Processing in a Single Engine
• [optional] The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing
Option 2: Time series
• [required] Gorilla: A Fast, Scalable, In-Memory Time Series Database
• [optional] Time Series Management Systems: A Survey

Group Discussion
What are the challenges of applying the VM-snapshot idea to a shared-memory OLTP system?
Fork() replicates the page table, which is expensive when the database is large. Can you think of any approach to reduce this cost?
Given the four possible designs of HTAP ({single system, separate system} x {shared data, separate data}), which one is the most promising in your opinion? What if you have an infinite number of machines?
