Main Memory Storage Engines Justin A. DeBrabant - - PowerPoint PPT Presentation



SLIDE 1

Main Memory Storage Engines

Justin A. DeBrabant debrabant@cs.brown.edu

SLIDE 2

Roadmap

  • Paper 1: Data-Oriented Transaction Execution
  • Paper 2: OLTP Through the Looking Glass
  • Paper 3: Generic Database Cost Models for Hierarchical Memory Systems

Storage Engines 2

SLIDE 3

Storage Engine?

  • the part of the database that actually stores and retrieves data
    – responsible for db performance
      • concurrency, consistency
    – separate from the database “front end”
  • a single database can offer several storage engine options
    – e.g. MySQL supports InnoDB and MyISAM
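As a toy illustration of that separation, here is a minimal sketch of a front end delegating storage to a swappable engine. All names here are hypothetical; this is not a real MySQL API.

```python
# Hedged sketch: a storage engine as the pluggable get/put layer behind a
# database "front end". Illustrative only, not any real engine's interface.
class DictEngine:
    """A trivial in-memory storage engine."""
    def __init__(self):
        self._rows = {}

    def put(self, key, row):
        self._rows[key] = row

    def get(self, key):
        return self._rows.get(key)

class Database:
    """Front end: parsing/planning would live here; storage is delegated."""
    def __init__(self, engine):
        self.engine = engine      # swappable, like choosing InnoDB vs. MyISAM

db = Database(DictEngine())
db.engine.put(1, {"name": "alice"})
assert db.engine.get(1) == {"name": "alice"}
```

Swapping `DictEngine` for another class with the same `get`/`put` interface changes the storage behavior without touching the front end, which is the point of the engine abstraction.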

SLIDE 4

Paper 1

  • Data-Oriented Transaction Execution
    – I. Pandis et al. (CMU/EPFL/Northwestern) – VLDB ’10

SLIDE 5

Motivation

  • Hardware has changed
    – recently, we’ve run into the “thermal wall”
      • hard to run chips at higher clock speeds
      • …but transistor counts still abide by Moore’s Law!
    – so: add more cores per chip
    – rely on thread-level parallelism
  • most current architectures were designed in the ’80s
    – what assumptions were made about the hardware?

SLIDE 6

Thread-to-Transaction Model

  • in most database engines, each transaction is assigned to its own thread
    – more cores = more parallel threads
    – each thread is responsible for locking shared resources as needed
  • works fine with a few threads, but what about thousands executing concurrently on hundreds of hardware contexts?

SLIDE 7

Data Access Pattern

  • each thread only worries about its own transaction
    – no coordination among transactions
      • i.e. uncoordinated data access
    – leads to high lock contention, especially at data “hot spots”

SLIDE 8

Data Access Visualization

SLIDE 9

Lock Contention As a Bottleneck

SLIDE 10

The future looks bleak…

  • Not quite!
  • Idea: “coordinate” data access patterns
    – rather than having threads contend for locks, have transactions contend for threads
    – distribute the transactions to the data, not the data to the transactions

SLIDE 11

Thread-to-Data Model

  • each thread is coupled with a disjoint subset of the database
  • threads coordinate access to their own data using a private locking mechanism
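The thread-to-data idea can be sketched as follows. This is an illustrative simplification; the partition count, queue protocol, and names are assumptions, not DORA’s actual implementation:

```python
# Sketch of thread-to-data: the key space is split into disjoint subsets,
# each owned by exactly one worker thread. Requests are routed to the owning
# worker's queue, so a partition is only ever touched by its own thread and
# no global lock is needed.
import queue
import threading

NUM_WORKERS = 4
store = [dict() for _ in range(NUM_WORKERS)]          # disjoint data subsets
queues = [queue.Queue() for _ in range(NUM_WORKERS)]  # one inbox per worker

def owner(key):
    """Route a key to the worker that owns its partition."""
    return hash(key) % NUM_WORKERS

def worker(wid):
    while True:
        op, key, val, done = queues[wid].get()
        if op == "stop":
            break
        if op == "put":
            store[wid][key] = val     # only this thread touches store[wid]
        done.set()

threads = [threading.Thread(target=worker, args=(w,)) for w in range(NUM_WORKERS)]
for t in threads:
    t.start()

# Route a write to its owning worker and wait for completion.
done = threading.Event()
queues[owner("acct-42")].put(("put", "acct-42", 100, done))
done.wait()

for q in queues:
    q.put(("stop", None, None, None))
for t in threads:
    t.join()

assert store[owner("acct-42")]["acct-42"] == 100
```

Note the inversion the slide describes: work queues up at the data’s thread, rather than threads queuing up at the data’s lock.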

SLIDE 12

“Coordinated” Data Access

SLIDE 13

A Data Oriented Architecture (DORA)

  • a shared-everything architecture designed to scale to very high core counts
  • retains ACID properties
  • data (i.e. relations) are divided into disjoint datasets
    – 1 executor (thread) per dataset

SLIDE 14

Routing

  • How are rows mapped to datasets?
    – via a routing rule
  • routing rules use a subset of columns from a table, called routing fields, to map rows to datasets
    – in practice, columns from primary or candidate keys are used
    – routing rules can be dynamically updated to balance load
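A routing rule of this kind might be sketched as below; the hash partitioning, field names, and dataset count are illustrative assumptions, not the paper’s exact scheme:

```python
# Hedged sketch of a routing rule: project a row onto its routing fields
# (here, a subset of the primary key) and map that value to a dataset id.
def make_routing_rule(routing_fields, num_datasets):
    def route(row):
        key = tuple(row[f] for f in routing_fields)
        return hash(key) % num_datasets   # one executor per dataset
    return route

# e.g. route TPC-C-style customer rows by warehouse id, across 4 datasets
route = make_routing_rule(routing_fields=("w_id",), num_datasets=4)

d = route({"w_id": 7, "d_id": 3, "c_id": 1})
assert 0 <= d < 4
# rows sharing routing-field values always land in the same dataset
assert d == route({"w_id": 7, "d_id": 9, "c_id": 2})
```

Rebalancing load then means replacing `route` with a new rule, consistent with the slide’s note that rules can be updated dynamically.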

SLIDE 15

Transaction Flow Graphs

  • used to map incoming transactions to executors
  • actions are the data-access parts of the query
  • identifiers describe which columns an action uses
  • What about actions that don’t match the routing fields?
    – called secondary actions; these are harder to route

SLIDE 16

Secondary Actions

  • which executor is responsible?
    – for indexes that don’t index the routing fields, store the routing fields in the leaf nodes
  • added space overhead?
  • expensive to update indexes if the routing fields are changed?

SLIDE 17

Rendezvous Points

  • often, data dependencies exist between actions
    – insert rendezvous points between actions with data dependencies
      • logically separates execution into distinct phases
      • the system cannot concurrently execute actions from different phases
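One way to picture a rendezvous point is as a barrier between phases. The sketch below is an assumed simplification, not the paper’s mechanism:

```python
# Sketch: a rendezvous point modeled as a barrier. No phase-2 action may
# start until every phase-1 action has finished, mirroring the slide's
# "no concurrent execution across phases" rule.
import threading

N = 3
rendezvous = threading.Barrier(N)
log = []
log_lock = threading.Lock()

def run_action(i):
    with log_lock:
        log.append(("phase1", i))   # producers of the data dependency
    rendezvous.wait()               # the rendezvous point
    with log_lock:
        log.append(("phase2", i))   # consumers see all phase-1 results

threads = [threading.Thread(target=run_action, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# every phase-1 entry precedes every phase-2 entry
first_p2 = min(i for i, e in enumerate(log) if e[0] == "phase2")
assert first_p2 == N and all(e[0] == "phase1" for e in log[:first_p2])
```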

SLIDE 18

Executing an Action

  • 3 structures:
    – incoming action queue
      • processed in the order received
    – completed action queue
    – thread-local lock table
      • uses action identifiers to “lock” data and avoid conflicts
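The three structures can be roughly sketched as follows. Names are illustrative, and a real executor would interleave actions from many transactions and release locks at commit:

```python
# Sketch of an executor's three structures: an incoming FIFO queue, a
# completed queue, and a thread-local lock table keyed by action identifier.
# The executor is single-threaded by construction, so its "lock table" needs
# no latching -- that is the point of the design.
from collections import deque

class Executor:
    def __init__(self):
        self.incoming = deque()    # actions, processed in arrival order
        self.completed = deque()
        self.local_locks = {}      # action identifier -> owning transaction

    def submit(self, txn, ident, work):
        self.incoming.append((txn, ident, work))

    def run_one(self):
        txn, ident, work = self.incoming.popleft()
        holder = self.local_locks.get(ident)
        if holder is not None and holder != txn:
            self.incoming.append((txn, ident, work))   # conflict: defer
            return None
        self.local_locks[ident] = txn                  # "lock" via identifier
        result = work()
        self.completed.append((txn, ident, result))
        return result

ex = Executor()
ex.submit("T1", ("customer", 42), lambda: "read row 42")
assert ex.run_one() == "read row 42"
```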

SLIDE 19

Inserts and Deletes

  • still need to acquire row-level locks through the centralized lock manager
    – why? consider:
      • T1 deletes a record
      • T2 inserts a record into the slot vacated by the record deleted by T1
      • T1 aborts but can’t roll back: the slot is taken
    – fortunately, row-level locks are often not a source of contention

SLIDE 20

Experimental Setup

  • 3 benchmarks used, all OLTP:
    – TM-1
      • 7 transactions, 4 with updates
    – TPC-C
      • 150 warehouses (approx. 20 GB)
    – TPC-B
      • 100 branches (approx. 2 GB)

SLIDE 21

Lock Contention

SLIDE 22

Throughput

[Figure: throughput, DORA vs. baseline, one panel per benchmark]

SLIDE 23

Response Times

[Figure: response times, DORA vs. baseline]

SLIDE 24

Conclusions

  • traditional database engines were not made for the amount of thread-level parallelism seen in machines today
    – lock contention is a major part of that
  • a thread-to-data approach can significantly reduce lock contention

SLIDE 25

Paper 2

  • OLTP Through the Looking Glass, and What We Found There
    – S. Harizopoulos et al. – SIGMOD ’08

SLIDE 26

Motivation

  • Hardware has changed
    – db systems were designed when memory was scarce
    – many OLTP databases can now fit entirely in memory
  • even in memory, there are other bottlenecks
    – logging, latching, locking, buffer management

SLIDE 27

Alternative Architectures

  • logless
    – remove logging
  • single transaction
    – remove locking/latching
  • main memory resident
    – remove transaction bookkeeping

SLIDE 28

Goals

  • remove each of the “unnecessary” parts, one by one, and evaluate performance
    – determine the relative performance gain from removing each feature

SLIDE 29

Instruction Count Breakdown

[Figure: detailed instruction count breakdown for the Payment transaction; components: B-tree, logging, locking, latching, buffer manager, remaining overhead]

SLIDE 30

Instruction Count Breakdown

[Figure: detailed instruction count breakdown for the New Order transaction; components: B-tree, logging, locking, latching, buffer manager, remaining overhead]

SLIDE 31

Conclusions

  • antiquated disk-based features can cause significant overhead in a main memory system
  • each component of a system should be carefully evaluated

SLIDE 32

Paper 3

  • Generic Database Cost Models for Hierarchical Memory Systems
    – S. Manegold et al. – VLDB ’02

SLIDE 33

Motivation

  • cost models are a key part of query optimization
    – traditional cost models are based on disk accesses
  • What about in a main memory system?
    – must account for the memory hierarchy
      • L1, L2, L3, main memory, (solid-state?)

SLIDE 34

Goals

  • an accurate cost model should weight each level of the memory hierarchy differently
    – the overall “cost” of an operator should be the sum of its costs at all levels of the hierarchy
    – each level has a different access cost
      • weight each access by that level’s cost
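The weighting idea amounts to a simple sum; the latencies below are made-up example numbers, not measurements from the paper:

```python
# Sketch of the per-level weighting: total cost of an operator is the sum,
# over memory hierarchy levels, of (misses at that level) x (that level's
# access latency). Latencies here are illustrative round numbers in ns.
LATENCY_NS = {"L1": 1, "L2": 10, "mem": 100}

def operator_cost(misses):
    """misses: dict mapping level name -> number of misses/accesses."""
    return sum(misses[level] * LATENCY_NS[level] for level in misses)

# e.g. a scan with 1000 L1 misses, 100 L2 misses, 10 memory accesses:
# each level happens to contribute 1000 ns here
assert operator_cost({"L1": 1000, "L2": 100, "mem": 10}) == 3000
```

The interesting modeling work, covered on the next slides, is predicting the per-level miss counts from an operator’s access pattern.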

SLIDE 35

Data Access Patterns

  • different operators exhibit different data access patterns
    – the pattern dictates both the cost and the number of cache misses
  • How to accurately model access patterns?
    – basic access patterns
      • single/repetitive sequential traversal, single/repetitive random traversal, random access
    – compound access patterns
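For intuition, two of the basic patterns can be sketched as index sequences (illustrative only; the paper defines these patterns formally):

```python
# Sketch: a single sequential traversal visits items 0..n-1 in order, while
# a single random traversal visits each item exactly once in shuffled order.
# Same work, very different cache behavior.
import random

def sequential_traversal(n):
    return list(range(n))

def random_traversal(n, seed=0):
    order = list(range(n))
    random.Random(seed).shuffle(order)   # deterministic for the demo
    return order

seq = sequential_traversal(5)
rnd = random_traversal(5)
assert seq == [0, 1, 2, 3, 4]
assert sorted(rnd) == seq   # same items touched, different order
```

A “repetitive” variant repeats such a traversal, and compound patterns compose these building blocks, which is how the next slide’s cost models are assembled.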

SLIDE 36

Cost Models

  • for each basic access pattern, derive a custom cost model (not shown)
  • combine the basic access pattern cost models to derive compound access pattern cost models
  • map each database operator (e.g. sort) to a cost model

SLIDE 37

Experimental Analysis

[Figure: L1 misses, L2 misses, TLB misses, and time [ms] vs. relation size (128 kB to 128 MB); panels: a) Quick-Sort, b) Merge-Join, c) Hash-Join]

SLIDE 38

Conclusions

  • the basic cost models presented can model costs in main memory systems
  • these memory-based cost models could also be used to enhance current disk-based cost models

SLIDE 39

Questions?
