Design Tradeoffs of Data Access Methods Manos Athanassoulis and - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Design Tradeoffs of Data Access Methods

Manos Athanassoulis and Stratos Idreos

slide-2
SLIDE 2

declarative interface

ask “what” you want

db system

the system decides “how” to best store and access data

slide-3
SLIDE 3

applications api/sql

cpu memory hierarchy data data data

algorithms/operators

data system kernel: a collection of access methods

slide-4
SLIDE 4

layout structure navigation

an access method is a way to store and access data

slide-5
SLIDE 5

layout structure navigation

e.g., array unordered scan

an access method is a way to store and access data

slide-6
SLIDE 6

layout structure navigation

e.g., array unordered scan

e.g., array ordered binary search

an access method is a way to store and access data
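The two examples above (unordered array with scan, ordered array with binary search) can be sketched as follows. This is a minimal illustration, not code from the tutorial; the function names `scan_find` and `binary_search_find` are hypothetical.

```python
from bisect import bisect_left

def scan_find(values, x):
    """Unordered array: navigate by checking every slot until x is found."""
    for pos, v in enumerate(values):
        if v == x:
            return pos
    return -1

def binary_search_find(sorted_values, x):
    """Ordered array: navigate by halving the search range at each step."""
    pos = bisect_left(sorted_values, x)
    if pos < len(sorted_values) and sorted_values[pos] == x:
        return pos
    return -1

unordered = [42, 7, 19, 3, 88]
ordered = sorted(unordered)  # same layout (array), different structure
```

The layout is an array in both cases; only the structure (unordered vs. ordered) and the navigation (scan vs. binary search) change.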

slide-7
SLIDE 7

TREES TRIES HASH TABLES ARRAYS LOG-STRUCTURED TREES MULTI-DIMENSIONAL COLUMNS COLUMN-GROUPS SLOTTED PAGES

slide-8
SLIDE 8

isn’t this a solved problem?

slide-9
SLIDE 9

isn’t this a solved problem? access method design is now as important as ever

slide-10
SLIDE 10

data systems are nearly everywhere…

today

continuous need for new and tailored data systems

data grows

slide-11
SLIDE 11

data systems are nearly everywhere…

today tomorrow

continuous need for new and tailored data systems

data grows

slide-12
SLIDE 12

data systems are nearly everywhere…

today tomorrow

continuous need for new and tailored data systems

data grows

slide-13
SLIDE 13

disk memory A B C D

slide-14
SLIDE 14

disk memory A B C D A BC

row-store engine

option 1
slide-15
SLIDE 15

disk memory A B C D A

column-store engine

option 2

A BC

row-store engine

option 1

X X X
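A rough sketch of the two options for a table with attributes A, B, C, D. It assumes, for illustration only, that a row-store brings in whole rows while a column-store can read a single column; the helper names are hypothetical.

```python
ATTRS = ("A", "B", "C", "D")

def to_row_store(records):
    """Option 1: each record is stored contiguously as one tuple."""
    return [tuple(r[a] for a in ATTRS) for r in records]

def to_column_store(records):
    """Option 2: each attribute is stored contiguously as one array."""
    return {a: [r[a] for r in records] for a in ATTRS}

def sum_a_row_store(row_store):
    """Sum attribute A; assumes every row is read in full."""
    total, touched = 0, 0
    for row in row_store:
        touched += len(row)  # B, C, D come along even though unused
        total += row[0]
    return total, touched

def sum_a_column_store(col_store):
    """Sum attribute A; only column A is read."""
    col = col_store["A"]
    return sum(col), len(col)

records = [{"A": i, "B": 2 * i, "C": 3 * i, "D": 4 * i} for i in range(1, 4)]
row_total, row_touched = sum_a_row_store(to_row_store(records))
col_total, col_touched = sum_a_column_store(to_column_store(records))
```

Both layouts return the same answer; they differ in how many values have to be touched to produce it.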

slide-16
SLIDE 16

how many more new access methods to design?

slide-17
SLIDE 17

how many more new access methods to design? it is not about radical new designs only! design, tuning and variations

slide-18
SLIDE 18

say the workload (read/write ratio) shifts (e.g., due to app features):

should we use a different data layout for base data - diff updates?

should we use different indexing or no indexing?

slide-19
SLIDE 19

say we buy new hardware X (flash/memory):

should we change the size of b-tree nodes?

should we change the merging strategy in our LSM-tree?

say the workload (read/write ratio) shifts (e.g., due to app features):

should we use a different data layout for base data - diff updates?

should we use different indexing or no indexing?

slide-20
SLIDE 20

say we buy new hardware X (flash/memory):

should we change the size of b-tree nodes?

should we change the merging strategy in our LSM-tree?

say we want to improve response time:

would it be beneficial if we would buy faster flash disks?

would it be beneficial if we buy more memory?

say the workload (read/write ratio) shifts (e.g., due to app features):

should we use a different data layout for base data - diff updates?

should we use different indexing or no indexing?
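To make the b-tree node size question concrete, here is a back-of-the-envelope sketch. It assumes completely full nodes and a fixed 16-byte entry (key plus pointer); both are illustrative assumptions, not figures from the slides, and `btree_height` is a hypothetical helper.

```python
def btree_height(num_keys, node_bytes, entry_bytes=16):
    """Levels needed to index num_keys when every node is full.

    Assumption: fanout = node_bytes // entry_bytes, and a tree with
    h levels can address fanout ** h keys.
    """
    fanout = node_bytes // entry_bytes
    height, capacity = 1, fanout
    while capacity < num_keys:
        capacity *= fanout
        height += 1
    return height

ONE_BILLION = 1_000_000_000
small_nodes = btree_height(ONE_BILLION, node_bytes=256)     # fanout 16
page_nodes = btree_height(ONE_BILLION, node_bytes=4096)     # fanout 256
large_nodes = btree_height(ONE_BILLION, node_bytes=16384)   # fanout 1024
```

Larger nodes give a shallower tree (fewer node accesses per lookup) but move more bytes per access, so the best size depends on the access granularity and latency of the new hardware.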

slide-21
SLIDE 21

application requirements hardware budget energy profile performance

(hardware and requirements change continuously and rapidly)

conflicting goals moving target

slide-22
SLIDE 22

move from design based on intuition & experience only to a more formal and systematic way to design systems

slide-23
SLIDE 23

goals and structure of the tutorial

structure design space & tradeoffs

highlight open problems

towards easy to design methods

slide-24
SLIDE 24

goals and structure of the tutorial

~30 min ~40 min

design space basic tradeoffs goals & vision

structure design space & tradeoffs

highlight open problems

towards easy to design methods

[slides available at daslab.seas.harvard.edu]

slide-25
SLIDE 25

target audience = beginner to expert

no new designs but new connections & structure

slide-26
SLIDE 26

NOT JUST SQL

+

operating systems, NoSQL, sciences
slide-27
SLIDE 27

hardware is a big driver of access method (re)design

(and it continuously evolves)

slide-28
SLIDE 28

registers

on-chip cache

on-board cache

memory disk CPU

memory wall

cheaper faster

SRAM DRAM

~1ns ~10ns ~100ns

it is not just memory and disk

we want to move as few data items as possible all the way up to the CPU

slide-29
SLIDE 29

random access & page-based access

need to only read x… but have to read all of page 1

page1 page2 page3 data value x
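The cost of page-based access can be sketched as below. The 4096-byte page and 8-byte value are assumed sizes chosen for illustration, and `locate` and `read_value` are hypothetical helpers.

```python
PAGE_SIZE = 4096   # bytes per page (assumed)
VALUE_SIZE = 8     # bytes per data value (assumed)
VALUES_PER_PAGE = PAGE_SIZE // VALUE_SIZE

def locate(i):
    """Map the i-th value to (page number, slot within the page)."""
    return divmod(i, VALUES_PER_PAGE)

def read_value(pages, i):
    """Return the value and the number of bytes that had to be read."""
    page_no, slot = locate(i)
    page = pages[page_no]  # the unit of transfer is the whole page
    return page[slot], len(page) * VALUE_SIZE

values = list(range(3 * VALUES_PER_PAGE))  # exactly three full pages
pages = [values[p:p + VALUES_PER_PAGE]
         for p in range(0, len(values), VALUES_PER_PAGE)]
x, bytes_read = read_value(pages, 5)
```

Only 8 bytes were wanted, but a full page was transferred to get them.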

slide-30
SLIDE 30

what is the perfect access method?

slide-31
SLIDE 31

what is the perfect access method?

no single answer; it depends

slide-32
SLIDE 32

what is the perfect access method?

no single answer; it depends

what is the application

read patterns

write patterns

read/write ratios

hardware (CPU, memory, etc.)

SLAs

slide-33
SLIDE 33

a perfect access method for reads (point queries)

oracle

x

find(x)

slide-34
SLIDE 34

a perfect access method for reads (point queries)

oracle

x

find(x) reads updates memory

slide-35
SLIDE 35

a perfect access method for reads (point queries)

oracle

x

find(x) reads updates memory

slide-36
SLIDE 36

a perfect access method for reads (point queries)

oracle

x

find(x) reads updates memory

slide-37
SLIDE 37

a perfect access method for reads (point queries)

oracle

x

find(x) reads updates memory

slide-38
SLIDE 38

a perfect access method for reads (point queries)

binary search to find(x)

but with no memory overhead

sorted

slide-39
SLIDE 39

a perfect access method for reads (point queries)

binary search to find(x) reads updates memory

but with no memory overhead

sorted

slide-40
SLIDE 40

a perfect access method for reads (point queries)

binary search to find(x) reads updates memory

but with no memory overhead

sorted

slide-41
SLIDE 41

a perfect access method for reads (point queries)

binary search to find(x) reads updates memory

but with no memory overhead

sorted

slide-42
SLIDE 42

a perfect access method for reads (point queries)

binary search to find(x) reads updates memory

but with no memory overhead

sorted
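A sorted array answers find(x) by binary search with no extra memory, but keeping it sorted has a price on updates. The sketch below counts how many elements an insert has to shift; it assumes a dense in-memory array, and `sorted_insert` is a hypothetical helper.

```python
from bisect import bisect_left

def sorted_insert(sorted_values, x):
    """Insert x in order; return how many existing elements were shifted."""
    pos = bisect_left(sorted_values, x)   # binary search for the slot
    shifted = len(sorted_values) - pos    # everything after pos moves right
    sorted_values.insert(pos, x)
    return shifted

data = [10, 20, 30, 40]
shift_front = sorted_insert(data, 5)    # lands at the front
shift_back = sorted_insert(data, 50)    # lands at the end
```

An insert at the front moves every element, while an insert at the end moves none, so update cost depends on where new values land.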

slide-43
SLIDE 43

a perfect access method for writes (point writes)

x

update(x)

x x

update log

slide-44
SLIDE 44

a perfect access method for writes (point writes)

x

update(x) reads updates memory

x x

update log

slide-45
SLIDE 45

a perfect access method for writes (point writes)

x

update(x) reads updates memory

x x

update log

slide-46
SLIDE 46

a perfect access method for writes (point writes)

x

update(x) reads updates memory

x x

update log

slide-47
SLIDE 47

a perfect access method for writes (point writes)

x

update(x) reads updates memory

x x

update log
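The update log above can be sketched as an append-only list. This is an illustration under simple assumptions (in-memory, no compaction); the class name `UpdateLog` is hypothetical.

```python
class UpdateLog:
    """Append-only log: updates are cheap, reads search backwards."""

    def __init__(self):
        self.entries = []

    def update(self, key, value):
        # A point write is a single append; older versions are left in place.
        self.entries.append((key, value))

    def find(self, key):
        """Return (latest value, entries scanned); value is None if absent."""
        scanned = 0
        for k, v in reversed(self.entries):
            scanned += 1
            if k == key:
                return v, scanned
        return None, scanned

log = UpdateLog()
log.update("x", 1)
log.update("y", 2)
log.update("x", 3)   # the old version of x stays in the log
```

Writes never touch existing data, but reads may scan the whole log and stale versions take up space.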

slide-48
SLIDE 48

it all starts with how we store data every bit matters

design space

slide-49
SLIDE 49

basic tradeoffs

Reads Updates Memory

RUM conjecture, EDBT 2016
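The slides name the three RUM dimensions but do not say how to measure them. One common way to quantify them, assumed here rather than taken from the slides, is as amplification ratios where 1.0 is the ideal. The function names are hypothetical.

```python
def read_overhead(bytes_read, bytes_needed):
    """Data actually read divided by the data the query asked for."""
    return bytes_read / bytes_needed

def update_overhead(bytes_written, bytes_updated):
    """Data physically written divided by the size of the logical update."""
    return bytes_written / bytes_updated

def memory_overhead(bytes_stored, bytes_base_data):
    """Total space used divided by the size of the base data."""
    return bytes_stored / bytes_base_data

# Illustrative numbers only: an 8-byte value fetched via a 4096-byte page,
# and a log that holds three versions of a single 8-byte value.
ro = read_overhead(4096, 8)
mo = memory_overhead(3 * 8, 8)
```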

slide-50
SLIDE 50

Read Update Memory

max min min min

Reads Updates Memory

slide-51
SLIDE 51

Read Update Memory

max min min min read-optimized max min min update & memory optimized max min min memory-optimized min min max

Reads Updates Memory

slide-52
SLIDE 52

Logarithmic Design Fractional Cascading Log-structured Updates Sparse Indexing Differential Updates Partitioning Fractional Cascading

Read Update Memory

max min min min

study basic access methods design components

how they affect the RUM tradeoffs

how are they combined in existing access methods

slide-53
SLIDE 53

Logarithmic Design Fractional Cascading Log-structured Updates Sparse Indexing Differential Updates Partitioning Fractional Cascading

Read Update Memory

max min min min

study basic access methods design components

Part 2

how they affect the RUM tradeoffs

how are they combined in existing access methods

slide-54
SLIDE 54

can we make it easy to design/tune access methods?

slide-55
SLIDE 55

disk memory flash

1

easily utilize past concepts

slide-56
SLIDE 56

2

do not miss out on cool ideas and concepts

# of citations by year (chart; x-axis 1996 to 2014, y-axis 7 to 35)

P. O'Neil, E. Cheng, D. Gawlick, E. O'Neil. The log-structured merge-tree (LSM-tree). Acta Informatica 33(4): 351–385, 1996

slide-57
SLIDE 57

2

do not miss out on cool ideas and concepts

# of citations by year (chart; x-axis 1996 to 2014, y-axis 7 to 35)

P. O'Neil, E. Cheng, D. Gawlick, E. O'Neil. The log-structured merge-tree (LSM-tree). Acta Informatica 33(4): 351–385, 1996

Google publishes BigTable

slide-58
SLIDE 58

move from design based on intuition & experience only to a more formal and systematic way to design systems

slide-59
SLIDE 59

construct access methods out of basic components

(and their tradeoffs) e.g., scan*, tree*, bloom filters, bitmaps, hash tables, etc.
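As one possible illustration of combining basic components, the sketch below puts a Bloom filter in front of an unordered array that is searched by scan. The sizes, the hashing scheme, and the class names `BloomFilter` and `FilteredScan` are assumptions made for this example.

```python
import hashlib

class BloomFilter:
    """Component 1: small probabilistic membership summary."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive num_hashes positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = 1

    def might_contain(self, key):
        # False means definitely absent; True means possibly present.
        return all(self.bits[p] for p in self._positions(key))

class FilteredScan:
    """Component 2 (unordered array + scan) guarded by component 1."""

    def __init__(self):
        self.values = []
        self.filter = BloomFilter()
        self.scans = 0

    def insert(self, x):
        self.values.append(x)
        self.filter.add(x)

    def find(self, x):
        if not self.filter.might_contain(x):
            return False          # skip the scan entirely
        self.scans += 1
        return x in self.values   # the scan confirms or rejects

store = FilteredScan()
store.insert(7)
store.insert(42)
```

The filter spends extra memory and a little extra work per insert to let many lookups for absent values avoid the scan, which is one instance of the tradeoffs the slide refers to.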

slide-60
SLIDE 60

INTERACTIVE DATA SYSTEM DESIGN/TUNING/TESTING

data system designer

slide-61
SLIDE 61

possible opportunities

once we have a “complete” & navigable set of design modules

learn from: s/w engineering, modular dbs, compilers, goes all the way back to basic texts

slide-62
SLIDE 62

possible opportunities

once we have a “complete” & navigable set of design modules

easy to design easy to change/adapt

learn from: s/w engineering, modular dbs, compilers, goes all the way back to basic texts

slide-63
SLIDE 63

possible opportunities

once we have a “complete” & navigable set of design modules

testing universal development platform

easy to design easy to change/adapt

learn from: s/w engineering, modular dbs, compilers, goes all the way back to basic texts

slide-64
SLIDE 64

possible opportunities

once we have a “complete” & navigable set of design modules

testing universal development platform

easy to design easy to change/adapt

discovery of new combinations of design options

learn from: s/w engineering, modular dbs, compilers, goes all the way back to basic texts

slide-65
SLIDE 65

Part 2: observe how papers fill in gaps in the structure and existing open gaps