Modular Data Storage with Anvil Mike Mammarella Shant Hovsepian - - PowerPoint PPT Presentation

SLIDE 1

Modular Data Storage with Anvil

Mike Mammarella Shant Hovsepian Eddie Kohler

SLIDE 2

Motivation

  • Data storage and databases drive modern applications
  • Facebook, Twitter, Google Mail, system logs, even Firefox
  • Yet hand-built data stores can outperform by 100x! [Boncz]
  • Changing the layout of stored data can substantially improve performance
  • Recent systems implement custom storage engines
  • Custom storage engines are hard to write
  • Reason: they must be consistent, and fast for both reads and writes
  • What if you want to experiment with a new layout?

SLIDE 4

The Question

Can we give applications a simple and efficient modular framework, supporting a wide variety of data layouts, enabling better performance? Yes, we can!

SLIDE 5

Anvil

  • Fine-grained modules called dTables
  • Composable to build complex data stores from simple parts
  • Easy to implement new dTables to store specialized data
  • Isolates all writing to dedicated writable dTables
  • Many data storage layouts only add or change read-only dTables, which are significantly easier to implement
  • Good disk access characteristics come as well
  • Unifying dTables combine write- and read-optimized dTables

SLIDE 6

Contributions

  • Fine-grained, modular dTable design
  • Core dTables
  • Overlay dTable, Managed dTable, Exception dTable
  • Anvil implementation
  • Shows that such a system can be fast

SLIDE 8

dTables

  • Key/value store
  • Keys are integers, floats, strings, or blobs
  • Values are byte arrays
  • Iterators support in-order traversal
  • Most are read-only

dTable:
  blob lookup(key k)
  bool insert(key k, blob v)
  bool remove(key k)
  iter iterator()

dTable iterator:
  key key()
  blob value()
  bool valid()
  bool next()

Slightly simplified, but not much!
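A minimal Python sketch of the interface above, for illustration only. It is not Anvil's actual API: `DTable`, `DTableIter`, and `MemoryDTable` are hypothetical names chosen here, values are `bytes`, and the writable table is a simple in-memory stand-in.

```python
import bisect
from abc import ABC, abstractmethod
from typing import Optional


class DTableIter:
    """In-order cursor over a snapshot of sorted (key, value) pairs."""

    def __init__(self, items):
        self._items = items
        self._pos = 0

    def valid(self) -> bool:
        return self._pos < len(self._items)

    def key(self):
        return self._items[self._pos][0]

    def value(self) -> bytes:
        return self._items[self._pos][1]

    def next(self) -> bool:
        # Advance, then report whether the cursor still points at an entry.
        self._pos += 1
        return self.valid()


class DTable(ABC):
    @abstractmethod
    def lookup(self, k) -> Optional[bytes]: ...

    @abstractmethod
    def iterator(self) -> DTableIter: ...

    # Most dTables are read-only, so writes are refused by default.
    def insert(self, k, v: bytes) -> bool:
        return False

    def remove(self, k) -> bool:
        return False


class MemoryDTable(DTable):
    """Hypothetical writable dTable kept entirely in RAM."""

    def __init__(self):
        self._keys = []  # kept sorted, so iteration is in key order
        self._vals = {}

    def lookup(self, k):
        return self._vals.get(k)

    def insert(self, k, v):
        if k not in self._vals:
            bisect.insort(self._keys, k)
        self._vals[k] = v
        return True

    def remove(self, k):
        if k not in self._vals:
            return False
        del self._vals[k]
        self._keys.pop(bisect.bisect_left(self._keys, k))
        return True

    def iterator(self):
        return DTableIter([(k, self._vals[k]) for k in self._keys])
```

A read-only dTable in this sketch would implement only `lookup()` and `iterator()` and inherit the refusing `insert()` and `remove()`.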

SLIDE 13

dTable Layering

  • Applications (and frontends) use the dTable interface
  • But so do other dTables!
  • Transform data
  • Add indices
  • Construct complex functionality from simple pieces

[Diagram: a dTable layered on another dTable; lookup() and iterator() calls pass down to the lower dTable, and the upper dTable wraps the lower iterator]
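A sketch of layering, assuming Python's iterator protocol in place of the `valid()`/`next()` cursor. `SortedDTable` is a hypothetical read-only base, and `TransformDTable` is a hypothetical dTable that calls `lookup()` and `iterator()` on the dTable below it and wraps the returned iterator, decoding values on the way up.

```python
class SortedDTable:
    """Read-only base dTable over a dict; iterates in key order."""

    def __init__(self, data):
        self._data = dict(data)

    def lookup(self, k):
        return self._data.get(k)

    def iterator(self):
        return iter(sorted(self._data.items()))


class TransformDTable:
    """Hypothetical layered dTable that decodes values from the dTable below."""

    def __init__(self, base, decode):
        self._base = base
        self._decode = decode

    def lookup(self, k):
        raw = self._base.lookup(k)  # delegate to the lower dTable
        return None if raw is None else self._decode(raw)

    def iterator(self):
        # Wrap the lower iterator lazily instead of copying its contents.
        for k, raw in self._base.iterator():
            yield k, self._decode(raw)
```

Because the upper layer only depends on `lookup()` and `iterator()`, the same wrapper could sit on any other dTable in this sketch.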

SLIDE 14

An Application-Specific Backend

[Diagram: a backend composed of Bloom, Overlay, Exception, B-tree, Linear, Managed, Journal, State Dict., and Array dTables]

SLIDE 16

Application-Specific Data Example

  • Want to store the state of residence of customers
  • Identified by mostly-contiguous IDs
  • Most live in the US, but a few don’t
  • Move between states occasionally
  • Common case could be stored efficiently as an array of state IDs
  • But don’t want to penalize the uncommon case
  • Want transactional semantics

SLIDE 18

Array dTable

  • Stores an array of fixed-size values
  • Keys must be contiguous integers
  • Locating data items becomes constant time
  • Can’t store some types of data (e.g., variable-size values or noncontiguous keys)
  • Read-only
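A minimal sketch of the Array dTable idea, assuming values are packed back to back in one byte buffer. `ArrayDTable` and its constructor parameters are hypothetical; the point is that a key's position is computed arithmetically, so no search is needed.

```python
class ArrayDTable:
    """Sketch: fixed-size values for contiguous integer keys, stored back to back."""

    def __init__(self, first_key, values, value_size):
        self.first_key = first_key
        self.value_size = value_size
        buf = bytearray()
        for v in values:
            if len(v) != value_size:
                raise ValueError("array dTable stores only fixed-size values")
            buf += v
        self._buf = bytes(buf)  # immutable once built: the dTable is read-only

    def lookup(self, key):
        # Constant time: the byte offset is computed directly from the key.
        idx = key - self.first_key
        if idx < 0 or (idx + 1) * self.value_size > len(self._buf):
            return None
        start = idx * self.value_size
        return self._buf[start:start + self.value_size]
```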

SLIDE 22

Storing Common Case Data Efficiently

[Diagram: a State Dict. dTable layered over an Array dTable, shown among the backend’s other dTables; example entry: 31 → “California”]

  ✔ Mostly-contiguous IDs
  ✔ Most live in the US
  • Some live elsewhere
  • Don’t penalize them
  • Occasionally relocate
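A sketch of the common-case layout, assuming one byte per state code. The hypothetical `STATES` table plays the role of the State Dict. dTable, and a `bytes` object indexed by customer ID plays the role of the Array dTable; the code table is abbreviated, and the example mirrors the 31 → “California” lookup.

```python
# Hypothetical, abbreviated code table; a real one would list every US state.
STATES = ["Alabama", "Alaska", "Arizona", "California", "New York", "Texas"]
CODE = {name: i for i, name in enumerate(STATES)}


def build_array(first_id, states):
    """Encode each customer's state as one byte, for IDs first_id, first_id+1, ..."""
    return first_id, bytes(CODE[s] for s in states)


def lookup(array, customer_id):
    first_id, codes = array
    idx = customer_id - first_id
    if not 0 <= idx < len(codes):
        return None
    return STATES[codes[idx]]  # decode the one-byte code back into a name
```

A customer living outside the US has no code in this table, which is the uncommon case this layout alone cannot store.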
SLIDE 26

Exception dTable

  • Many data sets mostly, but not entirely, conform to some pattern that would allow more efficient storage
  • Exception dTable combines a “restricted” dTable with an “unrestricted” dTable
  • A sentinel value in the restricted dTable indicates that the unrestricted dTable should be checked
  • Simple unrestricted dTable: Linear dTable
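A sketch of the sentinel mechanism, assuming an all-`0xff` value is reserved as the sentinel and a plain dict stands in for the unrestricted (e.g. Linear) dTable. `ExceptionDTable` and its fields are hypothetical. Note that a real value that happens to equal the sentinel must also be sent to the unrestricted side.

```python
class ExceptionDTable:
    """Sketch: a restricted fixed-size array backed by an unrestricted fallback."""

    def __init__(self, first_key, values, value_size):
        self.first_key = first_key
        self.sentinel = b"\xff" * value_size  # hypothetical reserved value
        self.restricted = []    # one fixed-size slot per contiguous key
        self.unrestricted = {}  # stands in for e.g. a Linear dTable
        for i, v in enumerate(values):
            if len(v) == value_size and v != self.sentinel:
                self.restricted.append(v)
            else:
                # Value does not fit the pattern: store the sentinel here and
                # the real value in the unrestricted dTable.
                self.restricted.append(self.sentinel)
                self.unrestricted[first_key + i] = v

    def lookup(self, key):
        idx = key - self.first_key
        if not 0 <= idx < len(self.restricted):
            return None
        v = self.restricted[idx]
        if v == self.sentinel:
            return self.unrestricted.get(key)
        return v
```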
SLIDE 31

Storing All Data

[Diagram: an Exception dTable combining the State Dict. + Array dTables with Linear and B-tree dTables]

  ✔ Mostly-contiguous IDs
  ✔ Most live in the US
  ✔ Some live elsewhere
  ✔ Don’t penalize them
  • Occasionally relocate

SLIDE 32

General dTables

  • We’ve seen how to build a read-only data store specialized for an application-specific layout
  • The pieces can be recombined for other layouts
  • Next section shows how to build a writable store
  • Writable store dTables are common to many layouts
  • Separate data-writing functionality from management policies

SLIDE 33

Writable dTables

  • Array dTable is hard to update transactionally
  • Idea: use separate writable dTables
  • Can be optimized for writing (e.g. a log)
  • Several design questions
  • Implementation of write-optimized dTable
  • Building an efficient store from write-optimized and read-only pieces

SLIDE 38

Fundamental Writable dTable

  • Appends new/updated data to a shared journal
  • All data also cached in an AVL tree in RAM
  • Should be “digested” into a read-only dTable when it gets large

[Diagram: updates appended to the journal and cached in an AVL tree in RAM]
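A sketch of this write path, under stated assumptions: the shared journal is a plain list, `None` marks a removed key, and a sorted key list maintained with `bisect` stands in for the AVL tree (Python's standard library has no AVL tree). `JournalDTable` and `digest()` are hypothetical names; here `digest()` only hands back the cached entries in key order, ready to be written out as a read-only dTable.

```python
import bisect


class JournalDTable:
    """Sketch: writes go to a shared append-only log plus an ordered RAM cache.

    A bisect-maintained sorted key list stands in for the AVL tree.
    """

    def __init__(self, journal):
        self.journal = journal  # shared append-only log (a plain list here)
        self._keys = []         # sorted keys, so the cache can be read in order
        self._cache = {}

    def insert(self, key, value):
        self.journal.append((key, value))  # records are only ever appended
        if key not in self._cache:
            bisect.insort(self._keys, key)
        self._cache[key] = value
        return True

    def remove(self, key):
        # Store None as a tombstone so that older data for the key is masked.
        return self.insert(key, None)

    def lookup(self, key):
        return self._cache.get(key)

    def size(self):
        return len(self._keys)

    def digest(self):
        """Hand back cached entries in key order and empty the cache."""
        items = [(k, self._cache[k]) for k in self._keys]
        self._keys, self._cache = [], {}
        return items
```

Digesting frees the RAM cache but leaves the journal records in place; reclaiming journal space is the job of journal cleaning.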

SLIDE 39

System Journal

  • Chronological-order, append-only data store
  • Fast, contiguous writes on disks and other storage devices like Flash memory
  • Data later rewritten elsewhere in batches from cache
  • Clean the system journal periodically to reclaim space
  • Data already written elsewhere can be omitted
  • Optimization: just delete it and restart if totally empty
  • Uses a transaction system described in the paper
  • Client code chooses start and end of each transaction
  • Durability optional, consistency always provided
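A minimal sketch of journal cleaning, assuming a hypothetical record format in which each record carries a sequence number and the name of the dTable that owns it. Records whose owner has already written them elsewhere are dropped; an empty result corresponds to the “delete it and restart” optimization. The transaction system itself is not modeled here.

```python
def clean_journal(records, written):
    """Sketch of system journal cleaning (record format is hypothetical).

    records: (seq, owner, payload) tuples in chronological order.
    written: owner -> highest seq that owner has already written elsewhere.
    Returns the records that must survive, still in chronological order.
    An empty result means the journal can simply be deleted and restarted.
    """
    return [r for r in records if r[0] > written.get(r[1], -1)]
```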
SLIDE 40

Handling Writes

[Diagram: a Journal dTable added next to the read-only State Dict., Array, Exception, Linear, and B-tree dTables]

  ✔ Mostly-contiguous IDs
  ✔ Most live in the US
  ✔ Some live elsewhere
  ✔ Don’t penalize them
  • Occasionally relocate

SLIDE 48

Combining dTables

  • Have: write-optimized and read-only dTables
  • Want: one dTable that gives the best of both worlds
  • Idea: layer multiple read-only dTables together
  • Older data “lower” and newer data “higher”
  • Use a (writable) journal dTable “on top”

[Diagram: an Overlay dTable over two Array dTables (older) and a Journal dTable (newer, on top), arranged along a time axis; animation steps show a lookup(), the overlay iterator order, and a create()]
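A sketch of overlay semantics, assuming each layer is a dict (newest first) and a hypothetical `TOMBSTONE` object marks keys removed in a newer layer. `lookup()` stops at the first layer holding the key; `items()` merges all layers in key order with `heapq.merge`, letting the newest copy of each key win. `OverlayDTable` is an illustrative name, not Anvil's class.

```python
import heapq

TOMBSTONE = object()  # marks a key removed in a newer layer


class OverlayDTable:
    """Sketch: layers[0] is newest (e.g. the journal); later layers are older."""

    def __init__(self, layers):
        self.layers = layers  # each layer: dict of key -> value or TOMBSTONE

    def lookup(self, key):
        for layer in self.layers:  # newest first; the first hit wins
            if key in layer:
                v = layer[key]
                return None if v is TOMBSTONE else v
        return None

    def items(self):
        """Merge all layers in key order; for duplicate keys the newest wins."""
        # Tag entries with the layer's age so equal keys sort newest first.
        streams = [
            [(k, age, v) for k, v in sorted(layer.items())]
            for age, layer in enumerate(self.layers)
        ]
        prev, first = None, True
        for k, age, v in heapq.merge(*streams):
            if not first and k == prev:
                continue  # an older copy of a key already handled
            first, prev = False, k
            if v is not TOMBSTONE:
                yield k, v
```

Because `(key, age)` pairs are unique, the merge never has to compare the values themselves.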

SLIDE 51

Unified View

[Diagram: an Overlay dTable combining the Journal dTable with the read-only State Dict., Array, Exception, Linear, and B-tree dTables]

  ✔ Mostly-contiguous IDs
  ✔ Most live in the US
  ✔ Some live elsewhere
  ✔ Don’t penalize them
  ✔ Occasionally relocate

SLIDE 55

Managed dTable

  • Need a policy for digesting journal dTables
  • Decreases overlay performance, but frees memory
  • Need a policy for combining read-only dTables
  • Restores overlay performance, consolidates data
  • Must balance these goals efficiently

[Diagram: a Managed dTable above an Overlay dTable that combines a Journal dTable with several Array dTables]

  • Managed dTable interfaces with the transaction library
  • Allows all other dTables to ignore transactions
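A toy version of such a policy, assuming simple size and count thresholds. `ManagedDTable`, `digest_at`, and `combine_at` are invented for this sketch and are not Anvil's parameters; transactions and tombstones are omitted. Digesting adds a layer (freeing the journal), and combining collapses the read-only layers back into one.

```python
class ManagedDTable:
    """Sketch of a digest/combine policy (thresholds are hypothetical).

    layers[0] is the writable journal (a dict); layers[1:] are read-only
    snapshots, newest first. Transactions and removals are omitted.
    """

    def __init__(self, digest_at=4, combine_at=3):
        self.layers = [{}]
        self.digest_at = digest_at
        self.combine_at = combine_at

    def insert(self, key, value):
        self.layers[0][key] = value
        if len(self.layers[0]) >= self.digest_at:
            self.digest()

    def lookup(self, key):
        for layer in self.layers:  # newest first
            if key in layer:
                return layer[key]
        return None

    def digest(self):
        # Freeze the journal as a new read-only layer: frees the journal's
        # memory, but lookups now have one more layer to search.
        frozen = dict(sorted(self.layers[0].items()))
        self.layers = [{}, frozen] + self.layers[1:]
        if len(self.layers) - 1 > self.combine_at:
            self.combine()

    def combine(self):
        # Merge all read-only layers, newer values overriding older ones,
        # which restores lookup speed.
        merged = {}
        for layer in reversed(self.layers[1:]):  # oldest first
            merged.update(layer)
        self.layers = [self.layers[0], dict(sorted(merged.items()))]
```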
SLIDE 59

Managing Long-Term Efficiency

[Diagram: several overlaid copies of the State Dict. / Array / Exception / Linear / B-tree subgraph beneath the Journal, Overlay, and Managed dTables]

Even with combining, we build up several overlaid read-only dTable subgraphs...

Most of the data is probably in the older ones, combined from many others.

SLIDE 60

Bloom dTable

  • Creates a Bloom filter for the keys in another dTable
  • Accelerates (most) nonexistent key lookups: O(1)!
  • Slightly slows down extant key lookups
  • Takes additional disk space in a separate file
  • Read-only
  • No need to worry about key removal
  • Creates Bloom filter bitmap during create()
  • Particularly useful under overlay dTables
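A minimal Bloom filter sketch, assuming SHA-256 with a one-byte salt per hash function and arbitrary sizes (`m_bits`, `k_hashes` are illustrative defaults, not Anvil's). All keys are known when the filter is built, mirroring creation during `create()`, so no removal support is needed.

```python
import hashlib


class BloomFilter:
    """Sketch of the filter a Bloom dTable could build during create()."""

    def __init__(self, keys, m_bits=1024, k_hashes=3):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)
        for key in keys:  # keys are fixed at build time; nothing is removed
            for pos in self._positions(key):
                self.bits[pos // 8] |= 1 << (pos % 8)

    def _positions(self, key):
        data = repr(key).encode()
        for i in range(self.k):
            h = hashlib.sha256(bytes([i]) + data).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def may_contain(self, key):
        # False means definitely absent: the underlying dTable can be skipped.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))
```

A Bloom dTable's `lookup()` would report “not found” whenever `may_contain()` is false and consult the underlying dTable otherwise, which is why extant keys pay a small extra cost.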
SLIDE 61

An Application-Specific Backend

[Diagram: the complete backend built from State Dict., Array, Exception, Linear, B-tree, Journal, Overlay, Managed, and Bloom dTables]

SLIDE 63

Additional dTables

  • Fixed-size: combination array/linear
  • Unique-string: deduplicates strings
  • Empty: always empty
  • Memory: not persistent
  • Cache: memory cache
  • Small integer: strips leading zero bytes
  • Delta integer: stores differences
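The last two entries describe simple encodings. A sketch of each, assuming big-endian fixed-width integers for “small integer” and a plain running difference for “delta integer”; the function names are hypothetical and the actual on-disk formats of these dTables are not specified here.

```python
def small_int_encode(n: int, width: int = 4) -> bytes:
    """Small-integer idea: drop the leading zero bytes of a fixed-width value."""
    raw = n.to_bytes(width, "big")
    return raw.lstrip(b"\x00") or b"\x00"  # keep one byte for zero


def small_int_decode(b: bytes) -> int:
    return int.from_bytes(b, "big")


def delta_encode(values):
    """Delta-integer idea: keep the first value, then successive differences."""
    return [v - p for p, v in zip([0] + values[:-1], values)]


def delta_decode(deltas):
    out, total = [], 0
    for d in deltas:
        total += d  # the running sum rebuilds the original values
        out.append(total)
    return out
```

Both pay off when values are small or close together: `[100, 103, 107]` becomes `[100, 3, 4]`.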

SLIDE 64

Performance Hypothesis

  • Simple configuration changes can improve performance for specialized workloads
  • Benefits of tailoring dTable configurations to data
  • Performance is good for conventional workloads
  • Replaced SQLite’s update-in-place backend with Anvil
  • Can run a TPC-C-like benchmark (DBT2)
  • Overhead of digesting and combining can be reduced by background processing

SLIDE 65

Evaluating dTable Modularity

  • Load a given dTable configuration with 4M values
  • 0.2% of them 7 bytes, others 5 bytes
  • Look up 2M random keys

[Diagram: the dTable configuration under test (“??”) beneath Overlay, Managed, and Journal dTables]
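A hypothetical generator for a workload of this shape, useful for reproducing the setup at smaller scale. `make_workload` and its parameters are invented for this sketch; `key_spacing` is an assumption added so the same function can produce contiguous keys (1) or keys spaced 1000 apart.

```python
import random


def make_workload(n=4_000_000, long_frac=0.002, lookups=2_000_000,
                  key_spacing=1, seed=0):
    """Hypothetical generator for the benchmark described above.

    Returns (values, probes): n keys spaced key_spacing apart, where a
    long_frac share of values are 7 bytes and the rest are 5 bytes, plus
    `lookups` randomly chosen keys to look up.
    """
    rng = random.Random(seed)
    keys = [i * key_spacing for i in range(n)]
    long_keys = set(rng.sample(keys, round(n * long_frac)))
    values = {k: (b"L" * 7 if k in long_keys else b"s" * 5) for k in keys}
    probes = [rng.choice(keys) for _ in range(lookups)]
    return values, probes
```

With the defaults, 0.2% of 4M values is 8,000 seven-byte values; in an Array + Exception configuration with 5-byte slots, those would be the exceptions.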

SLIDE 66

dTable Choice Depends On Data

  • Linear + B-tree vs. Array + Exception
  • Keys: contiguous or spaced 1000 apart
  • Anvil’s modularity allows us to choose the right configuration for this data

[Chart: lookup time (s) for Linear + B-tree vs. Array + Exception, with contiguous keys and with sparse keys (log scale)]

SLIDE 67

Layered Index dTable Speeds Lookups

  • Linear vs. Linear + B-tree
  • Also measure time to create data store
  • Usually a good configuration choice: the time saved over many lookups makes up for the extra create cost

[Chart: create time (s) and lookup time (s) for Linear vs. Linear + B-tree]

SLIDE 68

Exception dTable Has Low Overhead

  • Linear vs. Array vs. Array + Exception
  • Plain array can store only fixed-size values
  • Exception dTable has low overhead vs. a plain array (4% slower lookups here), but restores full functionality

[Chart: create time (s) and lookup time (s) for Linear, Array, and Array + Exception]

SLIDE 69

How Does Read/Write Separation Perform?

[Diagram: the row-store configuration built from Managed, Overlay, Journal, Bloom, B-tree, and Linear dTables]

  • Anvil separates reads and writes into different dTables in our configurations
  • How does this perform relative to an update-in-place backend?
  • Run DBT2 TPC-C with 1 warehouse for 15 minutes
  • Uses a simple row-store Anvil configuration
  • Digesting, combining, and system journal cleaning all set to occur frequently

SLIDE 70

Separated Read/Write dTables Are Fast

  • Anvil’s durable configuration outperforms the original durable configuration
  • Anvil’s non-durable (but consistent, i.e. safe) configuration outperforms the original “async” (i.e. unsafe) configuration

[Chart: transactions per minute (TPM) in durable and non-durable configurations for the original backend, the Anvil backend, and MySQL]

SLIDE 71

Better Disk Access Makes Anvil Fast

  • Both Anvil configurations have significantly better disk access characteristics
  • Larger, contiguous writes, better laid out on disk
  • Can write more data in less time with faster seeks

[Charts: disk utilization (%), average request size (KiB, log scale), and writes/sec (log scale) in durable and non-durable configurations, original backend vs. Anvil backend]

SLIDE 72

Digesting and Combining

  • Anvil’s performance benefits don’t come for free
  • Digesting, combining, and cleaning are the price
  • These tasks can be done in the background
  • Read-only source data makes a background thread safe
  • Takes advantage of additional cores and spare I/O bandwidth
  • Bulk loading a dTable with ~1GiB of data
  • Digest every few seconds
  • 50 seconds with background digest/combine
  • 82 seconds without
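A sketch of why read-only sources make background work safe, assuming layers are dicts held newest first and that new layers are only ever pushed on top. `BackgroundCombiner` is hypothetical: the worker thread merges a snapshot of the layers without holding the lock (nobody modifies read-only layers), and the lock is taken only to swap the merged result in.

```python
import threading


class BackgroundCombiner:
    """Sketch: merge read-only layers on a worker thread, then swap them in."""

    def __init__(self, layers):
        self.layers = layers          # newest first; every layer is read-only
        self.lock = threading.Lock()  # guards only the list swap, not the merge

    def _combine(self, sources):
        if not sources:
            return
        merged = {}
        for layer in reversed(sources):  # oldest first, so newer values win
            merged.update(layer)         # safe: sources are never modified
        with self.lock:
            # Replace exactly the merged layers; any newer layers pushed on
            # top while the merge ran are kept.
            self.layers = self.layers[:-len(sources)] + [merged]

    def start(self):
        with self.lock:
            sources = list(self.layers)  # snapshot of the current layers
        worker = threading.Thread(target=self._combine, args=(sources,))
        worker.start()
        return worker

    def lookup(self, key):
        with self.lock:
            layers = self.layers
        for layer in layers:
            if key in layer:
                return layer[key]
        return None
```

Lookups keep reading the old layers until the swap, so foreground work is never blocked by the merge itself.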
SLIDE 73

Related Work

  • Bigtable [Chang et al. ’06]
  • Some aspects of Anvil resemble Bigtable SSTables
  • Write-optimized logs, read-optimized data
  • Bigtable’s higher-level distribution system is complementary
  • C-Store [Stonebraker et al. ’05]
  • Data-specific optimizations and finer control of data layout
  • Abstraction-providing libraries
  • Stasis transaction framework [Sears, Brewer ’06]
  • BerkeleyDB persistent data structure library

SLIDE 77

Conclusions

  • Anvil provides a new way to build storage systems
  • Desired functionality can be composed from fine-grained dTable modules
  • Simple configuration changes allow storing data in many different useful ways
  • Easy to write new dTables for novel storage strategies
  • Still lacks some features, but they seem compatible
  • Aborting transactions, full concurrency
  • Performance overhead is small compared to potential benefits for applications
  • Prototype faster than SQLite’s B-trees for TPC-C

SLIDE 78

More info: http://read.cs.ucla.edu/anvil