

SLIDE 1

Emulating Goliath Storage Systems with David

Nitin Agrawal, NEC Labs; Leo Arulraj, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, ADSL Lab, UW Madison

SLIDE 2

The Storage Researchers' Dilemma

  • Innovate: create the future of storage
  • Measure: quantify the improvement obtained
  • Dilemma: how do you measure the future of storage with devices from the present?

SLIDE 3

David: A Storage Emulator

Emulates large, fast, multiple disks using a small, slow, single device

  • Huge disks: a ~1 TB disk using an 80 GB disk
  • Multiple disks: a RAID of multiple disks using RAM

SLIDE 4

Key Idea behind David

Store metadata, throw away data (and generate fake data)

Why is this OK?

  • Benchmarks measure performance
  • Many benchmarks don't care about file content
  • Some expect valid, but not exact, content

SLIDE 5

Outline

  • Intro
  • Overview
  • Design
  • Results
  • Conclusion

SLIDE 6

Overview of how David works

[Diagram: a benchmark runs on a filesystem in userspace; David, a pseudo block device driver in kernelspace, sits beneath the filesystem and contains the Storage Model and the Backing Store]

SLIDE 7

Illustrative Benchmark

  • Create a file
  • Write a block of data
  • Close the file
  • Open the file in read mode
  • Read back the data
  • Close the file

SLIDE 8

How does David handle a metadata write?

The benchmark calls F = fopen("a.txt", "w"), and the filesystem allocates an inode in block 100.

SLIDE 9

How does David handle a metadata write?

The filesystem issues a write of the inode block at LBA 100.

SLIDE 10

How does David handle a metadata write?

The write request for LBA 100 reaches David.

SLIDE 11

How does David handle a metadata write?

The storage model calculates the response time for the write to LBA 100. The metadata block at LBA 100 is remapped to LBA 1 in the backing store, and the remap table records 100 → 1.

SLIDE 12

How does David handle a metadata write?

David responds to the filesystem after the modeled 6 ms.

SLIDE 13

How does David handle a data write?

The benchmark calls fwrite(buffer, 4096, 1, F), and the filesystem issues a write of the data block at LBA 800.

SLIDE 14

How does David handle a data write?

The write request for LBA 800 reaches David.

SLIDE 15

How does David handle a data write?

The storage model calculates the response time for the write to LBA 800; the data block itself is THROWN AWAY and never reaches the backing store.

SLIDE 16

How does David handle a data write?

David responds to the filesystem after the modeled 8 ms. Space savings so far: 50%, since one of the two blocks written was squashed.

SLIDE 17

How does David handle a metadata read?

The benchmark calls fclose(F) and then F = fopen("a.txt", "r").

SLIDE 18

How does David handle a metadata read?

The filesystem issues a read of the inode block at LBA 100.

SLIDE 19

How does David handle a metadata read?

The read request for LBA 100 reaches David.

SLIDE 20

How does David handle a metadata read?

The storage model calculates the response time for the read to LBA 100. The remap table maps 100 → 1, so the block at LBA 1 in the backing store is read and returned.

SLIDE 21

How does David handle a metadata read?

David responds to the filesystem after the modeled 3 ms.

SLIDE 22

How does David handle a data read?

The benchmark calls fread(buffer, 4096, 1, F), and the filesystem issues a read of the data block at LBA 800.

SLIDE 23

How does David handle a data read?

The read request for LBA 800 reaches David.

SLIDE 24

How does David handle a data read?

The storage model calculates the response time for the read to LBA 800. Since the block at LBA 800 was squashed, the read is filled with fake content by the data generator.

SLIDE 25

How does David handle a data read?

David responds to the filesystem after the modeled 8 ms.

SLIDE 26

Outline

  • Intro
  • Overview
  • Design
  • Results
  • Conclusion

SLIDE 27

Design Goals for David

  • Accurate: the emulated disk should perform like the real disk
  • Scalable: should be able to emulate large disks
  • Lightweight: emulation overhead should not affect accuracy
  • Flexible: should be able to emulate a variety of storage devices
  • Adoptable: easy to install and use for benchmarking

SLIDE 28

Components within David

  • Storage Model
  • Block Classifier
  • Metadata Remapper
  • Data Squasher
  • Data Generator
  • Backing Store

SLIDE 29

Block Classification

Data or metadata? David must distinguish data blocks from metadata blocks in order to throw data blocks away.

Why is this difficult? David is a block-level emulator, so it sees only raw block I/O.

Two approaches:

  • Implicit block classification: David automatically infers the classification
  • Explicit block classification: the operating system passes the classification down

SLIDE 30

Implicit Block Classification

Parse metadata writes using filesystem knowledge to infer data blocks. Implementation for ext3 (a minimal sketch follows the list):

  • Identify inode blocks using ext3 block layout
  • Parse inode blocks to infer direct/indirect blocks
  • Parse direct/indirect blocks to infer data blocks
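A minimal sketch of these three steps, assuming an idealized ext3-like layout; the parsing is abstracted away, whereas real ext3 requires byte-level decoding of the superblock, group descriptors, and inode tables.

    class ImplicitClassifier:
        def __init__(self, inode_table_lbas):
            self.inode_table = set(inode_table_lbas)  # known from the ext3 layout
            self.indirect = set()   # LBAs learned to hold indirect pointers
            self.data = set()       # LBAs learned to hold file data

        def classify(self, lba):
            if lba in self.data:
                return "data"
            if lba in self.inode_table or lba in self.indirect:
                return "metadata"
            return "unclassified"   # held in the Unclassified Block Store

        def on_metadata_write(self, lba, parsed):
            # 'parsed' stands in for decoding the block's on-disk format
            if lba in self.inode_table:
                for inode in parsed:                  # each inode in the block
                    self.data.update(inode["direct"])
                    self.indirect.update(inode.get("indirect", []))
            elif lba in self.indirect:
                self.data.update(parsed)              # pointer list of data LBAs

    # Example: a write to inode block 100 reveals that LBA 800 holds file data
    clf = ImplicitClassifier(inode_table_lbas=[100])
    clf.on_metadata_write(100, [{"direct": [800], "indirect": [801]}])
    assert clf.classify(800) == "data"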

Problem: delay in classification

SLIDE 31

Ext3 Ordered Journaling Mode (without David)

[Diagram: data blocks (D) are written to their fixed disk locations before the corresponding metadata (M) is committed to the journal]

SLIDE 32

Ext3 Ordered Journaling Mode (with David)

[Diagram: with David, writes that cannot yet be classified are held in the Unclassified Block Store until journaled metadata reveals their type]

SLIDE 33

Memory Pressure in Unclassified Block Store

Too many unclassified blocks exhaust memory

Technique: Journal Snooping

Parse metadata writes to the journal to infer the classification much earlier than usual.
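A hypothetical sketch of the idea: by snooping metadata as it is committed to the journal, data LBAs are learned long before checkpointing writes the metadata back to its home location. The journal record format here is idealized, not the real JBD layout.

    classified_data_lbas = set()

    def snoop_journal_write(records):
        # records: (home_lba, kind, parsed) tuples for journaled blocks
        for home_lba, kind, parsed in records:
            if kind == "inode":
                for inode in parsed:
                    classified_data_lbas.update(inode["direct"])

    # An inode committed to the journal reveals that LBA 800 holds data
    snoop_journal_write([(100, "inode", [{"direct": [800]}])])
    assert 800 in classified_data_lbas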

SLIDE 34

Effect of Journal Snooping

[Plot: memory used (MB) vs. time (seconds); without Journal Snooping the Unclassified Block Store grows until the machine runs out of memory, while with Journal Snooping memory use stays bounded]

SLIDE 35

Block Classification

Recap: two approaches

  • Implicit block classification: David automatically infers the classification
  • Explicit block classification: the operating system passes the classification down

SLIDE 36

Explicit Block Classification

Capture page pointers to data blocks in the write() system call and pass the classification information down to David.

[Diagram: the benchmark application's writes pass through the filesystem; data blocks and metadata blocks flow down to David, each tagged with its classification]
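A minimal sketch of the idea, assuming a simple tagged-request structure; the names and the flag are illustrative, and the actual mechanism captures page pointers inside the kernel's write path.

    from dataclasses import dataclass

    backing_store = {}   # stands in for David's metadata backing store

    @dataclass
    class BlockRequest:
        lba: int
        payload: bytes
        is_data: bool    # set by the OS from write()-path page pointers

    def submit_to_david(req):
        if req.is_data:
            return                               # squash: payload discarded
        backing_store[req.lba] = req.payload     # metadata is persisted

    # Filesystem metadata arrives tagged as such; blocks reachable from a
    # write() system call arrive tagged as data.
    submit_to_david(BlockRequest(lba=100, payload=b"inode", is_data=False))
    submit_to_david(BlockRequest(lba=800, payload=b"data", is_data=True))
    assert 100 in backing_store and 800 not in backing_store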

SLIDE 37

Block Classification Summary

Implicit Block Classification:
  • No change to the filesystem, benchmark, or operating system
  • Requires filesystem knowledge
  • Results with ext3

Explicit Block Classification:
  • Minimal change to the operating system
  • Works for all filesystems
  • Results with btrfs

SLIDE 38

Components within David

  • Storage Model
  • Block Classifier
  • Metadata Remapper
  • Data Squasher
  • Data Generator
  • Backing Store

SLIDE 39

David’s Storage Model

[Diagram: side by side, the actual system (benchmark → filesystem → I/O request queue → disk) and the emulated system (benchmark → filesystem → David's storage model)]

SLIDE 40

I/O Queue Model

Merge sequential I/O requests
  • To improve performance

When the I/O queue is empty
  • Wait 3 ms, anticipating merges

When the I/O queue is full
  • The issuing process is made to sleep and wait
  • The process is woken up once empty slots open up
  • The process is given a timing bonus for the wait period

I/O queue modeling is critical for accuracy.
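The sketch below illustrates the merge and backpressure rules above; the 3 ms anticipation comes from the slide, while QUEUE_DEPTH and the structure are assumptions, not David's kernel code.

    ANTICIPATION_MS = 3      # idle wait, hoping sequential I/O arrives to merge
    QUEUE_DEPTH = 128        # assumed maximum number of outstanding requests

    class IOQueue:
        def __init__(self):
            self.pending = []        # (start_lba, num_blocks) tuples

        def submit(self, lba, nblocks):
            if len(self.pending) >= QUEUE_DEPTH:
                # Caller must sleep; on wakeup it receives a timing bonus
                # covering the wait period.
                raise BlockingIOError("I/O queue full")
            if self.pending:
                start, n = self.pending[-1]
                if lba == start + n:             # strictly sequential: merge
                    self.pending[-1] = (start, n + nblocks)
                    return
            self.pending.append((lba, nblocks))

    q = IOQueue()
    q.submit(100, 8)
    q.submit(108, 8)                 # sequential with the previous request
    assert q.pending == [(100, 16)]  # merged into one larger request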

SLIDE 41

Disk Model

Simple in-kernel disk model
  • Based on the Ruemmler and Wilkes disk model
  • Current models: 80 GB and 1 TB Hitachi Deskstar
  • The focus of our work is not disk modeling (more accurate models are possible)

Disk model parameters
  • Disk properties: rotational speed, head seek profile, etc.
  • Current disk state: head position, on-disk cache state, etc.
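A toy positioning-time model in the spirit of Ruemmler and Wilkes; every constant below is illustrative, not the calibrated parameters of David's Hitachi models.

    import math

    RPM = 7200
    MS_PER_REV = 60_000 / RPM            # ~8.33 ms per full rotation
    TRANSFER_MS_PER_BLOCK = 0.01         # assumed media transfer rate

    def seek_ms(distance_cyls):
        # R&W-style seek curve: ~sqrt for short seeks, ~linear for long ones
        if distance_cyls == 0:
            return 0.0
        if distance_cyls < 600:
            return 1.0 + 0.3 * math.sqrt(distance_cyls)
        return 5.0 + 0.00006 * distance_cyls

    def service_time_ms(cur_cyl, target_cyl, rot_frac, nblocks):
        seek = seek_ms(abs(target_cyl - cur_cyl))
        rotation = rot_frac * MS_PER_REV     # wait for the target sector
        transfer = nblocks * TRANSFER_MS_PER_BLOCK
        return seek + rotation + transfer

    # Example: an 8-block read after a 600-cylinder seek and half a rotation
    print(round(service_time_ms(1000, 1600, 0.5, 8), 2), "ms")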

SLIDE 42

David’s Storage Model Accuracy

Reasonable accuracy across many workloads

Many more results in the paper

SLIDE 43

Components within David

  • Storage Model
  • Block Classifier
  • Metadata Remapper
  • Data Squasher
  • Data Generator
  • Backing Store

SLIDE 44

Backing Store

Storage space for metadata blocks

Any physical storage can be used
  • Must be large enough to hold all metadata blocks
  • Must be fast enough to match the emulated disk

Two implementations
  • Memory as the backing store
  • A compressed disk as the backing store

SLIDE 45

Metadata Remapper

Remaps metadata blocks into a compact layout

[Diagram: on the emulated disk, inode blocks are interspersed with data blocks; on the backing store, only the inode blocks remain, packed together on a compressed disk for better performance]

SLIDE 46

Components within David

  • Storage Model
  • Block Classifier
  • Metadata Remapper
  • Data Squasher
  • Data Generator
  • Backing Store

SLIDE 47

Data Squasher and Generator

Data Squasher: throws away writes to data blocks

Data Generator: generates content for reads to data blocks (currently random content)

SLIDE 48

Outline

  • Intro
  • Overview
  • Design
  • Results
  • Conclusion

SLIDE 49

Experiments

  • Emulation accuracy: test emulation accuracy across benchmarks
  • Emulation scalability: test the space savings of large-device emulation
  • Multiple disk emulation: test the accuracy of multiple-device emulation

SLIDE 50

Emulation Accuracy Experiment

Experimental details
  • Emulated a ~1 TB disk with an 80 GB disk
  • Ran a variety of benchmarks
  • Validated using a real 1 TB disk

SLIDE 51

Emulation Accuracy Results

(Ext3 with Implicit Block Classification)

[Plot: benchmark runtime in seconds, real vs. emulated disk]

SLIDE 52

Emulation Accuracy Results

(Btrfs with Explicit Block Classification)

[Plot: benchmark runtime in seconds, real vs. emulated disk]

SLIDE 53

Emulation Scale Experiment

Experimental details
  • Emulated a ~1 TB disk using an 80 GB disk
  • Created filesystem images using Impressions
  • Validated using a real disk

SLIDE 54

Emulation Scale: Accuracy

SLIDE 55

Emulation Scale: Space Savings

SLIDE 56

Multiple Disks Experiment

Experimental details
  • Emulated multiple disks using RAM
  • Measured micro-benchmark performance on RAID-1
  • Validated our results against real disks

SLIDE 57

Simple RAID-1 Emulation

[Plot: random read (R) and write (W) runtime in seconds for one, two, and three disks, original vs. David]

SLIDE 58

Outline

  • Intro
  • Overview
  • Design
  • Results
  • Conclusion

SLIDE 59

Conclusion

  • David: emulate large devices with limited means
  • Key idea: throw away data
  • Results: accurate emulation of large and multiple disks
  • Future: emulating a storage cluster with a few machines

SLIDE 60

Thank You

www.cs.wisc.edu/adsl

SLIDE 61

Questions?

SLIDE 62

Measuring Innovation

Thorough measurement is hard and costly
  • Time, money, and effort are needed to measure performance on a variety of storage devices
  • Tiny benchmarks are easy to run

SLIDE 63

Implicit Block Classification

Unclassified Block Store
  • Unclassifiable blocks are temporarily stored in the Unclassified Block Store, which is in RAM
  • The journal checkpoint frequency determines the delay in classification
  • Upon classification, data blocks are squashed and metadata blocks are persisted