Emulating Goliath Storage Systems with David

Nitin Agrawal, NEC Labs
Leo Arulraj, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, ADSL Lab, UW Madison

The Storage Researchers' Dilemma
Innovate
  • Create the future of storage
Measure
  • Quantify the improvement obtained
Dilemma
  • How to measure the future of storage with devices from the present?

David: A Storage Emulator
Emulate large, fast, multiple disks using a small, slow, single device
Huge Disks
  • ~1 TB disk using an 80 GB disk
Multiple Disks
  • RAID of multiple disks using RAM

Key Idea behind David
Store metadata, throw away data (and generate fake data)
Why is this OK?
  • Benchmarks measure performance
  • Many benchmarks don't care about file content
  • Some expect valid but not exact content

Outline
  • Intro
  • Overview
  • Design
  • Results
  • Conclusion

Overview of how David works
[Diagram: the benchmark runs in userspace; in kernelspace, the filesystem sits on top of David, a pseudo block device driver, shown with its storage model and backing store.]

Illustrative Benchmark
  • Create a file
  • Write a block of data
  • Close the file
  • Open the file in read mode
  • Read back the data
  • Close the file
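The six benchmark steps can be sketched in a few lines. This is only an illustration, written in Python rather than the C stdio calls the later slides show; the function name `run_benchmark` and the use of a temporary directory are assumptions made for the example.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # one block, matching the 4096-byte write on the later slides


def run_benchmark(path):
    """Create a file, write one block, close it, then read the block back."""
    payload = os.urandom(BLOCK_SIZE)

    # Create the file, write a block of data, close the file.
    with open(path, "wb") as f:
        f.write(payload)

    # Open the file in read mode, read back the data, close the file.
    with open(path, "rb") as f:
        data = f.read(BLOCK_SIZE)

    return payload, data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        written, read_back = run_benchmark(os.path.join(tmp, "a.txt"))
        print(len(read_back), "bytes read back")
```

Note that the sketch does not compare `written` with `read_back`: a benchmark of this kind only needs the read to complete, which is what makes discarding the data acceptable.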

How does David handle a metadata write?
  • Benchmark: F = fopen("a.txt", "w");
  • The filesystem allocates an inode in block 100 and issues a write of the inode block to LBA 100
  • The storage model calculates the response time for the write to LBA 100
  • The metadata block at LBA 100 is remapped to LBA 1 on the backing store (remap table: 100 → 1)
  • The response is returned to the filesystem after 6 ms

How does David handle a data write?
  • Benchmark: fwrite(buffer, 4096, 1, F);
  • The filesystem issues a write of data block 800 to LBA 800
  • The storage model calculates the response time for the write to LBA 800
  • The data block at LBA 800 is THROWN AWAY (remap table unchanged: 100 → 1)
  • The response is returned to the filesystem after 8 ms
  • Space savings: 50%

How does David handle a metadata read?
  • Benchmark: fclose(F); F = fopen("a.txt", "r");
  • The filesystem issues a read of inode block 100 from LBA 100
  • The storage model calculates the response time for the read of LBA 100
  • The remap table maps LBA 100 to LBA 1; the block at LBA 1 is read from the backing store and returned
  • The response is returned to the filesystem after 3 ms

How does David handle a data read?
  • Benchmark: fread(buffer, 4096, 1, F);
  • The filesystem issues a read of data block 800 from LBA 800
  • The storage model calculates the response time for the read of LBA 800
  • The data block at LBA 800 is filled with fake content
  • The response is returned to the filesystem after 8 ms
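The four walkthroughs above (metadata write, data write, metadata read, data read) can be tied together in a toy model. The sketch below is an illustration only, not David's implementation: the class `ToyDavid`, its fields, and the `model_latency_ms` callable are hypothetical names, the caller supplies the metadata/data classification directly, and reads are treated as metadata whenever the LBA appears in the remap table.

```python
import os

BLOCK_SIZE = 4096


class ToyDavid:
    """Toy model of the walkthrough: keep metadata, drop data, fake data on read."""

    def __init__(self, model_latency_ms):
        # Hypothetical stand-in for the storage model:
        # a callable (op, lba) -> response time in milliseconds.
        self.model_latency_ms = model_latency_ms
        self.remap = {}      # emulated LBA -> backing-store LBA
        self.backing = {}    # backing-store LBA -> block contents
        self.next_free = 1   # next unused backing-store LBA
        self.squashed = 0    # number of data-block writes thrown away

    def write(self, lba, block, is_metadata):
        delay = self.model_latency_ms("write", lba)
        if is_metadata:
            # Metadata is kept: remap it to a dense backing-store location.
            if lba not in self.remap:
                self.remap[lba] = self.next_free
                self.next_free += 1
            self.backing[self.remap[lba]] = block
        else:
            # Data is thrown away; only the modeled delay is kept.
            self.squashed += 1
        return delay

    def read(self, lba):
        delay = self.model_latency_ms("read", lba)
        if lba in self.remap:
            block = self.backing[self.remap[lba]]   # real metadata
        else:
            block = os.urandom(BLOCK_SIZE)          # generated fake content
        return block, delay


if __name__ == "__main__":
    # Response times taken from the walkthrough slides.
    latencies = {("write", 100): 6, ("write", 800): 8,
                 ("read", 100): 3, ("read", 800): 8}
    david = ToyDavid(lambda op, lba: latencies[(op, lba)])
    david.write(100, b"i" * BLOCK_SIZE, is_metadata=True)
    david.write(800, b"d" * BLOCK_SIZE, is_metadata=False)
    print(david.remap, david.squashed)
```

After the two writes, one of the two blocks is stored and one is discarded, which corresponds to the 50% space savings shown in the data-write walkthrough.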

Design Goals for David
Accurate
  • Emulated disk should perform similarly to a real disk
Scalable
  • Should be able to emulate large disks
Lightweight
  • Emulation overhead should not affect accuracy
Flexible
  • Should be able to emulate a variety of storage disks
Adoptable
  • Easy to install and use for benchmarking

Components within David
  • Block Classifier
  • Data Generator
  • Data Squasher
  • Metadata Remapper
  • Storage Model
  • Backing Store

Block Classification
Data or Metadata?
  • Distinguish data blocks from metadata blocks to throw away data blocks
Why difficult?
  • David is a block-level emulator
Two Approaches
  • Implicit Block Classification (David automatically infers block classification)
  • Explicit Block Classification (Operating System passes down block classification)

Implicit Block Classification
Parse metadata writes using filesystem knowledge to infer data blocks
Implementation for ext3
  • Identify inode blocks using ext3 block layout
  • Parse inode blocks to infer direct/indirect blocks
  • Parse direct/indirect blocks to infer data blocks
Problem
  • Delay in classification
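The three ext3 steps can be illustrated with a deliberately simplified on-disk format. Everything below is an assumption made for the example and is not the real ext3 layout: the inode table is taken to occupy LBAs 100 to 109, an inode block is reduced to three direct pointers plus one single-indirect pointer, and every pointer is a little-endian 32-bit integer with 0 meaning "unused".

```python
import struct

INODE_TABLE = range(100, 110)  # hypothetical LBAs holding inode blocks
N_DIRECT = 3                   # toy inode: 3 direct pointers + 1 indirect pointer


def _unpack(block):
    """Interpret a block as a sequence of little-endian 32-bit pointers."""
    count = len(block) // 4
    return struct.unpack(f"<{count}I", block[: count * 4])


class ImplicitClassifier:
    def __init__(self):
        self.data = set()      # LBAs inferred to hold file data
        self.indirect = set()  # LBAs inferred to hold indirect (metadata) blocks

    def observe_write(self, lba, block):
        """Classify one write and learn from it if it carries pointers."""
        if lba in INODE_TABLE:
            # Step 1: inode blocks are known from the (toy) layout.
            # Step 2: parse the inode to find direct and indirect blocks.
            ptrs = _unpack(block)
            self.data.update(p for p in ptrs[:N_DIRECT] if p)
            if ptrs[N_DIRECT]:
                self.indirect.add(ptrs[N_DIRECT])
            return "metadata"
        if lba in self.indirect:
            # Step 3: parse the indirect block to find more data blocks.
            self.data.update(p for p in _unpack(block) if p)
            return "metadata"
        if lba in self.data:
            return "data"
        # Not yet referenced by any parsed metadata: classification is delayed.
        return "unclassified"
```

A block written before the inode that points to it comes back as "unclassified", which is the delay named on the slide; the same block is reported as "data" once the inode write has been parsed.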

Ext3 Ordered Journaling Mode (without David)
[Diagram: metadata (M) and data (D) blocks, the journal, and the disk.]

Ext3 Ordered Journaling Mode (with David)
[Diagram: the unclassified block store, the journal, and the disk.]

Memory Pressure in Unclassified Block Store
Too many unclassified blocks exhaust memory
Technique: Journal Snooping
  • Parse metadata writes to the journal to infer classification much earlier than usual
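Why earlier classification relieves memory pressure can be seen in a small simulation. This is a sketch under stated assumptions, not David's code: `UnclassifiedBlockStore` and `simulate` are hypothetical names, writes arrive in batches of data blocks each followed by a journal commit describing that batch, and without snooping the classification only becomes available at a final checkpoint when metadata reaches its home location.

```python
class UnclassifiedBlockStore:
    """Holds written blocks until they are known to be data or metadata."""

    def __init__(self):
        self.pending = {}  # LBA -> block awaiting classification
        self.peak = 0      # high-water mark of buffered blocks

    def hold(self, lba, block):
        self.pending[lba] = block
        self.peak = max(self.peak, len(self.pending))

    def classify_as_data(self, lbas):
        """Drop buffered blocks now known to be data; return how many were freed."""
        freed = 0
        for lba in lbas:
            if self.pending.pop(lba, None) is not None:
                freed += 1
        return freed


def simulate(batches, snoop_journal):
    """Return the peak number of buffered blocks for a sequence of write batches."""
    store = UnclassifiedBlockStore()
    awaiting_checkpoint = []
    for batch in batches:
        for lba in batch:
            store.hold(lba, b"")
        # A journal commit describing this batch happens here.
        if snoop_journal:
            store.classify_as_data(batch)      # learn from the journal write
        else:
            awaiting_checkpoint.append(batch)  # must wait for the in-place write
    for batch in awaiting_checkpoint:
        store.classify_as_data(batch)          # checkpoint: classification arrives
    return store.peak


if __name__ == "__main__":
    batches = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(simulate(batches, snoop_journal=False))
    print(simulate(batches, snoop_journal=True))
```

In this toy run the buffer grows with the total number of blocks written when classification waits for the checkpoint, and stays bounded by one batch when the journal is snooped.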

Effect of Journal Snooping
[Figure: memory used (MB) versus time (seconds), without journal snooping and with journal snooping; the plot carries an "Out of Memory" marker.]


Explicit Block Classification
Capture page pointers to data blocks in the write system call and pass classification information to David
[Diagram: data blocks and metadata blocks pass from the benchmark application through the filesystem to David.]

Block Classification Summary
Implicit Block Classification
  • No change to filesystem, benchmark or operating system
  • Requires filesystem knowledge
  • Results with ext3
Explicit Block Classification
  • Minimal change to operating system
  • Works for all filesystems
  • Results with btrfs

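David's Storage Model
[Diagram: in the actual system, requests flow from the benchmark through the filesystem and the I/O request queue to the disk; in the emulated system, they flow from the benchmark through the filesystem to David's storage model.]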

I/O Queue Model
Merge sequential I/O requests
  • To improve performance
When I/O queue is empty
  • Wait for 3 ms anticipating merges
When I/O queue is full
  • Process is made to sleep and wait
  • Process is woken up once empty slots open up
  • Process is given a bonus for the wait period
I/O queue modeling is critical for accuracy
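The merging step can be sketched as follows. This is a minimal illustration, not the queue model used in David: the `Request` type and `merge_sequential` function are hypothetical, requests are sorted by LBA before merging (an assumption), and the 3 ms anticipation wait and the sleep/wake-up behaviour of a full queue are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Request:
    lba: int    # first block of the request
    count: int  # number of blocks requested


def merge_sequential(queue):
    """Merge requests whose block ranges are contiguous in LBA order."""
    merged = []
    for req in sorted(queue, key=lambda r: r.lba):
        last = merged[-1] if merged else None
        if last is not None and last.lba + last.count == req.lba:
            # The request starts exactly where the previous one ends: extend it.
            merged[-1] = Request(last.lba, last.count + req.count)
        else:
            merged.append(Request(req.lba, req.count))
    return merged


if __name__ == "__main__":
    queue = [Request(8, 4), Request(0, 4), Request(4, 4), Request(100, 8)]
    print(merge_sequential(queue))
```

Three 4-block requests covering LBAs 0 to 11 collapse into one 12-block request, so the disk model is charged for one positioning delay instead of three.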

Disk Model
Simple in-kernel disk model
  • Based on the Ruemmler and Wilkes disk model
  • Current models: 80 GB and 1 TB Hitachi Deskstar
  • Focus of our work is not disk modeling (more accurate models are possible)
Disk model parameters
  • Disk properties: rotational speed, head seek profile, etc.
  • Current disk state: head position, on-disk cache state, etc.
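How disk properties and disk state combine into a response time can be sketched with a heavily simplified service-time function. This is not the in-kernel model from the slides: it only borrows the general shape of a seek profile that grows with the square root of the distance for short seeks and linearly for long ones, it charges an average half rotation instead of tracking rotational position, and it ignores the on-disk cache. The `DiskParams` fields and all numeric values are made up for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class DiskParams:
    rpm: float              # rotational speed
    sectors_per_track: int
    short_seek_limit: int   # seeks shorter than this many cylinders are "short"
    short_a: float          # short seek: short_a + short_b * sqrt(distance), in ms
    short_b: float
    long_a: float           # long seek: long_a + long_b * distance, in ms
    long_b: float


def seek_ms(p, distance):
    """Seek time in ms for a move of `distance` cylinders."""
    if distance == 0:
        return 0.0
    if distance < p.short_seek_limit:
        return p.short_a + p.short_b * math.sqrt(distance)
    return p.long_a + p.long_b * distance


def rotation_ms(p):
    """Time for one full platter rotation in ms."""
    return 60_000.0 / p.rpm


def service_ms(p, head_cyl, target_cyl, sectors):
    """Seek + average rotational latency + transfer time, in ms."""
    full = rotation_ms(p)
    avg_rotational_latency = full / 2
    transfer = full * sectors / p.sectors_per_track
    return seek_ms(p, abs(target_cyl - head_cyl)) + avg_rotational_latency + transfer


if __name__ == "__main__":
    # Illustrative parameters only, not those of any real drive.
    params = DiskParams(rpm=6000, sectors_per_track=1000, short_seek_limit=400,
                        short_a=1.0, short_b=0.1, long_a=3.0, long_b=0.001)
    print(service_ms(params, head_cyl=0, target_cyl=100, sectors=100))
```

Here `head_cyl` plays the role of the "current disk state" and `DiskParams` the "disk properties"; a fuller model would update the head position after each request and consult a cache before charging any mechanical delay.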

David's Storage Model Accuracy
  • Reasonable accuracy across many workloads
  • Many more results in the paper

Backing Store
Storage space for metadata blocks
Any physical storage can be used
  • Must be large enough to hold all metadata blocks
  • Must be fast enough to match emulated disk
Two implementations
  • Memory as backing store
  • Compressed disk as backing store

Metadata Remapper
Remaps metadata blocks into compressed form
[Diagram: on the emulated disk, inode blocks are interleaved with data blocks; on the compressed disk, the inode blocks are packed together (better performance).]

Data Squasher and Generator
Data Squasher
  • Throws away writes to data blocks
Data Generator
  • Generates content for the reads to data blocks (currently generates random content)

Experiments
Emulation accuracy
  • Test emulation accuracy across benchmarks
Emulation scalability
  • Test space savings for large device emulation
Multiple disk emulation
  • Test accuracy of multiple device emulation

Emulation Accuracy Experiment
Experimental details
  • Emulated ~1 TB disk with 80 GB disk
  • Ran a variety of benchmarks
  • Validated by using a real 1 TB disk

Emulation Accuracy Results (Ext3 with Implicit Block Classification)
[Figure: runtime (seconds), real versus emulated.]

Emulation Accuracy Results (Btrfs with Explicit Block Classification)
[Figure: runtime (seconds), real versus emulated.]
