Optimizing Flash Allocation to Workloads in Google's Colossus File System




slide-1
SLIDE 1

Optimizing Flash Allocation to Workloads in Google's Colossus File System

Christoph Albrecht, Arif Merchant, Murray Stokely, Muhammad Waliji, Francois Labelle, Nate Coehlo, Xudong Shi, C. Eric Schrock Google Storage Analytics Team

Model-driven algorithms and architectures for self-aware computing systems Dagstuhl seminar 15041, January 2015

slide-2
SLIDE 2

Motivation: Trend of HDD and SSD

  • Disk drives (HDD) are slow, and are getting larger but not faster.
  • Flash (SSD) offers much higher I/O rate, but is expensive.

Figure: IOPS and capacity of SSD and HDD of equal cost (bar groups: Capacity, IOPS; series: HDD, SSD)

slide-3
SLIDE 3

Workloads

  • Thousands of users and applications (indexing, ad serving, email, video processing, ...).
  • One application often consists of many component jobs, often with separate data.
slide-4
SLIDE 4

Janus System (Flash tiering): Insertion on Write, Approximate FIFO

Diagram labels: New files, Flash, Disk, FIFO

  • Q1: Flash or disk?
  • Q2a: How much flash?
  • Q2b: How long?

slide-5
SLIDE 5

Janus System: Offline Analysis and Optimization

Input data collection:

  • Statistic (by workload): Age of bytes stored (Scan of metadata)
  • Statistic (by workload): Age of data accessed (Sampled RPC analysis)

Cacheability Functions (Hit Rate Curve): characterization of each workload

Global Optimization:

  • Q1: Flash or disk
  • Q2a: Amount of flash
  • Q2b: Time in flash (TTL)

slide-6
SLIDE 6

Constructing the Cacheability Function

For a given amount of flash, how many read operations can be absorbed by flash if we store the youngest data in flash?
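A minimal sketch of how such a curve could be built, assuming the two per-workload statistics are available as age buckets. The bucket layout, the function name `cacheability_function`, and the sample numbers are hypothetical and not taken from the slides. Sorting buckets youngest-first and accumulating both columns gives, for each flash size, the read rate that flash would absorb.

```python
from itertools import accumulate


def cacheability_function(age_buckets):
    """Build a cacheability curve for one workload.

    age_buckets: list of (max_age_seconds, bytes_stored, read_rate) tuples,
    one per age bucket, in any order (hypothetical input layout).
    Returns a list of (flash_bytes, absorbed_read_rate) points, assuming the
    youngest data is placed in flash first.
    """
    # Youngest data first: it is the first candidate for flash.
    buckets = sorted(age_buckets, key=lambda b: b[0])
    # Flash needed to hold everything up to and including each bucket.
    cum_bytes = list(accumulate(b[1] for b in buckets))
    # Read rate absorbed if all of that data sits in flash.
    cum_reads = list(accumulate(b[2] for b in buckets))
    return [(0, 0.0)] + list(zip(cum_bytes, cum_reads))


# Made-up numbers: data younger than 1 hour, 1 day, 1 week.
example_buckets = [
    (86400, 90, 300.0),
    (3600, 10, 500.0),
    (604800, 900, 200.0),
]
points = cacheability_function(example_buckets)
```

With these made-up numbers, the first 10 bytes of flash absorb half of the total read rate, which is the shape the next slide describes.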

slide-7
SLIDE 7

Cacheability Function

Most read operations go to very young data, which makes up only a small fraction of the total data size.

slide-8
SLIDE 8

Optimizing the Flash Allocation for Workloads

Instance:

  • Workloads with cacheability functions
  • Total flash capacity
  • Bound on write rate

Task:

  • Allocate flash to workloads to maximize the weighted flash read rate.

Solution method:

  • Lagrangian relaxation + Linear programming (assuming concave and piecewise linear cacheability function)
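To make the allocation problem concrete, here is a simplified sketch. It is not the Lagrangian relaxation + linear programming method named on the slide: it ignores the bound on write rate and handles only the flash capacity constraint, for which a greedy pass over the linear segments, ordered by weighted marginal read rate per byte, is sufficient when every curve is concave and piecewise linear. The function name `allocate_flash`, the input layout, and the sample workloads are assumptions made for illustration.

```python
def allocate_flash(workloads, capacity):
    """Greedy flash allocation under a single capacity constraint.

    workloads: dict name -> (weight, points), where points is a concave,
    piecewise linear cacheability curve given as (flash_bytes, read_rate)
    pairs starting at (0, 0) (hypothetical input layout).
    Returns dict name -> flash bytes allocated.
    """
    segments = []
    for name, (weight, points) in workloads.items():
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            width = x1 - x0
            if width > 0:
                # Weighted read rate gained per byte of flash on this segment.
                segments.append((weight * (y1 - y0) / width, width, name))

    # Best marginal gain first. Concavity means each workload's segments
    # already appear in decreasing-gain order, and the stable sort keeps ties
    # in that order.
    segments.sort(key=lambda s: s[0], reverse=True)

    allocation = {name: 0.0 for name in workloads}
    remaining = capacity
    for gain, width, name in segments:
        if remaining <= 0 or gain <= 0:
            break
        take = min(width, remaining)  # the last segment may be taken partially
        allocation[name] += take
        remaining -= take
    return allocation


# Made-up workloads with equal weights and 100 bytes of flash to share.
example_workloads = {
    "logs": (1.0, [(0, 0), (10, 500), (100, 800)]),
    "index": (1.0, [(0, 0), (50, 1000), (200, 1300)]),
}
allocation = allocate_flash(example_workloads, capacity=100)
```

In this example both workloads end up with 50 bytes: each gets its steep first segment in full, and the leftover capacity goes to the next-best segment. Adding the write-rate bound would require something like the relaxation the slide mentions.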

slide-9
SLIDE 9

Does it work? Comparing alternative methods

Flash Hit Rate

Allocation Method           Cell A (low workload variance)   Cell B (high workload variance)
Optimized                   28%                              74%
Proportional to read rate   26%                              64%
Single FIFO                 19%                              42%
Proportional to data size   14%                              15%

Relative improvement of Optimized over Single FIFO: 47% (Cell A), 76% (Cell B)

slide-10
SLIDE 10

Take away messages

  • Specific: Flash is cost-effective for cloud-scale storage if used selectively
  • Broader: It is feasible to use large-scale historical trace data for automated on-line configuration