SLIDE 1

Storage Fabric

CS6453

SLIDE 2

Summary

- Last week: NVRAM is going to change the way we think about storage.
- Today: challenges of storage layers (SSDs, HDs) built to handle massive amounts of data:
  - Slowdowns in HDs and SSDs.
  - Enforcing policies for IO operations in cloud architectures.

SLIDE 3

Background: Storage for Big Data

- One disk is not enough to handle massive amounts of data.
- Last time: efficient datacenter networks built from a large number of cheap commodity switches.
- Solution here: efficient IO performance built from a large number of commodity storage devices.

SLIDE 4

Background: RAID 0

- RAID 0 achieves Nx performance, where N is the number of disks.
- Is this for free?
  - When N becomes large, the probability of a disk failure becomes large as well (see the sketch below).
  - RAID 0 does not tolerate failures.
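A back-of-the-envelope illustration of why this is not free (assuming, purely for illustration, independent disk failures with an annual per-disk failure probability p; the numbers are not from the slides):

    # Probability that at least one of N independent disks in a RAID 0 stripe
    # fails within a year, given a per-disk annual failure probability p.
    def raid0_data_loss_probability(n_disks, p=0.02):
        return 1 - (1 - p) ** n_disks

    # With an illustrative p = 2%: 1 disk -> ~2%, 10 disks -> ~18%, 50 disks -> ~64%.
    for n in (1, 10, 50):
        print(n, round(raid0_data_loss_probability(n), 2))

Since RAID 0 loses data on any single failure, this is also the probability of data loss for the stripe.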

SLIDE 5

Background: RAID 1

- RAID 1 achieves (K-1)-fault tolerance with Kx disks (K replicas of the data).
- Is this for free?
  - It needs Kx as many disks (e.g. to tolerate 1 failure you need 2x as many disks as RAID 0).
  - RAID 1 does not utilize resources efficiently.

SLIDE 6

Background: Erasure Code

- Achieves K-fault tolerance with N+K disks (N data disks plus K redundancy disks).
- Efficient utilization of disks (not as great as RAID 0).
- Fault tolerance (not as great as RAID 1).
- Is this for free?
  - Reconstruction cost: the number of disks that must be read to recover from failure(s).
  - RAID 6 has a reconstruction cost of 3 (a rough comparison of the schemes follows below).
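A minimal comparison sketch of the three schemes (N data disks plus K redundancy disks; the reconstruction-cost remarks are ballpark, since exact values depend on the specific code and group size):

    # Storage overhead of a redundancy scheme with n_data data disks
    # and n_redundant extra disks.
    def storage_overhead(n_data, n_redundant):
        return (n_data + n_redundant) / n_data

    print(storage_overhead(10, 0))    # RAID 0: 1.0x overhead, tolerates no failures
    print(storage_overhead(10, 10))   # RAID 1 (mirroring): 2.0x overhead
    print(storage_overhead(10, 4))    # (10+4) erasure code: 1.4x overhead, tolerates
                                      # 4 failures, but rebuilding one lost disk
                                      # typically requires reading all 10 data disks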
SLIDE 7

Modern Erasure Code Techniques

- Erasure Coding in Windows Azure Storage [Huang, 2012].
- Key observation: Prob[1 failure] >> Prob[2 or more failures].
- Solution: construct an erasure code with a low reconstruction cost for the single-failure case.
  - 1.33x storage overhead (relatively low).
  - Tolerates up to 3 failures across 16 storage devices.
  - Reconstruction cost of 6 for 1 failure and 12 for 2+ failures (see the sketch below).
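These numbers match the LRC(12, 2, 2) layout from the paper: 12 data fragments split into two local groups of 6, one local parity per group, plus 2 global parities. A quick sanity check of the arithmetic (a sketch, not the paper's code):

    # LRC(12, 2, 2): 12 data fragments, 2 local parities, 2 global parities.
    data_fragments, local_parities, global_parities = 12, 2, 2

    total = data_fragments + local_parities + global_parities   # 16 fragments
    print(total / data_fragments)            # 1.33x storage overhead

    group_size = data_fragments // 2         # each local group holds 6 data fragments
    single_failure_cost = (group_size - 1) + 1
    print(single_failure_cost)               # 5 surviving data fragments + 1 local parity = 6
    print(data_fragments)                    # 2+ failures: fall back to reading 12 fragments,
                                             # as a Reed-Solomon code would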

SLIDE 8

The Tail at Store: Problem

- We have seen how failures are handled with reconstruction. What about slowdowns in HDs (or SSDs)?
- A slowdown of a disk (with no failure) can have a significant impact on overall performance.
- Questions:
  - Do HDs and SSDs exhibit transient slowdowns?
  - Are disk slowdowns frequent enough to affect overall performance?
  - What causes slowdowns?
  - How do we deal with slowdowns?

SLIDE 9

The Tail at Store: Study

[Diagram: a RAID group with data drives D ... D and parity drives P and Q.]

                               Disk            SSD
  #RAID groups               38,029            572
  #Data drives per group       3-26           3-22
  #Data drives              458,482          4,069
  Total drive hours     857,183,442      7,481,055
  Total RAID hours       72,046,373      1,072,690

SLIDE 10

The Tail at Store: Slowdowns?

[Figure: CDF of slowdown for disks (1x-8x), showing the S_i and T distributions.]

- L_i: hourly average I/O latency of drive i.
- Slowdown: S_i = L_i / L_median, where the median is taken across the drives of the same RAID group.
- Tail: T = S_max, the largest slowdown in the group.
- Slow disks: S_i >= 2.
- S_i >= 2 at the 99.8th percentile; S_i >= 1.5 at the 99.3th percentile.
- T >= 2 at the 97.8th percentile; T >= 1.5 at the 95.2th percentile.
- SSDs exhibit even more slowdowns. (A sketch of how S_i is computed follows below.)
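A minimal sketch of the slowdown metric for one RAID group and one hour (variable names and numbers are illustrative, not from the paper's tooling):

    import statistics

    def slowdowns(hourly_latency_ms):
        """S_i = L_i / L_median for each drive of a RAID group, where L_i is the
        drive's hourly average I/O latency and the median is taken over the group."""
        median = statistics.median(hourly_latency_ms.values())
        return {drive: latency / median for drive, latency in hourly_latency_ms.items()}

    group = {"d0": 4.1, "d1": 4.3, "d2": 4.0, "d3": 9.2}   # d3 is the slow drive
    s = slowdowns(group)
    tail = max(s.values())   # T = S_max; here d3 has S_i of roughly 2.2, so this hour is "tailed"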

SLIDE 11

The Tail at Store: Duration?

[Figure: CDF of slowdown interval in hours (1-256) for Disk and SSD.]

- Slowdowns are transient.
- 40% of HD slowdowns last >= 2 hours.
- 12% of HD slowdowns last >= 10 hours.
- Many slowdowns occur in consecutive hours, and therefore last longer.

SLIDE 12

The Tail at Store: Do slowdowns of the same drive correlate?

[Figure: CDF of inter-arrival period between slowdowns in hours (5-35) for Disk and SSD.]

- 90% of disk slowdowns occur within 24 hours of another slowdown of the same disk.
- Over 80% of SSD slowdowns occur within 24 hours of another slowdown of the same SSD.
- Slowdowns of the same drive tend to cluster close together in time.

SLIDE 13

The Tail at Store: Causes?

[Figure: CDF of rate imbalance RI (0.5x-4x) within slow hours (S_i >= 2), Disk and SSD.]

Rate imbalance: RI_i = IORate_i / IORate_median

Rate imbalance does not seem to be the main cause of slowdowns for slow disks.

SLIDE 14

The Tail at Store: Causes?

[Figure: CDF of size imbalance ZI (0.5x-4x) within slow hours (S_i >= 2), Disk and SSD.]

Size imbalance: ZI_i = IOSize_i / IOSize_median

Size imbalance does not seem to be the main cause of slowdowns for slow disks (see the sketch of both imbalance metrics below).
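Both imbalance metrics are defined just like the slowdown metric, only over I/O rate and I/O size instead of latency; a small illustrative sketch (names and numbers are mine):

    import statistics

    def imbalance(per_drive_values):
        """Rate imbalance RI_i (or size imbalance ZI_i): a drive's hourly I/O rate
        (or average I/O size) divided by the median across its RAID group."""
        median = statistics.median(per_drive_values.values())
        return {drive: value / median for drive, value in per_drive_values.items()}

    # A slow drive whose RI and ZI stay near 1x is not being sent more or larger
    # I/Os than its peers, which is why the study concludes load imbalance is not
    # the main cause of slowdowns.
    io_rate_per_drive = {"d0": 210, "d1": 200, "d2": 205, "d3": 198}
    print(imbalance(io_rate_per_drive))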

SLIDE 15

The Tail at Store: Causes?

[Figure: CDF of slowdown vs. drive age for disks, one curve per year of age (1-10).]

Drive age shows some correlation with slowdowns, but the correlation is not strong.

SLIDE 16

The Tail at Store: Causes?

- No correlation between slowdowns and the time of day (00:00-24:00).
- No explicit drive events around slow hours.
- Unplugging disks and plugging them back in does not particularly help.
- There are significant differences between SSD vendors.

SLIDE 17

The Tail at Store: Solutions

Create tail-tolerant RAIDs: treat slow disks like failed disks.

Reactive
- Detect slow disks: those that take much longer to answer than their peers (>2x).
- If a disk is slow, reconstruct the answer from the other disks using RAID redundancy.
- Latency is, at best, around 3x that of a read from an average disk.

Proactive
- Always issue an additional read using RAID redundancy.
- Take the fastest answer.
- Uses much more I/O bandwidth.

Adaptive (sketched below)
- A combination of both approaches that takes the findings into account.
- Use the reactive approach until a slowdown is detected.
- After that, use the proactive approach, since slowdowns are repetitive and last many hours.
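A minimal sketch of the adaptive policy (this is my illustration of the idea, not the paper's implementation):

    def choose_read_strategy(recent_slowdowns, threshold=2.0):
        """Stay reactive until a drive shows a slowdown, then switch to the
        proactive mode, since slowdowns tend to repeat and last for hours."""
        if any(s >= threshold for s in recent_slowdowns):
            return "proactive"   # always issue the extra redundant read, take the fastest answer
        return "reactive"        # read normally; reconstruct only if the drive stalls

    print(choose_read_strategy([1.1, 0.9, 1.0]))   # -> reactive
    print(choose_read_strategy([1.0, 2.4, 2.1]))   # -> proactive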

SLIDE 18

The Tail at Store: Conclusions

- More research on the possible causes of disk and SSD slowdowns is required.
- Tail-tolerant RAIDs are needed to reduce the overhead caused by slowdowns.
- Since reconstruction is the way to deal with slowdowns, and Prob[1 slowdown] >> Prob[2 or more slowdowns], the Azure paper [Huang, 2012] becomes even more relevant.

SLIDE 19

Background: Cloud Storage

- General-purpose applications.
- VM-VM connections are separate from VM-storage connections.
- Storage is virtualized.
- There are many layers between the application and the actual storage.
- Resources are shared across multiple tenants.

SLIDE 20

IOFlow: Problem

- End-to-end policies (e.g. a minimum IO bandwidth from application to storage) cannot be supported.
- Applications have no way of expressing their storage policies.
- On shared infrastructure, aggressive applications tend to get more IO bandwidth.

SLIDE 21

IOFlow: Challenges

- No existing enforcement mechanism for controlling IO rates.
- Aggregate performance policies.
- Non-performance policies.
- Admission control.
- Dynamic enforcement.
- Support for unmodified applications and VMs.

SLIDE 22

IOFlow: Do it like SDNs

SLIDE 23

IOFlow: Supported policies

- <VM, Destination> -> Bandwidth (static, compute side)
- <VM, Destination> -> Min Bandwidth (dynamic, compute side)
- <VM, Destination> -> Sanitize (static, compute or storage side)
- <VM, Destination> -> Priority Level (static, compute and storage side)
- <Set of VMs, Set of Destinations> -> Bandwidth (dynamic, compute side); an illustrative encoding of these policy types follows below.
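One way such policies might be written down, as a sketch only (the field names are mine, not IOFlow's API; the controller turns policies like these into the queue rules shown in the next example):

    # Hypothetical policy descriptions matching the five types listed above.
    policies = [
        {"src": "VM1", "dst": "Server X", "rule": {"bandwidth_mbps": 100}},      # static
        {"src": "VM2", "dst": "Server X", "rule": {"min_bandwidth_mbps": 50}},   # dynamic
        {"src": "VM3", "dst": "Server X", "rule": {"sanitize": True}},
        {"src": "VM4", "dst": "Server X", "rule": {"priority": "high"}},
        {"src": ["VM5", "VM6"], "dst": ["Server X", "Server Y"],
         "rule": {"bandwidth_mbps": 200}},                                        # aggregate
    ]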

SLIDE 24

Example 1: Interface

- Policies:
  - <VM1, Server X> -> B1
  - <VM2, Server X> -> B2
- The controller sends the following to the SMBc stage of the physical server hosting VM1 and VM2 (see the sketch below):
  - createQueueRule(<VM1, Server X>, Q1)
  - createQueueRule(<VM2, Server X>, Q2)
  - createQueueRule(<*, *>, Q0)
  - configureQueueService(Q1, <B1, low, S>), where S is the size of the queue
  - configureQueueService(Q2, <B2, low, S>)
  - configureQueueService(Q0, <C - B1 - B2, low, S>), where C is the capacity of Server X
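The same sequence written out as a sketch (the call names follow the slide; the smbc object and treating B1, B2, C, S as plain numbers are illustrative assumptions):

    def install_example_policies(smbc, B1, B2, C, S):
        """Controller-side sequence from the slide: one queue per policy, plus a
        catch-all queue Q0 that receives whatever capacity of Server X is left."""
        smbc.createQueueRule(("VM1", "Server X"), "Q1")
        smbc.createQueueRule(("VM2", "Server X"), "Q2")
        smbc.createQueueRule(("*", "*"), "Q0")
        smbc.configureQueueService("Q1", (B1, "low", S))
        smbc.configureQueueService("Q2", (B2, "low", S))
        smbc.configureQueueService("Q0", (C - B1 - B2, "low", S))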

SLIDE 25

Example 2: Max-Min Fairness

- Policy:
  - <VM1-VM3, Server X> -> 900 Mbps
- Demand:
  - VM1 -> 600 Mbps
  - VM2 -> 400 Mbps
  - VM3 -> 200 Mbps
- Result (max-min fair allocation, sketched below):
  - VM1 -> 350 Mbps
  - VM2 -> 350 Mbps
  - VM3 -> 200 Mbps
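The result is what standard max-min fairness (water-filling) produces for this demand vector; a small sketch that reproduces the numbers, assuming that is the allocation rule intended here:

    def max_min_fair(capacity, demands):
        """Give every unsatisfied flow an equal share of the remaining capacity;
        flows that demand less than their share keep only what they asked for."""
        allocation = {vm: 0.0 for vm in demands}
        remaining = dict(demands)
        while remaining and capacity > 0:
            share = capacity / len(remaining)
            satisfied = {vm: d for vm, d in remaining.items() if d <= share}
            if not satisfied:                       # nobody fits: split what is left equally
                for vm in remaining:
                    allocation[vm] += share
                break
            for vm, demand in satisfied.items():    # grant small demands in full and repeat
                allocation[vm] += demand
                capacity -= demand
                del remaining[vm]
        return allocation

    print(max_min_fair(900, {"VM1": 600, "VM2": 400, "VM3": 200}))
    # -> {'VM1': 350.0, 'VM2': 350.0, 'VM3': 200.0}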

SLIDE 26

IOFlow: Evaluation of Policy Enforcement

- Windows-based IO stack.
- 10 hypervisors with 12 VMs each (120 VMs total).
- 4 tenants with 30 VMs each (3 VMs per hypervisor per tenant).
- 1 storage server with 6.4 Gbps of IO bandwidth.
- 1 controller.
- 1-second interval between dynamic enforcements of policies.

SLIDE 27

IOFlow: Evaluation of Policy Enforcement

  Tenant     Policy
  Index      {VM 1-30, X}    -> Min 800 Mbps
  Data       {VM 31-60, X}   -> Min 800 Mbps
  Message    {VM 61-90, X}   -> Min 2500 Mbps
  Log        {VM 91-120, X}  -> Min 1500 Mbps

SLIDE 28

IOFlow: Evaluation of Policy Enforcement

SLIDE 29

IOFlow: Evaluation of Overhead

SLIDE 30

IOFlow: Conclusions

- Contributions:
  - First software-defined storage approach.
  - Fine-grained control over IO operations in the cloud.
- Limitations:
  - The network or other resources might be the bottleneck.
    - VMs need to be placed close to their data (spatial locality).
    - Flat Datacenter Storage [Nightingale, 2012] offers a solution to this problem.
  - Guaranteed latencies cannot be expressed by the current policies.
    - Best-effort approach by setting a priority level.

SLIDE 31

Specialized Storage Architectures

- HDFS [Shvachko, 2009] and GFS [Ghemawat, 2003] work well for Hadoop MapReduce applications.
- Facebook's photo storage [Beaver, 2010] exploits workload characteristics to design and implement a better storage system.