  1. Storage Fabric CS6453

  2. Summary
  - Last week: NVRAM is going to change the way we think about storage.
  - Today: challenges of storage layers (SSDs, HDs) built to serve massive data.
  - Slowdowns in HDs and SSDs.
  - Enforcing policies for IO operations in cloud architectures.

  3. Background: Storage for Big Data
  - One disk is not enough to handle massive amounts of data.
  - Last time: efficient datacenter networks built from large numbers of cheap commodity switches.
  - Solution: efficient IO performance from large numbers of commodity storage devices.

  4. Background: RAID 0
  - Achieves Nx performance, where N is the number of disks (striping).
  - Is this for free?
  - When N becomes large, the probability of a disk failure becomes large as well.
  - RAID 0 does not tolerate failures.
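A minimal sketch of the striping idea (the round-robin layout and the helper below are illustrative assumptions, not a particular implementation):

```python
# RAID 0 striping sketch: logical blocks rotate across N disks, so a
# sequential run of N blocks can be served by N disks in parallel.
def raid0_location(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Map a logical block to (disk index, block offset on that disk)."""
    return logical_block % num_disks, logical_block // num_disks

# With 4 disks, blocks 0..3 land on disks 0..3: ~4x sequential throughput,
# but losing any one disk loses every 4th block of every file.
for lb in range(8):
    print(lb, raid0_location(lb, 4))
```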

  5. Background: RAID 1
  - Achieves (K-1)-fault tolerance with Kx disks (mirroring).
  - Is this for free?
  - It requires Kx more disks (e.g., to tolerate 1 failure you need 2x as many disks as RAID 0).
  - RAID 1 does not utilize resources efficiently.

  6. Background: Erasure Codes
  - Achieve K-fault tolerance with N+K disks.
  - Efficient utilization of disks (not as good as RAID 0).
  - Fault tolerance (not as good as RAID 1).
  - Is this for free?
  - Reconstruction cost: the number of disks that must be read to recover data after failure(s).
  - RAID 6 has a reconstruction cost of 3.
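A minimal sketch of the idea for K = 1 using XOR parity (the single-parity scheme of RAID 4/5; real erasure codes such as Reed-Solomon generalize this to K parities):

```python
from functools import reduce

def parity(blocks: list[bytes]) -> bytes:
    """XOR all data blocks together to form one parity block (K = 1)."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

def reconstruct(surviving: list[bytes], parity_block: bytes) -> bytes:
    """Recover the one lost block: XOR of parity and all surviving blocks.
    Reconstruction cost = N reads (every remaining disk must be read)."""
    return parity(surviving + [parity_block])

data = [b"AAAA", b"BBBB", b"CCCC"]      # N = 3 data disks
p = parity(data)                        # 1 parity disk -> tolerates K = 1 failure
assert reconstruct([data[0], data[2]], p) == data[1]  # disk 1 "failed"
```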

  7. Modern Erasure Code Techniques
  - Erasure Coding in Windows Azure Storage [Huang, 2012].
  - Exploited observation: Prob[1 failure] ≫ Prob[2 failures or more].
  - Solution: construct an erasure code with a low reconstruction cost for the common case of a single failure.
  - 1.33x storage overhead (relatively low).
  - Tolerates up to 3 failures across 16 storage devices.
  - Reconstruction cost of 6 for 1 failure and 12 for 2+ failures.
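A sketch of the fragment accounting behind these numbers, assuming the paper's LRC(12, 2, 2) layout of 12 data fragments in 2 local groups plus 2 global parities (this only reproduces the counts, not the coding math):

```python
# LRC(12, 2, 2): 12 data fragments split into 2 local groups of 6;
# each group gets 1 local parity, and 2 global parities cover everything.
DATA, GROUPS, GLOBAL = 12, 2, 2
total = DATA + GROUPS + GLOBAL    # 16 fragments on 16 devices
overhead = total / DATA           # 16/12 = 1.33x storage overhead

def reconstruction_cost(failures: int) -> int:
    """Fragments that must be read to recover lost data."""
    if failures == 1:
        return DATA // GROUPS     # 6: read the rest of the local group
    return DATA                   # 12: fall back to a global decode

print(overhead, reconstruction_cost(1), reconstruction_cost(2))  # 1.33 6 12
```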

  8. The Tail at Store: Problem
  - We have seen how failures are handled via reconstruction. What about slowdowns in HDs (or SSDs)?
  - A slowdown of a single disk (with no failures) can have a significant impact on overall performance.
  - Questions:
    - Do HDs or SSDs exhibit transient slowdowns?
    - Are disk slowdowns frequent enough to affect overall performance?
    - What causes slowdowns?
    - How do we deal with slowdowns?

  9. The Tail at Store: Study
  - Dataset: RAID groups of data drives (D D ... D) plus parity drives (P, Q).

                             Disk          SSD
  #RAID groups             38,029          572
  #Data drives per group     3-26         3-22
  #Data drives            458,482        4,069
  Total drive hours   857,183,442    7,481,055
  Total RAID hours     72,046,373    1,072,690

  10. The Tail at Store: Slowdowns?
  - Metrics over hourly average I/O latency per drive.
  - Slowdown: S_i = L_i / L_median, where L_i is drive i's latency and L_median is the median across the RAID group.
  - Tail: T = S_max, the slowdown of the slowest drive in the group.
  - Slow drives: S >= 2.
  - Disks: S >= 2 at the 99.8th percentile; S >= 1.5 at the 99.3rd percentile.
  - Disks: T >= 2 at the 97.8th percentile; T >= 1.5 at the 95.2nd percentile.
  - SSDs exhibit even more slowdowns.
  [Figure: CDF of slowdown per drive (disks), slowdown 1x-8x, CDF 0.90-1.00.]
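A minimal sketch of how these metrics can be computed from hourly per-drive latencies (variable names are mine, not the paper's):

```python
import statistics

def slowdowns(hourly_latency_ms: list[float]) -> list[float]:
    """Per-drive slowdown S_i = L_i / L_median for one RAID group, one hour."""
    median = statistics.median(hourly_latency_ms)
    return [latency / median for latency in hourly_latency_ms]

def tail(hourly_latency_ms: list[float]) -> float:
    """Tail slowdown T = S_max: how slow the slowest drive in the group is."""
    return max(slowdowns(hourly_latency_ms))

group = [5.1, 4.9, 5.0, 5.2, 11.0]   # one drive is ~2.2x slower this hour
assert tail(group) >= 2              # this hour counts as a slow (tail) hour
```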

  11. The Tail at Store: Duration?
  - Slowdowns are transient.
  - 40% of HD slowdowns last >= 2 hours.
  - 12% of HD slowdowns last >= 10 hours.
  - Many slowdowns happen in consecutive hours (i.e., they persist).
  [Figure: CDF of slowdown interval (1-256 hours) for disk and SSD.]

  12. The Tail at Store: Correlation between slowdowns in the same drive?
  - 90% of disk slowdowns occur within 24 hours of another slowdown of the same disk.
  - More than 80% of SSD slowdowns occur within 24 hours of another slowdown of the same SSD.
  - Slowdowns of the same drive cluster relatively close together in time.
  [Figure: CDF of inter-arrival period between slowdowns (0-35 hours) for disk and SSD.]

  13. The Tail at Store: Causes?
  - Rate imbalance: RI = IORate_i / IORate_median.
  - Rate imbalance does not seem to be the main cause of slowdowns for slow disks.
  [Figure: CDF of RI within hours where S_i >= 2 (rate imbalance 0.5x-4x) for disk and SSD.]

  14. The Tail at Store: Causes?
  - Size imbalance: SI = IOSize_i / IOSize_median.
  - Size imbalance does not seem to be the main cause of slowdowns for slow disks.
  [Figure: CDF of SI within hours where S_i >= 2 (size imbalance 0.5x-4x) for disk and SSD.]
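Both imbalance metrics have the same shape as the slowdown metric above; a sketch (again with assumed names):

```python
import statistics

def imbalance(per_drive_values: list[float]) -> list[float]:
    """Generic imbalance metric: value_i / value_median across a RAID group.
    Applied to IO rates it gives RI; applied to IO sizes it gives SI."""
    median = statistics.median(per_drive_values)
    return [v / median for v in per_drive_values]

io_rate = [120, 118, 121, 119, 122]   # IOPS per drive: balanced -> RI ~ 1x
print(max(imbalance(io_rate)))        # near 1: rate imbalance cannot explain a 2x slowdown
```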

  15. The Tail at Store: Causes?
  - Drive age shows some correlation with slowdowns, but the correlation is not strong.
  [Figure: CDF of slowdown (1x-5x) vs. drive age (disks, ages 1-10 years), one curve per age.]

  16. The Tail at Store: Causes?
  - No correlation of slowdowns to the time of day.
  - No explicit drive events around slow hours.
  - Unplugging disks and plugging them back in does not particularly help.
  - Slowdown behavior differs significantly between SSD vendors.

  17. The Tail at Store: Solutions
  - Create tail-tolerant RAIDs: treat slow disks as failed disks.
  - Reactive:
    - Detect slow disks: requests take much longer to answer than on the other disks (>2x).
    - If a disk is slow, reconstruct its answer from the other disks using RAID redundancy.
    - Latency is at best around 3x that of a read from an average disk (~2x spent detecting the slowdown, then the reconstruction reads).
  - Proactive:
    - Always issue the additional reads using RAID redundancy and take the fastest answer.
    - Uses much more I/O bandwidth.
  - Adaptive (a combination of both approaches based on the study's findings; see the sketch below):
    - Use the reactive approach until a slowdown is detected.
    - After that, switch to the proactive approach, since slowdowns are repetitive and last many hours.
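A minimal sketch of the adaptive strategy; the `read`/`reconstruct` primitives, the timeout factor, and the "recently slow" window are all assumptions for illustration:

```python
import time

slow_since = {}          # drive -> time we last saw it slow (in-memory state)
SLOW_TTL_S = 2 * 3600    # assume "recently slow" means within the last 2 hours
TIMEOUT_FACTOR = 2       # reactive: declare a disk slow after 2x typical latency

def adaptive_read(drive, block, typical_latency_s, read, reconstruct):
    """Reactive until a drive is seen slow, then proactive while it stays slow."""
    if time.time() - slow_since.get(drive, 0.0) < SLOW_TTL_S:
        # Proactive phase: the drive was recently slow, and slowdowns persist
        # for hours, so pay the extra I/O and reconstruct from the other disks.
        return reconstruct(block)
    data = read(drive, block, timeout=TIMEOUT_FACTOR * typical_latency_s)
    if data is None:                 # timed out: mark the drive slow, fall back
        slow_since[drive] = time.time()
        return reconstruct(block)    # reactive fallback, ~3x average latency
    return data
```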

  18. The Tail at Store: Conclusions
  - More research on the possible causes of disk and SSD slowdowns is required.
  - Tail-tolerant RAIDs are needed to reduce the overhead of slowdowns.
  - Since reconstruction is the way to deal with slowdowns, and Prob[1 slowdown] ≫ Prob[2 slowdowns or more], the Azure paper [Huang, 2012] becomes even more relevant: its low reconstruction cost for a single failure applies equally to reading around a single slow drive.

  19. Background: Cloud Storage
  - General-purpose applications.
  - VM-VM connections are separated from VM-Storage connections.
  - Storage is virtualized.
  - Many layers sit between the application and the actual storage.
  - Resources are shared across multiple tenants.

  20. IOFlow: Problem
  - End-to-end policies (e.g., a minimum IO bandwidth from application to storage) cannot be supported.
  - Applications have no way of expressing their storage policies.
  - In shared infrastructure, aggressive applications tend to grab more than their share of IO bandwidth.

  21. IOFlow: Challenges
  - No existing enforcement mechanism for controlling IO rates.
  - Aggregate performance policies.
  - Non-performance policies.
  - Admission control.
  - Dynamic enforcement.
  - Support for unmodified applications and VMs.

  22. IOFlow: Do it like SDNs
  - As in software-defined networking, a logically centralized controller programs queues at the stages along the IO path.

  23. IOFlow: Supported policies
  - <VM, Destination> -> Bandwidth (static, compute side)
  - <VM, Destination> -> Min Bandwidth (dynamic, compute side)
  - <VM, Destination> -> Sanitize (static, compute or storage side)
  - <VM, Destination> -> Priority Level (static, compute and storage side)
  - <Set of VMs, Set of Destinations> -> Bandwidth (dynamic, compute side)

  24. Example 1: Interface
  - Policies:
    - <VM1, Server X> -> B1
    - <VM2, Server X> -> B2
  - Controller to the SMBc stage of the physical server hosting VM1 and VM2 (see the sketch below):
    - createQueueRule(<VM1, Server X>, Q1)
    - createQueueRule(<VM2, Server X>, Q2)
    - createQueueRule(<*, *>, Q0)
    - configureQueueService(Q1, <B1, low, S>), where S is the size of the queue
    - configureQueueService(Q2, <B2, low, S>)
    - configureQueueService(Q0, <C-B1-B2, low, S>), where C is the capacity of Server X
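A sketch of how a controller might drive this interface; the `Stage` stub below merely stands in for a real IOFlow stage (e.g., the SMBc driver), so the Python class and its printouts are assumptions:

```python
class Stage:
    """Stub for an IOFlow stage; the real stage enforces these queues
    inside the IO stack instead of printing them."""
    def createQueueRule(self, match, queue):
        print(f"rule: {match} -> {queue}")
    def configureQueueService(self, queue, props):
        print(f"queue {queue}: bandwidth={props[0]}, priority={props[1]}, size={props[2]}")

def install_policies(stage, B1=500, B2=300, C=1000, S=64):
    """Guarantee B1/B2 Mbps to VM1/VM2; the catch-all queue gets the rest."""
    stage.createQueueRule(("VM1", "Server X"), "Q1")
    stage.createQueueRule(("VM2", "Server X"), "Q2")
    stage.createQueueRule(("*", "*"), "Q0")            # catch-all queue
    stage.configureQueueService("Q1", (B1, "low", S))
    stage.configureQueueService("Q2", (B2, "low", S))
    stage.configureQueueService("Q0", (C - B1 - B2, "low", S))  # leftover capacity

install_policies(Stage())
```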

  25. Example 2: Max-Min Fairness
  - Policy: <VM1-VM3, Server X> -> 900 Mbps
  - Demand: VM1 -> 600 Mbps, VM2 -> 400 Mbps, VM3 -> 200 Mbps
  - Result: VM1 -> 350 Mbps, VM2 -> 350 Mbps, VM3 -> 200 Mbps
  - (VM3's demand is below the 300 Mbps equal share, so the 100 Mbps it leaves unused is split between VM1 and VM2; see the sketch below.)
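A sketch of max-min fairness via progressive filling that reproduces the slide's numbers (the controller's actual allocation algorithm may differ in detail):

```python
def max_min_fair(capacity: float, demands: dict[str, float]) -> dict[str, float]:
    """Progressive filling: satisfy the smallest demands first, then split
    what remains equally among the still-unsatisfied flows."""
    alloc, remaining, pending = {}, capacity, dict(demands)
    while pending:
        share = remaining / len(pending)
        satisfied = {vm: d for vm, d in pending.items() if d <= share}
        if not satisfied:                  # everyone wants more than the share
            alloc.update({vm: share for vm in pending})
            return alloc
        for vm, d in satisfied.items():    # small demands get exactly what they asked
            alloc[vm] = d
            remaining -= d
            del pending[vm]
    return alloc

# Reproduces the slide: 900 Mbps shared by demands 600/400/200.
print(max_min_fair(900, {"VM1": 600, "VM2": 400, "VM3": 200}))
# -> {'VM3': 200, 'VM1': 350, 'VM2': 350}
```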

  26. IOFlow: Evaluation of Policy Enforcement
  - Windows-based IO stack.
  - 10 hypervisors with 12 VMs each (120 VMs total).
  - 4 tenants with 30 VMs each (3 VMs per hypervisor per tenant).
  - 1 storage server with 6.4 Gbps of IO bandwidth.
  - 1 controller; 1 s interval between dynamic policy-enforcement rounds.

  27. IOFlow: Evaluation of Policy Enforcement

  Tenant     Policy
  Index      {VM 1-30, X}   -> Min 800 Mbps
  Data       {VM 31-60, X}  -> Min 800 Mbps
  Message    {VM 61-90, X}  -> Min 2500 Mbps
  Log        {VM 91-120, X} -> Min 1500 Mbps

  28. IOFlow: Evaluation of Policy Enforcement

  29. IOFlow: Evaluation of Overhead

  30. IOFlow: Conclusions
  - Contributions:
    - First software-defined storage approach.
    - Fine-grained control over IO operations in the cloud.
  - Limitations:
    - The network or other resources might become the bottleneck.
    - VMs need to be placed close to their data (spatial locality); Flat Datacenter Storage [Nightingale, 2012] provides solutions for this problem.
    - Guaranteed latencies cannot be expressed by the current policies; setting priorities gives only a best-effort approach.

  31. Specialized Storage Architectures
  - HDFS [Shvachko, 2009] and GFS [Ghemawat, 2003] work well for Hadoop MapReduce applications.
  - Facebook's photo storage [Beaver, 2010] exploits workload characteristics to design and implement a better storage system.
