
Reliability Hierarchies (Peter M. Chen, David E. Lowell)



  1. Reliability Hierarchies
     Peter M. Chen, David E. Lowell
     Computer Science and Engineering Division, Electrical Engineering and Computer Science, University of Michigan

  2. Performance Hierarchies
     Levels, from better performance (top) to better cost (bottom):
     • L1 cache
     • L2 cache
     • main memory
     • swap disk

  3. Reliability Hierarchies
     Levels, from better overhead (performance, cost, power) at the top to better reliability at the bottom:
     • memory
     • disk
     • on-site tape backup
     • remote backup

  4. Write-Back Policy
     When to transfer data to a lower, more reliable level?
     Write-through
     • most reliable
     • effectively eliminates the upper level from the reliability hierarchy
     Delayed-write
     • e.g. write new data to memory, then transfer to disk after 15 seconds
     • trade-off between reliability and overhead (e.g. performance)
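The two policies on this slide can be sketched as a toy two-level store. This is only an illustration under stated assumptions: the class `DelayedWriteBuffer`, its method names, and the explicit `now` timestamps are hypothetical and do not come from the slides. A delay of 0 behaves like write-through, and a delay of 15 mirrors the slide's delayed-write example.

```python
class DelayedWriteBuffer:
    """Toy two-level store (hypothetical): new data sits in an upper level
    and is moved to a lower, more reliable level after `delay_seconds`."""

    def __init__(self, delay_seconds):
        self.delay = delay_seconds
        self.upper = {}  # key -> (value, time written); lost if the upper level fails
        self.lower = {}  # key -> value; the more reliable level

    def write(self, key, value, now):
        """Record new data in the upper level at time `now` (seconds)."""
        self.upper[key] = (value, now)
        if self.delay == 0:
            # Write-through: the data reaches the lower level immediately.
            self.flush(now)

    def flush(self, now):
        """Move every entry that has waited at least `delay` seconds."""
        due = [k for k, (_, t) in self.upper.items() if now - t >= self.delay]
        for k in due:
            value, _ = self.upper.pop(k)
            self.lower[k] = value

    def crash_upper(self):
        """Simulate a fault in the upper level; return the keys that were lost."""
        lost = sorted(self.upper)
        self.upper.clear()
        return lost


buf = DelayedWriteBuffer(delay_seconds=15)
buf.write("a", 1, now=0)
buf.write("b", 2, now=10)
buf.flush(now=15)          # "a" has waited 15 s and moves down; "b" has not
lost = buf.crash_upper()   # only data still in the upper level is lost
```

With write-through (`delay_seconds=0`) a crash of the upper level loses nothing, which is the sense in which write-through removes that level from the reliability hierarchy. The price is a lower-level write on every update.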

  5. Metrics for a Reliability Hierarchy
     Mean time to data loss (MTTDL)
     • limited by reliability of highest (least reliable) level
     • doesn’t distinguish between degrees of data loss
     Data loss rate
     • fraction of new data lost over time
     • $\text{data loss rate} = \sum_{\text{all levels } L} \frac{\text{data loss}_L}{\text{MTTF}_L}$

  6. Example Faults and Storage Levels
     Storage levels affected by each fault:

     Fault category    | Example MTTF   | CPU/memory | disk | RAID | on-site backup | remote backup
     operating system  | 2 months       | ✔          | -    | -    | -              | -
     file system       | 5 years        | ✔          | ✔    | ✔    | -              | -
     power             | 10 years (UPS) | ✔          | -    | -    | -              | -
     motherboard       | 5 years        | ✔          | -    | -    | -              | -
     media             | 5 years        | -          | ✔    | -    | -              | -
     catastrophe       | 50 years       | ✔          | ✔    | ✔    | ✔              | -
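A per-level MTTF can be estimated from this fault table. The sketch below makes two assumptions that the slides do not state: faults are independent with constant rates, so their rates (1/MTTF) add, and the fault-to-level mapping is my reading of the flattened table. The names `FAULT_MTTF_YEARS`, `FAULTS_AFFECTING`, and `level_mttf_years` are hypothetical.

```python
# MTTF of each example fault category, in years (values from the slide).
FAULT_MTTF_YEARS = {
    "operating system": 2 / 12,  # 2 months
    "file system": 5,
    "power": 10,                 # with a UPS
    "motherboard": 5,
    "media": 5,
    "catastrophe": 50,
}

# ASSUMPTION: which faults destroy data at each level, as read from the table.
FAULTS_AFFECTING = {
    "CPU/memory": ["operating system", "file system", "power",
                   "motherboard", "catastrophe"],
    "disk": ["file system", "media", "catastrophe"],
    "RAID": ["file system", "catastrophe"],
    "on-site backup": ["catastrophe"],
    "remote backup": [],
}


def level_mttf_years(level):
    """Combine fault MTTFs by summing failure rates (independence assumed)."""
    rate = sum(1 / FAULT_MTTF_YEARS[f] for f in FAULTS_AFFECTING[level])
    return float("inf") if rate == 0 else 1 / rate


for level in FAULTS_AFFECTING:
    print(f"{level}: {level_mttf_years(level):.2f} years")
```

Under these assumptions the memory, disk, and on-site backup levels come out at roughly 0.15, 2.4, and 50 years, which agrees with the MTTF figures quoted in the Michigan server analysis.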

  7. Analysis of Michigan Server
     • memory: MTTF = 0.15 years; data transferred to disk after 15 seconds
     • disk: MTTF = 2.4 years; data transferred to on-site tape backup after 1 day
     • on-site tape backup: MTTF = 50 years
     overall MTTDL = 0.15 years
     data loss rate = 10 hours/year
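The data loss rate on this slide can be checked with a short calculation: for each level, divide the new data lost per failure by that level's MTTF, then sum. This is a sketch under assumptions. I read "15 seconds" and "1 day" as the window of new data exposed at the memory and disk levels, and I leave out the tape level because the slide gives no window for it. The names `LEVELS` and `data_loss_rate_hours_per_year` are hypothetical.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR

# (level, MTTF in years, new data lost per failure in seconds)
# ASSUMPTION: the loss window equals the delay before data moves down a level.
LEVELS = [
    ("memory", 0.15, 15),
    ("disk", 2.4, SECONDS_PER_DAY),
]


def data_loss_rate_hours_per_year(levels):
    """Sum data_loss_L / MTTF_L over all levels, converted to hours per year."""
    seconds_per_year = sum(loss / mttf for _, mttf, loss in levels)
    return seconds_per_year / SECONDS_PER_HOUR


rate = data_loss_rate_hours_per_year(LEVELS)
print(f"data loss rate: about {rate:.1f} hours/year")
```

Memory contributes only about 100 seconds per year, while the disk level's one-day window contributes about 10 hours per year. This is why the total stays near the slide's 10 hours/year even though memory fails far more often.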

  8. Rio on PCs
     New level in the storage hierarchy: reliable main memory
     Enable memory to survive operating system crashes
     [Figure: timeline from "crash starts" to "crash finishes", labelled "protect memory" and "safe sync"]

  9. Example Faults and Storage Levels
     Storage levels affected by each fault:

     Fault category    | Example MTTF   | CPU/memory | CPU/memory with Rio | disk
     operating system  | 2 months       | ✔          | -                   | -
     file system       | 5 years        | ✔          | ✔                   | ✔
     power             | 10 years (UPS) | ✔          | ✔                   | -
     motherboard       | 5 years        | ✔          | ✔                   | -
     media             | 5 years        | -          | -                   | ✔
     catastrophe       | 50 years       | ✔          | ✔                   | ✔

  10. Rio’s Effect on Reliability
      • memory: MTTF = 0.15 years ➜ 1.9 years
      • disk: MTTF = 2.4 years
      • on-site tape backup: MTTF = 50 years
      overall MTTDL = 0.15 years ➜ 1.4 years
      data loss rate = 10 hours/year
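The change in these numbers can be reproduced by removing operating system crashes from the faults that affect memory. The sketch rests on assumptions the slides do not state: fault rates add (independent faults), the fault sets are my reading of the fault table, and overall MTTDL counts any fault that hits memory or disk as a data-loss event. All names in the code are hypothetical.

```python
# MTTF of each example fault category, in years (values from the fault table).
FAULT_MTTF_YEARS = {
    "operating system": 2 / 12,
    "file system": 5,
    "power": 10,
    "motherboard": 5,
    "media": 5,
    "catastrophe": 50,
}

# ASSUMPTION: fault sets per level, as read from the fault table.
MEMORY_FAULTS = {"operating system", "file system", "power",
                 "motherboard", "catastrophe"}
RIO_MEMORY_FAULTS = MEMORY_FAULTS - {"operating system"}  # Rio survives OS crashes
DISK_FAULTS = {"file system", "media", "catastrophe"}


def combined_mttf_years(faults):
    """MTTF of a set of independent faults: reciprocal of the summed rates."""
    return 1 / sum(1 / FAULT_MTTF_YEARS[f] for f in faults)


memory_mttf = combined_mttf_years(MEMORY_FAULTS)
rio_memory_mttf = combined_mttf_years(RIO_MEMORY_FAULTS)
# ASSUMPTION: any fault affecting memory or disk causes some data loss.
mttdl = combined_mttf_years(MEMORY_FAULTS | DISK_FAULTS)
rio_mttdl = combined_mttf_years(RIO_MEMORY_FAULTS | DISK_FAULTS)

print(f"memory MTTF: {memory_mttf:.2f} -> {rio_memory_mttf:.2f} years")
print(f"overall MTTDL: {mttdl:.2f} -> {rio_mttdl:.2f} years")
```

Under these assumptions the results land close to the slide's figures (memory MTTF from about 0.15 to about 1.9 years, MTTDL from about 0.15 to about 1.4 years). The data loss rate does not move because it is dominated by the disk level's one-day window, which Rio leaves unchanged.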

  11. Conclusions
      Two views of hierarchies
      • trade-off between cost and performance
      • trade-off between reliability and performance/cost/power/etc.
      Rio fills in the “reliability gap” between memory and disk
      • hypothesis: Rio can store new types of data that need higher reliability than memory but can’t afford the overhead of disk
      http://www.eecs.umich.edu/Rio
