Reliability Hierarchies Peter M. Chen David E. Lowell Computer - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Reliability Hierarchies

Peter M. Chen

David E. Lowell

Computer Science and Engineering Division

Electrical Engineering and Computer Science

University of Michigan

slide-2
SLIDE 2

Performance Hierarchies

Storage levels, from highest to lowest:

  • L1 cache
  • L2 cache
  • main memory
  • swap disk

Better performance toward the top; better cost toward the bottom.

slide-3
SLIDE 3

Reliability Hierarchies

Storage levels, from highest to lowest:

  • memory
  • disk
  • on-site tape backup
  • remote backup

Better overhead (performance, cost, power) toward the top; better reliability toward the bottom.

slide-4
SLIDE 4

Write-Back Policy

When to transfer data to the lower, more reliable level?

Write-through

  • most reliable
  • effectively eliminates upper level from reliability hierarchy

Delayed-write

  • e.g. write new data to memory, then transfer to disk after 15 seconds
  • trade-off between reliability and overhead (e.g. performance)
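The difference between the two policies can be sketched with a toy model. This is only an illustration, not code from the talk: the class `DelayedWriteBuffer` and its methods are hypothetical names, and the model assumes a crash of the upper level loses exactly the data that has not yet been transferred.

```python
from dataclasses import dataclass, field


@dataclass
class DelayedWriteBuffer:
    """Toy upper level (e.g. memory) that transfers data to a lower level after a delay."""
    delay_s: float                                # 0 models write-through
    dirty: dict = field(default_factory=dict)     # key -> time of the write
    lower: set = field(default_factory=set)       # keys safely on the lower level

    def write(self, key, now):
        if self.delay_s == 0:
            self.lower.add(key)                   # write-through: safe immediately
        else:
            self.dirty[key] = now                 # delayed-write: at risk until flushed

    def tick(self, now):
        """Flush every dirty entry that has waited at least delay_s seconds."""
        due = [k for k, t in self.dirty.items() if now - t >= self.delay_s]
        for k in due:
            del self.dirty[k]
            self.lower.add(k)

    def crash(self):
        """The upper level fails: return the keys that never reached the lower level."""
        lost = sorted(self.dirty)
        self.dirty.clear()
        return lost


buf = DelayedWriteBuffer(delay_s=15)
buf.write("a", now=0)
buf.write("b", now=10)
buf.tick(now=20)        # "a" has waited 20 s and is flushed; "b" has waited only 10 s
lost = buf.crash()
print(lost)             # ['b']
```

With `delay_s=0` nothing is ever dirty, so a crash loses nothing, at the price of paying the lower level's overhead on every write.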

slide-5
SLIDE 5

Metrics for a Reliability Hierarchy

Mean time to data loss (MTTDL)

  • limited by reliability of highest (least reliable) level
  • doesn’t distinguish between degrees of data loss

Data loss rate

  • fraction of new data lost over time

$$\text{data loss rate} = \sum_{\text{all levels } L} \frac{\text{data loss}_L}{\text{MTTF}_L}$$

slide-6
SLIDE 6

Example Faults and Storage Levels

Storage levels considered: CPU/memory, disk, RAID, on-site backup, remote backup

Fault category      Example MTTF      Storage levels affected by fault
operating system    2 months
file system         5 years           ✔ ✔ ✔
power               10 years (UPS)
motherboard         5 years
media               5 years
catastrophe         50 years          ✔ ✔ ✔ ✔
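One way to turn per-fault MTTFs into a per-level MTTF is to add failure rates, which holds when faults are independent and occur at a constant rate. The sketch below makes that assumption. The mapping in `AFFECTS` (which faults hit which level) is also an assumption chosen for illustration, and all names are hypothetical.

```python
# Example fault MTTFs in years, taken from the table above.
FAULT_MTTF_YEARS = {
    "operating system": 2 / 12,
    "file system": 5,
    "power": 10,
    "motherboard": 5,
    "media": 5,
    "catastrophe": 50,
}

# ASSUMPTION: an illustrative mapping from storage level to the faults that affect it.
AFFECTS = {
    "CPU/memory": ["operating system", "file system", "power",
                   "motherboard", "catastrophe"],
    "disk": ["file system", "media", "catastrophe"],
    "on-site backup": ["catastrophe"],
}


def level_mttf(faults):
    """MTTF of a level hit by several independent faults: the rates (1/MTTF) add."""
    total_rate = sum(1.0 / FAULT_MTTF_YEARS[f] for f in faults)
    return 1.0 / total_rate


level_mttfs = {level: level_mttf(faults) for level, faults in AFFECTS.items()}
for level, years in level_mttfs.items():
    print(f"{level}: {years:.2f} years")
```

Under these assumptions the three levels come out at roughly 0.15, 2.4 and 50 years, which agrees with the per-level MTTFs quoted for the Michigan server.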

slide-7
SLIDE 7

Analysis of Michigan Server

  • memory: MTTF = 0.15 years, written back to disk after 15 seconds
  • disk: MTTF = 2.4 years, written back to on-site tape backup after 1 day
  • on-site tape backup: MTTF = 50 years

overall MTTDL = 0.15 years

data loss rate = 10 hours/year
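The 10 hours/year figure can be reproduced from the data loss rate metric (the sum over levels of data loss divided by MTTF). The sketch assumes that a failure at a level loses at most one write-back interval of new data (15 seconds for memory, 1 day for disk) and that the tape level contributes no term. The names `levels` and `data_loss_rate` are illustrative.

```python
SECONDS_PER_HOUR = 3600.0
HOURS_PER_DAY = 24.0

# (level, new data lost per failure in hours, MTTF in years)
# ASSUMPTION: data lost per failure equals the write-back delay of that level.
levels = [
    ("memory", 15 / SECONDS_PER_HOUR, 0.15),
    ("disk", 1 * HOURS_PER_DAY, 2.4),
]


def data_loss_rate(levels):
    """Sum over levels of (data lost per failure) / MTTF, in hours of data per year."""
    return sum(loss_h / mttf_y for _, loss_h, mttf_y in levels)


rate = data_loss_rate(levels)
print(f"{rate:.1f} hours/year")   # 10.0 hours/year
```

The disk term (1 day every 2.4 years) dominates; memory contributes only about 100 seconds per year despite failing far more often.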

slide-8
SLIDE 8

Rio on PCs

New level in the storage hierarchy: reliable main memory

Enable memory to survive operating system crashes

[Timeline figure: crash starts, protect memory, safe sync, crash finishes]

slide-9
SLIDE 9

Example Faults and Storage Levels

Storage levels considered: CPU/memory, CPU/memory with Rio, disk

Fault category      Example MTTF      Storage levels affected by fault
operating system    2 months
file system         5 years           ✔ ✔ ✔
power               10 years (UPS)    ✔ ✔
motherboard         5 years           ✔ ✔
media               5 years
catastrophe         50 years          ✔ ✔ ✔

slide-10
SLIDE 10

Rio’s Effect on Reliability

  • memory: MTTF = 0.15 years ➜ 1.9 years
  • disk: MTTF = 2.4 years
  • on-site tape backup: MTTF = 50 years

overall MTTDL = 0.15 years ➜ 1.4 years

data loss rate = 10 hours/year
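The improvement can be checked with a short calculation. It assumes independent faults with constant rates (so rates add), that any fault in the table causes some data loss somewhere in the hierarchy, and that Rio's only effect is to stop operating system crashes from destroying memory contents. The fault sets and function names below are illustrative assumptions.

```python
# Example fault MTTFs in years, from the fault table.
FAULT_MTTF_YEARS = {
    "operating system": 2 / 12,
    "file system": 5,
    "power": 10,
    "motherboard": 5,
    "media": 5,
    "catastrophe": 50,
}


def combined_mttf(faults):
    """Mean time until the first of several independent faults: rates (1/MTTF) add."""
    return 1.0 / sum(1.0 / FAULT_MTTF_YEARS[f] for f in faults)


all_faults = set(FAULT_MTTF_YEARS)

# ASSUMPTION: memory is affected by everything except media faults,
# and Rio additionally removes operating system crashes.
memory_faults = all_faults - {"media"}
memory_without_rio = combined_mttf(memory_faults)
memory_with_rio = combined_mttf(memory_faults - {"operating system"})

# ASSUMPTION: overall MTTDL is the time to the first fault of any kind.
mttdl_without_rio = combined_mttf(all_faults)
mttdl_with_rio = combined_mttf(all_faults - {"operating system"})

print(f"memory MTTF: {memory_without_rio:.2f} -> {memory_with_rio:.1f} years")
print(f"overall MTTDL: {mttdl_without_rio:.2f} -> {mttdl_with_rio:.1f} years")
```

Operating system crashes (MTTF of 2 months) account for most of the total failure rate, which is why removing them raises both numbers by roughly an order of magnitude. The data loss rate is unchanged because it is dominated by the disk-to-tape interval.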

slide-11
SLIDE 11

Conclusions

Two views of hierarchies

  • trade-off between cost and performance
  • trade-off between reliability and performance/cost/power/etc.

Rio fills in the “reliability gap” between memory and disk

  • hypothesis: can use Rio to store new types of data that need higher reliability than memory but can’t afford the overhead of disk

http://www.eecs.umich.edu/Rio