CS252/Culler Lec 6.1 2/7/02

CS252 Graduate Computer Architecture

I/O Introduction: Storage Devices & RAID

Jason Hill

CS252/Culler Lec 6.2 2/7/02

Motivation: Who Cares About I/O?

  • CPU performance: improving ~60% per year
  • I/O system performance is limited by mechanical delays (disk I/O):
    < 10% per year (I/Os per second)
  • Amdahl's Law: system speedup is limited by the slowest part!
    10% I/O & 10x faster CPU => ~5x performance (lose 50%)
    10% I/O & 100x faster CPU => ~10x performance (lose 90%)
  • I/O bottleneck:
    Diminishing fraction of time spent in the CPU
    Diminishing value of faster CPUs
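The speedup figures above follow directly from Amdahl's Law; a minimal sketch in Python (the unrounded values are 5.3x and 9.2x, which the slide rounds to 5x and 10x):

```python
def amdahl_speedup(io_frac, cpu_factor, io_factor=1.0):
    """Overall speedup when the CPU part speeds up by cpu_factor
    and the I/O part (io_frac of total time) speeds up by io_factor."""
    return 1.0 / (io_frac / io_factor + (1.0 - io_frac) / cpu_factor)

print(round(amdahl_speedup(0.10, 10), 1))   # 10% I/O, 10x CPU  -> ~5.3x
print(round(amdahl_speedup(0.10, 100), 1))  # 10% I/O, 100x CPU -> ~9.2x
```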

CS252/Culler Lec 6.3 2/7/02

Big Picture: Who Cares About CPUs?

  • Why is it still important to keep CPUs busy vs. I/O devices ("CPU time"), if CPUs are not costly?
    – Moore's Law leads both to large, fast CPUs and to very small, cheap CPUs
    – 2001 hypothesis: a 600 MHz PC is fast enough for office tools?
    – PC slowdown, since fast enough unless games, new apps?
  • People care more about storing and communicating information than about calculating
    – "Information Technology" vs. "Computer Science"
    – 1960s and 1980s: Computing Revolution
    – 1990s and 2000s: Information Age
  • Next 3 weeks: storage and communication

CS252/Culler Lec 6.4 2/7/02

I/O Systems

[Diagram: Processor with Cache on a Memory-I/O Bus, connected to Main Memory and to I/O Controllers for Disks, Graphics, and Network; the I/O controllers signal the processor via interrupts]

CS252/Culler Lec 6.5 2/7/02

Storage Technology Drivers

  • Driven by the prevailing computing paradigm
    – 1950s: migration from batch to on-line processing
    – 1990s: migration to ubiquitous computing
      » computers in phones, books, cars, video cameras, …
      » nationwide fiber optic network with wireless tails
  • Effects on the storage industry:
    – Embedded storage
      » smaller, cheaper, more reliable, lower power
    – Data utilities
      » high capacity, hierarchically managed storage

CS252/Culler Lec 6.6 2/7/02

Outline

  • Disk Basics
  • Disk History
  • Disk options in 2000
  • Disk fallacies and performance
  • FLASH
  • Tapes
  • RAID

CS252/Culler Lec 6.7 2/7/02

Disk Device Terminology

  • Several platters, with information recorded magnetically on both surfaces (usually)
  • The actuator moves a head (at the end of an arm, one per surface) over a track ("seek"), selects a surface, waits for the sector to rotate under the head, then reads or writes
    – "Cylinder": all tracks under the heads
  • Bits are recorded in tracks, which in turn are divided into sectors (e.g., 512 bytes)

[Diagram: platter with outer track, inner track, and sector labeled; actuator, arm, and head labeled]

CS252/Culler Lec 6.8 2/7/02

Photo of Disk Head, Arm, Actuator

[Photo: actuator, arm, head, spindle, and 12 platters labeled]

CS252/Culler Lec 6.9 2/7/02

Disk Device Performance

  • Disk Latency = Seek Time + Rotation Time + Transfer Time + Controller Overhead
  • Seek time? Depends on the number of tracks the arm must move and the seek speed of the disk
  • Rotation time? Depends on how fast the disk rotates and how far the sector is from the head
  • Transfer time? Depends on the data rate (bandwidth) of the disk (bit density) and the size of the request

[Diagram: platter, arm, actuator, head, sector, inner track, outer track, controller, and spindle labeled]

CS252/Culler Lec 6.10 2/7/02

Disk Device Performance

  • Average distance of a sector from the head?
  • 1/2 the time of a rotation
    – 10000 Revolutions Per Minute ⇒ 166.67 Rev/sec
    – 1 revolution = 1/166.67 sec ⇒ 6.00 milliseconds
    – 1/2 rotation (revolution) ⇒ 3.00 ms
  • Average number of tracks to move the arm?
    – Sum all possible seek distances from all possible tracks / # possible
      » Assumes the seek distance is random
    – Disk industry standard benchmark

CS252/Culler Lec 6.11 2/7/02

Data Rate: Inner vs. Outer Tracks

  • To keep things simple, disks originally kept the same number of sectors per track
    – Since the outer track is longer, it had lower bits per inch
  • Competition ⇒ decided to keep BPI the same for all tracks ("constant bit density")
    ⇒ More capacity per disk
    ⇒ More sectors per track toward the edge
    ⇒ Since the disk spins at constant speed, outer tracks have a faster data rate
  • Bandwidth of the outer track is 1.7X that of the inner track!
    – Inner track has the highest density, outer track the lowest, so not really constant
    – 2.1X length of track outer/inner, 1.7X bits outer/inner

CS252/Culler Lec 6.12 2/7/02

Devices: Magnetic Disks

  • Purpose:
    – Long-term, nonvolatile storage
    – Large, inexpensive, slow level in the storage hierarchy
  • Characteristics:
    – Seek time (~8 ms avg)
      » positional latency
      » rotational latency
  • Transfer rate
    – 10-40 MByte/sec
    – Blocks
  • Capacity
    – Gigabytes
    – Quadruples every 2 years (aerodynamics)

[Diagram: sector, track, cylinder, head, platter labeled]

  Example: 7200 RPM = 120 RPS => ~8 ms per revolution
           average rotational latency = ~4 ms
           128 sectors per track => ~0.065 ms per sector
           1 KB per sector => ~16 MB/s

  Response time = Queue + Controller + Seek + Rotation + Transfer
                  (Controller + Seek + Rotation + Transfer = Service time)
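The example chain above can be checked in a few lines of Python (a sketch; 1/120 s is 8.33 ms, which the slide rounds to 8 ms, and the resulting media rate is ~15.4 MB/s, rounded to 16):

```python
rpm = 7200
ms_per_rev = 60_000 / rpm                        # 8.33 ms per revolution
avg_rot_ms = ms_per_rev / 2                      # ~4.17 ms average rotational latency
sectors_per_track = 128
ms_per_sector = ms_per_rev / sectors_per_track   # ~0.065 ms per sector
kb_per_sector = 1
mb_per_s = kb_per_sector / ms_per_sector         # KB/ms == MB/s -> ~15.4
```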


CS252/Culler Lec 6.13 2/7/02

Disk Performance Model / Trends

  • Capacity: +100%/year (2X / 1.0 yr)
  • Transfer rate (BW): +40%/year (2X / 2.0 yrs)
  • Rotation + seek time: -8%/year (1/2 in 10 yrs)
  • MB/$: > 100%/year (2X / 1.0 yr); fewer chips + areal density

CS252/Culler Lec 6.14 2/7/02

State of the Art: Barracuda 180

  – 181.6 GB, 3.5 inch disk
  – 12 platters, 24 surfaces
  – 24,247 cylinders
  – 7,200 RPM (4.2 ms avg. latency)
  – 7.4/8.2 ms avg. seek (r/w)
  – 64 to 35 MB/s (internal)
  – 0.1 ms controller time
  – 10.3 watts (idle)

  Latency = Queuing Time + Controller time (per access) + Seek Time + Rotation Time + Size / Bandwidth (per byte)

[Diagram: sector, track, cylinder, head, platter, arm, and track buffer labeled]

source: www.seagate.com

CS252/Culler Lec 6.15 2/7/02

Disk Performance Example (will fix later)

  • Calculate the time to read 64 KB (128 sectors) for the Barracuda 180X using advertised performance; the sector is on an outer track.

  Disk latency = average seek time + average rotational delay + transfer time + controller overhead
    = 7.4 ms + 0.5 * 1/(7200 RPM) + 64 KB / (65 MB/s) + 0.1 ms
    = 7.4 ms + 0.5 / (7200 RPM / (60000 ms/min)) + 64 KB / (65 KB/ms) + 0.1 ms
    = 7.4 + 4.2 + 1.0 + 0.1 ms = 12.7 ms
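The arithmetic above is easy to script; a minimal sketch in Python using the same advertised figures:

```python
seek_ms = 7.4                   # advertised average read seek time
rot_ms = 0.5 * 60_000 / 7200    # half a revolution at 7200 RPM = ~4.17 ms
xfer_ms = 64 / 65               # 64 KB at 65 MB/s (= 65 KB/ms) = ~0.98 ms
ctrl_ms = 0.1                   # controller overhead
latency_ms = seek_ms + rot_ms + xfer_ms + ctrl_ms
print(round(latency_ms, 1))     # -> 12.7 ms
```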

CS252/Culler Lec 6.16 2/7/02

CS 252 Administrivia

CS252/Culler Lec 6.17 2/7/02

Areal Density

  • Bits recorded along a track
    – Metric is Bits Per Inch (BPI)
  • Number of tracks per surface
    – Metric is Tracks Per Inch (TPI)
  • Disk designs brag about bit density per unit area
    – Metric is bits per square inch
    – Called Areal Density
    – Areal Density = BPI x TPI

CS252/Culler Lec 6.18 2/7/02

Areal Density

  Year   Areal Density (Mbit/sq. in.)
  1973        1.7
  1979        7.7
  1989       63
  1997     3090
  2000    17100

  – Areal Density = BPI x TPI
  – Slope changed from ~30%/yr to ~60%/yr about 1991
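The change in slope can be checked against the table; a sketch in Python computing compound annual growth rates (consistent with the quoted ~30%/yr before the knee and ~60%/yr after):

```python
# Areal density (Mbit/sq. in.) by year, from the table above
density = {1973: 1.7, 1979: 7.7, 1989: 63, 1997: 3090, 2000: 17100}

def cagr(y0, y1):
    """Compound annual growth rate of areal density between two years."""
    return (density[y1] / density[y0]) ** (1 / (y1 - y0)) - 1

print(f"{cagr(1973, 1989):.0%}/yr")  # ~25%/yr before the knee
print(f"{cagr(1989, 1997):.0%}/yr")  # ~63%/yr after
```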


CS252/Culler Lec 6.19 2/7/02

MBits per square inch: DRAM as % of Disk over time

[Chart: DRAM areal density as a percentage of disk areal density, 1974-2000; sample points 0.2 vs. 1.7 Mb/sq. in., 9 vs. 22 Mb/sq. in., 470 vs. 3000 Mb/sq. in.]

source: New York Times, 2/23/98, page C3, "Makers of disk drives crowd even more data into even smaller spaces"

CS252/Culler Lec 6.20 2/7/02

Historical Perspective

  • 1956 IBM Ramac to the early-1970s Winchester
    – Developed for mainframe computers; proprietary interfaces
    – Steady shrink in form factor: 27 in. to 14 in.
  • Form factor and capacity drive the market, more than performance
  • 1970s: Mainframes ⇒ 14-inch-diameter disks
  • 1980s: Minicomputers, servers ⇒ 8", 5 1/4" diameter
  • Late 1980s/early 1990s: PCs, workstations
    – Mass-market disk drives become a reality
      » industry standards: SCSI, IPI, IDE
    – Pizzabox PCs ⇒ 3.5-inch-diameter disks
    – Laptops, notebooks ⇒ 2.5-inch disks
    – Palmtops didn't use disks, so 1.8-inch-diameter disks didn't make it
  • 2000s:
    – 1 inch for cameras, cell phones?

CS252/Culler Lec 6.21 2/7/02

Disk History

  Data density (Mbit/sq. in.) and capacity of the unit shown (MBytes):
  1973: 1.7 Mbit/sq. in., 140 MBytes
  1979: 7.7 Mbit/sq. in., 2,300 MBytes

source: New York Times, 2/23/98, page C3, "Makers of disk drives crowd even more data into even smaller spaces"

CS252/Culler Lec 6.22 2/7/02

Disk History

  1989: 63 Mbit/sq. in., 60,000 MBytes
  1997: 1450 Mbit/sq. in., 2300 MBytes
  1997: 3090 Mbit/sq. in., 8100 MBytes

source: New York Times, 2/23/98, page C3, "Makers of disk drives crowd even more data into even smaller spaces"

CS252/Culler Lec 6.23 2/7/02

1-inch disk drive!

  • 2000 IBM MicroDrive:
    – 1.7" x 1.4" x 0.2"
    – 1 GB, 3600 RPM, 5 MB/s, 15 ms seek
    – Digital camera, PalmPC?
  • 2006 MicroDrive?
    – 9 GB, 50 MB/s!
    – Assuming it finds a niche in a successful product
    – Assuming past trends continue

CS252/Culler Lec 6.24 2/7/02

Disk Characteristics in 2000

                                      Seagate Cheetah      IBM Travelstar    IBM 1GB Microdrive
                                      ST173404LC           32GH DJSA-232     DSCM-11000
                                      Ultra160 SCSI        ATA-4
  Disk diameter (inches)              3.5                  2.5               1.0
  Formatted data capacity (GB)        73.4                 32.0              1.0
  Cylinders                           14,100               21,664            7,167
  Disks                               12                   4                 1
  Recording surfaces (heads)          24                   8                 2
  Bytes per sector                    512 to 4096          512               512
  Avg sectors per track (512 byte)    ~424                 ~360              ~140
  Max. areal density (Gbit/sq.in.)    6.0                  14.0              15.2
  Price                               $447                 $435              $828


CS252/Culler Lec 6.25 2/7/02

Disk Characteristics in 2000

                                      Seagate Cheetah      IBM Travelstar    IBM 1GB Microdrive
                                      ST173404LC           32GH DJSA-232     DSCM-11000
                                      Ultra160 SCSI        ATA-4
  Rotation speed (RPM)                10033                5411              3600
  Avg. seek ms (read/write)           5.6/6.2              12.0              12.0
  Minimum seek ms (read/write)        0.6/0.9              2.5               1.0
  Max. seek ms                        14.0/15.0            23.0              19.0
  Data transfer rate MB/second        27 to 40             11 to 21          2.6 to 4.2
  Link speed to buffer MB/s           160                  67                13
  Power idle/operating Watts          16.4 / 23.5          2.0 / 2.6         0.5 / 0.8

CS252/Culler Lec 6.26 2/7/02

Disk Characteristics in 2000

                                      Seagate Cheetah      IBM Travelstar    IBM 1GB Microdrive
                                      ST173404LC           32GH DJSA-232     DSCM-11000
                                      Ultra160 SCSI        ATA-4
  Buffer size in MB                   4.0                  2.0               0.125
  Size: height x width x depth (in.)  1.6 x 4.0 x 5.8      0.5 x 2.7 x 3.9   0.2 x 1.4 x 1.7
  Weight pounds                       2.00                 0.34              0.035
  Rated MTTF in powered-on hours      1,200,000            (300,000?)        (20K/5 yr life?)
  % of POH per month                  100%                 45%               20%
  % of POH seeking, reading, writing  90%                  20%               20%

CS252/Culler Lec 6.27 2/7/02

Disk Characteristics in 2000

                                      Seagate Cheetah      IBM Travelstar    IBM 1GB Microdrive
                                      ST173404LC           32GH DJSA-232     DSCM-11000
                                      Ultra160 SCSI        ATA-4
  Load/Unload cycles
  (disk powered on/off)               250 per year         300,000           300,000
  Nonrecoverable read errors
  per bits read                       < 1 per 10^15        < 1 per 10^13     < 1 per 10^13
  Seek errors                         < 1 per 10^7         not available     not available
  Shock tolerance:
  operating, not operating            10 G, 175 G          150 G, 700 G      175 G, 1500 G
  Vibration tolerance:
  operating, not operating            5-400 Hz @ 0.5G,     5-500 Hz @ 1.0G,  5-500 Hz @ 1G,
  (sine swept, 0 to peak)             22-400 Hz @ 2.0G     2.5-500 Hz @ 5.0G 10-500 Hz @ 5G

CS252/Culler Lec 6.28 2/7/02

Fallacy: Use Data Sheet "Average Seek" Time

  • Manufacturers needed a standard for fair comparison ("benchmark")
    – Calculate all seeks from all tracks, divide by the number of seeks => "average"
  • A real average would be based on how data is laid out on the disk and where real applications actually seek, then measuring performance
    – Usually, applications tend to seek to nearby tracks, not to a random track
  • Rule of thumb: observed average seek time is typically about 1/4 to 1/3 of the quoted seek time (i.e., 3X-4X faster)
    – Barracuda 180X avg. seek: 7.4 ms ⇒ 2.5 ms

CS252/Culler Lec 6.29 2/7/02

Fallacy: Use Data Sheet Transfer Rate

  • Manufacturers quote the speed of the data rate off the surface of the disk
  • Sectors contain an error detection and correction field (can be 20% of the sector size) plus a sector number as well as data
  • There are gaps between sectors on a track
  • Rule of thumb: disks deliver about 3/4 of the internal media rate (1.3X slower) for data
  • For example, the Barracuda 180X quotes 64 to 35 MB/sec internal media rate
    ⇒ 47 to 26 MB/sec external data rate (74%)

CS252/Culler Lec 6.30 2/7/02

Disk Performance Example

  • Recalculate the time to read 64 KB for the Barracuda 180X, this time using 1/3 of the quoted seek time and 3/4 of the internal outer-track bandwidth (12.7 ms before).

  Disk latency = average seek time + average rotational delay + transfer time + controller overhead
    = (0.33 * 7.4 ms) + 0.5 * 1/(7200 RPM) + 64 KB / (0.75 * 65 MB/s) + 0.1 ms
    = 2.5 ms + 0.5 / (7200 RPM / (60000 ms/min)) + 64 KB / (47 KB/ms) + 0.1 ms
    = 2.5 + 4.2 + 1.4 + 0.1 ms = 8.2 ms (64% of 12.7)
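Scripting the corrected estimate (a sketch; the slide's 8.2 ms comes from rounding each term up, while the unrounded sum is ~8.0 ms, still roughly 64% of the 12.7 ms advertised figure):

```python
seek_ms = 7.4 / 3                # rule of thumb: observed seek ~1/3 of quoted
rot_ms = 0.5 * 60_000 / 7200     # half a revolution at 7200 RPM = ~4.17 ms
xfer_ms = 64 / (0.75 * 65)       # rule of thumb: ~3/4 of internal media rate
ctrl_ms = 0.1
latency_ms = seek_ms + rot_ms + xfer_ms + ctrl_ms
print(round(latency_ms, 1))      # -> 8.0 ms, vs. 12.7 ms using data-sheet numbers
```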


CS252/Culler Lec 6.31 2/7/02

Future Disk Size and Performance

  • Continued advance in capacity (60%/yr) and bandwidth (40%/yr)
  • Slow improvement in seek, rotation (8%/yr)
  • Time to read the whole disk:

    Year   Sequentially   Randomly (1 sector/seek)
    1990   4 minutes      6 hours
    2000   12 minutes     1 week(!)

  • Does the 3.5" form factor make sense in 5 yrs?
    – What will capacity, bandwidth, seek time, RPM be?
    – Assume today: 80 GB, 30 MB/sec, 6 ms, 10000 RPM

CS252/Culler Lec 6.32 2/7/02

What about FLASH?

  • Compact Flash cards
    – Intel StrataFlash
      » 16 Mb in 1 square cm (0.6 mm thick)
    – 100,000 write/erase cycles
    – Standby current = 100 uA, write = 45 mA
    – Compact Flash: 256 MB ~= $120, 512 MB ~= $542
    – Transfers @ 3.5 MB/s
  • IBM Microdrive: 1 GB ~ $370
    – Standby current = 20 mA, write = 250 mA
    – Efficiency advertised in watts/MB
  • vs. disks:
    – Nearly instant standby wake-up time
    – Random access to stored data
    – Tolerant to shock and vibration (1000 G of operating shock)

CS252/Culler Lec 6.33 2/7/02

Tape vs. Disk

  • Longitudinal tape uses the same technology as hard disk; it tracks disk's density improvements
  • Disk head flies above the surface; tape head lies on the surface
  • Disk is fixed; tape is removable
  • Inherent cost-performance based on geometries:
    fixed rotating platters with gaps (random access, limited area, 1 medium / reader)
    vs. removable long strips wound on a spool (sequential access, "unlimited" length, multiple media / reader)
  • Helical scan (VCR, camcorder, DAT) spins the head at an angle to the tape to improve density

CS252/Culler Lec 6.34 2/7/02

Current Drawbacks to Tape

  • Tape wear-out:
    – Helical: 100s of passes; longitudinal: 1000s
  • Head wear-out:
    – 2000 hours for helical
  • Both must be accounted for in the economic / reliability model
  • Bits stretch
  • Readers must be compatible with multiple generations of media
  • Long rewind, eject, load, spin-up times; not inherent, just no need in the marketplace
  • Designed for archival

CS252/Culler Lec 6.35 2/7/02

Automated Cartridge System: StorageTek Powderhorn 9310

  • 6000 x 50 GB 9830 tapes = 300 TBytes in 2000 (uncompressed)
    – Library of Congress: all information in the world; in 1992, ASCII of all books = 30 TB
    – Exchanges up to 450 tapes per hour (8 secs / tape)
  • 1.7 to 7.7 MByte/sec per reader, up to 10 readers
  • 7.7 feet by 10.7 feet; 8200 pounds, 1.1 kilowatts

CS252/Culler Lec 6.36 2/7/02

Library vs. Storage

  • Getting books today is as quaint as the way I learned to program
    – punch cards, batch processing
    – wander through shelves, anticipatory purchasing
  • Costs $1 per book to check out
  • $30 for a catalogue entry
  • 30% of all books are never checked out
  • Write-only journals?
  • A digital library can transform campuses

CS252/Culler Lec 6.37 2/7/02

Whither tape?

  • Investment in research:
    – 90% of disks are shipped in PCs; 100% of PCs have disks
    – ~0% of tape readers are shipped in PCs; ~0% of PCs have tape readers
  • Before, N disks / tape; today, N tapes / disk
    – 40 GB/DLT tape (uncompressed)
    – 80 to 192 GB/3.5" disk (uncompressed)
  • Cost per GB:
    – In the past: 10X to 100X, tape cartridge vs. disk
    – Jan 2001: 40 GB for $53 (DLT cartridge), $2800 for a reader
      » $1.33/GB cartridge; $2.03/GB for 100 cartridges + 1 reader
      » ($10995 for 1 reader + 15-tape autoloader, $10.50/GB)
    – Jan 2001: 80 GB for $244 (IDE, 5400 RPM), $3.05/GB
    – Will $/GB of tape vs. disk cross in 2001? 2002? 2003?
  • The storage field is based on tape backup; what should we do? Discussion if time permits?

CS252/Culler Lec 6.38 2/7/02

Use Arrays of Small Disks?

  • Katz and Patterson asked in 1987: can smaller disks be used to close the gap in performance between disks and CPUs?

[Diagram: conventional approach, 4 disk designs (14", 10", 5.25", 3.5") spanning low end to high end, vs. disk array approach, 1 disk design (3.5")]

CS252/Culler Lec 6.39 2/7/02

Advantages of Small Form-Factor Disk Drives

  • Low cost/MB
  • High MB/volume
  • High MB/watt
  • Low cost/actuator
  • Cost and environmental efficiencies

CS252/Culler Lec 6.40 2/7/02

Replace a Small Number of Large Disks with a Large Number of Small Disks! (1988 Disks)

               IBM 3390K     IBM 3.5" 0061   x70 array
  Capacity     20 GBytes     320 MBytes      23 GBytes
  Volume       97 cu. ft.    0.1 cu. ft.     11 cu. ft.    (9X better)
  Power        3 KW          11 W            1 KW          (3X better)
  Data Rate    15 MB/s       1.5 MB/s        120 MB/s      (8X better)
  I/O Rate     600 I/Os/s    55 I/Os/s       3900 I/Os/s   (6X better)
  MTTF         250 KHrs      50 KHrs         ??? Hrs
  Cost         $250K         $2K             $150K

  Disk arrays have the potential for large data and I/O rates, high MB per cu. ft., and high MB per KW, but what about reliability?
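The 9X/3X/8X/6X ratio annotations can be reproduced from the table; a sketch comparing the 70-disk array against the IBM 3390K:

```python
ibm_3390k = {"volume_cuft": 97, "power_kw": 3, "data_rate_mbs": 15, "io_rate": 600}
array_x70 = {"volume_cuft": 11, "power_kw": 1, "data_rate_mbs": 120, "io_rate": 3900}

vol_ratio = ibm_3390k["volume_cuft"] / array_x70["volume_cuft"]     # ~9X less volume
pow_ratio = ibm_3390k["power_kw"] / array_x70["power_kw"]           # 3X less power
bw_ratio = array_x70["data_rate_mbs"] / ibm_3390k["data_rate_mbs"]  # 8X the data rate
io_ratio = array_x70["io_rate"] / ibm_3390k["io_rate"]              # ~6X the I/O rate
```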

CS252/Culler Lec 6.41 2/7/02

Array Reliability

  • Reliability of N disks = reliability of 1 disk ÷ N
    50,000 hours ÷ 70 disks = ~700 hours
    Disk system MTTF drops from 6 years to 1 month!
  • Arrays (without redundancy) are too unreliable to be useful!
  • Hot spares support reconstruction in parallel with access: very high media availability can be achieved
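A quick check of the reliability arithmetic (a sketch; assumes independent failures, so array MTTF = disk MTTF / N):

```python
disk_mttf_hours = 50_000                       # ~6 years for one disk
n_disks = 70
array_mttf_hours = disk_mttf_hours / n_disks   # ~714 hours, about a month
print(round(array_mttf_hours))                 # the slide rounds to 700
```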

CS252/Culler Lec 6.42 2/7/02

Redundant Arrays of (Inexpensive) Disks

  • Files are "striped" across multiple disks
  • Redundancy yields high data availability
    – Availability: service still provided to the user, even if some components failed
  • Disks will still fail
  • Contents are reconstructed from data redundantly stored in the array
    ⇒ Capacity penalty to store redundant info
    ⇒ Bandwidth penalty to update redundant info


CS252/Culler Lec 6.43 2/7/02

Redundant Arrays of Inexpensive Disks
RAID 1: Disk Mirroring/Shadowing

  • Each disk is fully duplicated onto its "mirror" (its recovery group)
    – Very high availability can be achieved
  • Bandwidth sacrifice on write:
    – Logical write = two physical writes
  • Reads may be optimized
  • Most expensive solution: 100% capacity overhead
  • (RAID 2 is not interesting, so skip it)

CS252/Culler Lec 6.44 2/7/02

Redundant Array of Inexpensive Disks
RAID 3: Parity Disk

  • A logical record is striped across physical records on the data disks
  • P contains the sum of the other disks per stripe, mod 2 ("parity")
    Example: P = 10010011 ⊕ 11001101 ⊕ 10010011 ⊕ …
  • If a disk fails, "subtract" P from the sum of the other disks to find the missing information
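"Sum mod 2" is just XOR, which makes both the parity computation and the recovery step above one-liners; a minimal sketch (the 8-bit stripe values are illustrative):

```python
from functools import reduce

def parity(blocks):
    """XOR ('sum mod 2') of all blocks in a stripe."""
    return reduce(lambda a, b: a ^ b, blocks)

stripe = [0b10010011, 0b11001101, 0b10010011]   # data blocks on three disks
p = parity(stripe)                               # stored on the parity disk

# If disk 1 fails, XOR the parity with the surviving disks to recover it
recovered = parity([p, stripe[0], stripe[2]])
assert recovered == stripe[1]
```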

CS252/Culler Lec 6.45 2/7/02

RAID 3

  • Sum computed across the recovery group to protect against hard disk failures; stored in the P disk
  • Logically, a single high-capacity, high-transfer-rate disk: good for large transfers
  • Wider arrays reduce capacity costs, but decrease availability
  • 33% capacity cost for parity in this configuration

CS252/Culler Lec 6.46 2/7/02

Inspiration for RAID 4

  • RAID 3 relies on the parity disk to discover errors on read
  • But every sector has an error detection field
  • Rely on the error detection field to catch errors on read, not on the parity disk
  • Allows independent reads from different disks simultaneously

CS252/Culler Lec 6.47 2/7/02

Redundant Arrays of Inexpensive Disks
RAID 4: High I/O Rate Parity

[Diagram: 5 disk columns; each stripe holds four data blocks plus a parity block on a dedicated parity disk (D0 D1 D2 D3 P, D4 D5 D6 D7 P, …), with logical disk addresses increasing down the columns]

  Example: small reads of D0 & D5; large write of D12-D15

CS252/Culler Lec 6.48 2/7/02

Inspiration for RAID 5

  • RAID 4 works well for small reads
  • Small writes (write to one disk):
    – Option 1: read the other data disks, create the new sum, and write it to the parity disk
    – Option 2: since P holds the old sum, compare old data to new data and add the difference to P
  • Small writes are limited by the parity disk: writes to D0 and D5 must both also write to the P disk


CS252/Culler Lec 6.49 2/7/02

Redundant Arrays of Inexpensive Disks
RAID 5: High I/O Rate Interleaved Parity

  • Independent writes are possible because of interleaved parity

[Diagram: 5 disk columns; parity blocks rotate across the disks stripe by stripe (D0 D1 D2 D3 P, D4 D5 D6 P D7, D8 D9 P D10 D11, …), with logical disk addresses increasing down the columns]

  Example: writes to D0 and D5 use disks 0, 1, 3, 4

CS252/Culler Lec 6.50 2/7/02

Problems of Disk Arrays: Small Writes

  RAID-5 small-write algorithm: 1 logical write = 2 physical reads + 2 physical writes
    1. Read old data D0
    2. Read old parity P
       (XOR the new data D0' with the old data, then XOR with the old parity to get P')
    3. Write new data D0'
    4. Write new parity P'
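The read-modify-write above in code; a minimal sketch of the RAID-5 small-write parity update (option 2 from the previous slide):

```python
def small_write(old_data, new_data, old_parity):
    """RAID-5 small write: after the 2 reads (old data, old parity),
    return what the 2 writes (new data, new parity) must store."""
    new_parity = old_parity ^ old_data ^ new_data   # add the difference to P
    return new_data, new_parity

# Consistency check: the stripe's parity stays the XOR of its data blocks
d = [0b1010, 0b0110, 0b1111]
p = d[0] ^ d[1] ^ d[2]
new_d0, new_p = small_write(d[0], 0b0001, p)
assert new_p == new_d0 ^ d[1] ^ d[2]
```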

CS252/Culler Lec 6.51 2/7/02

System Availability: Orthogonal RAIDs

[Diagram: an array controller fanning out to multiple string controllers, each driving a string of disks; a data recovery group spans one disk per string]

  • Data recovery group: unit of data redundancy
  • Redundant support components: fans, power supplies, controller, cables
  • End-to-end data integrity: internal parity-protected data paths

CS252/Culler Lec 6.52 2/7/02

System-Level Availability

[Diagram: two hosts with fully dual-redundant I/O controllers and array controllers; recovery groups span the duplicated paths]

  • Goal: no single points of failure
  • With duplicated paths, higher performance can be obtained when there are no failures

CS252/Culler Lec 6.53 2/7/02

Berkeley History: RAID-I

  • RAID-I (1989)
    – Consisted of a Sun 4/280 workstation with 128 MB of DRAM, four dual-string SCSI controllers, 28 5.25-inch SCSI disks, and specialized disk-striping software
  • Today RAID is a $19 billion industry; 80% of non-PC disks are sold in RAIDs

CS252/Culler Lec 6.54 2/7/02

Summary: RAID Techniques
Goal was performance; popularity is due to reliability of storage

  • Disk mirroring, shadowing (RAID 1)
    – Each disk is fully duplicated onto its "shadow"
    – Logical write = two physical writes
    – 100% capacity overhead
  • Parity data bandwidth array (RAID 3)
    – Parity computed horizontally
    – Logically a single high-data-bandwidth disk
  • High I/O rate parity array (RAID 5)
    – Interleaved parity blocks
    – Independent reads and writes
    – Logical write = 2 reads + 2 writes


CS252/Culler Lec 6.55 2/7/02

Summary: Storage

  • Disks:
    – Extraordinary advances in capacity/drive and $/GB
    – Currently 17 Gbit/sq. in.; can it continue past 100 Gbit/sq. in.?
    – Bandwidth and seek time are not keeping up: does the 3.5-inch form factor make sense? The 2.5-inch form factor in the near future? The 1.0-inch form factor in the long term?
  • Tapes:
    – No investment; must be backwards compatible
    – Are they already dead?
    – What is a tapeless backup system?