Using GPUs to Enable Highly Reliable Embedded Storage




slide-1
SLIDE 1

Using GPUs to Enable Highly Reliable Embedded Storage

University of Alabama at Birmingham, 115A Campbell Hall, 1300 University Blvd., Birmingham, AL 35294-1170

Computer Science Research Institute, Sandia National Laboratory, PO Box 5800, Albuquerque, NM 87123-1319

Matthew Curry (curryml@cis.uab.edu), Lee Ward (lee@sandia.gov), Anthony Skjellum (tony@cis.uab.edu), Ron Brightwell (rbbrigh@sandia.gov)

High Performance Embedded Computing (HPEC) Workshop, 23-25 September 2008

Approved for public release; distribution is unlimited.

slide-2
SLIDE 2

The Storage Reliability Problem

  • Embedded environments are subject to harsh conditions where normal failure estimates may not apply
  • Since many embedded systems are purposed for data collection, data integrity is of high priority
  • Embedded systems often must contain as little hardware as possible (e.g. space applications)

slide-3
SLIDE 3

Current Methods of Increasing Reliability

  • RAID
    – RAID 1: Mirroring (Two-disk configuration)
    – RAID 5: Single Parity
    – RAID 6: Dual Parity
  • Nested RAID
    – RAID 1+0: Stripe over multiple RAID 1 sets
    – RAID 5+0: Stripe over multiple RAID 5 sets
    – RAID 6+0: Stripe over multiple RAID 6 sets

slide-4
SLIDE 4

Current Methods of Increasing Reliability

  • RAID MTTDL (Mean Time to Data Loss)
    – RAID 1: MTTF^2 / 2
    – RAID 5: MTTF^2 / (D*(D-1))
    – RAID 6: MTTF^3 / (D*(D-1)*(D-2))
  • Nested RAID MTTDL
    – RAID 1+0: MTTDL(RAID1) / N
    – RAID 5+0: MTTDL(RAID5) / N
    – RAID 6+0: MTTDL(RAID6) / N

slide-5
SLIDE 5

RAID Reliability (1e7 hours MTTF, 24 hours MTTR)

[Figure: MTTDL (logarithmic axis, 1.00E+00 to 1.00E+19) versus number of disks (4, 5, 6, 8, 10, 12) for RAID N+3, RAID 6+0, RAID 6, RAID 1+0, RAID 5+0, RAID 5, and RAID 0.]

slide-6
SLIDE 6

Why N+3 (Or Higher) Isn’t Done

  • Hardware RAID solutions largely don’t support it
    – Known Exception: RAID-TP from Accusys uses three parity disks
  • Software RAID doesn’t support it
    – Reed-Solomon coding is CPU intensive and inefficient with CPU memory organization

slide-7
SLIDE 7

An Overview of Reed-Solomon Coding

  • General method of generating arbitrary amounts of parity data for n+m systems
  • A vector of n data elements is multiplied by an n x m dispersal matrix, yielding m parity elements
  • Finite field arithmetic
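The vector-by-matrix step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the field is taken to be GF(2^8) with the polynomial 0x11D, the dispersal matrix is a simple Vandermonde-style matrix chosen only to show the arithmetic, and the names `gf_mul`, `gf_pow`, `dispersal_matrix`, and `encode` are hypothetical.

```python
# Sketch: generate m parity bytes from n data bytes over GF(2^8).
POLY = 0x11D  # assumed field polynomial; the slides do not name one


def gf_mul(a, b):
    """Multiply two field elements by shift-and-XOR."""
    product = 0
    while b:
        if b & 1:
            product ^= a      # addition in the field is XOR
        a <<= 1               # multiply a by {02}
        if a & 0x100:
            a ^= POLY         # reduce back into eight bits
        b >>= 1
    return product


def gf_pow(a, e):
    """Raise a field element to a small non-negative integer power."""
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result


def dispersal_matrix(n, m):
    """n x m matrix whose row j holds (j+1)^0, (j+1)^1, ..., (j+1)^(m-1).

    Assumption: illustrative only; a production code needs a matrix that
    keeps every failure pattern recoverable.
    """
    return [[gf_pow(j + 1, i) for i in range(m)] for j in range(n)]


def encode(data, m):
    """Multiply the length-n data vector by the n x m matrix."""
    matrix = dispersal_matrix(len(data), m)
    parity = [0] * m
    for j, element in enumerate(data):
        for i in range(m):
            parity[i] ^= gf_mul(element, matrix[j][i])
    return parity
```

With this matrix the first parity element is the plain XOR of the data, matching single-parity RAID, and each further column adds one more independent parity element.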
slide-8
SLIDE 8

Multiplication Example

  • {37} = 32 + 4 + 1 = 100101 (binary) = x^5 + x^2 + x^0
  • Use a Linear Feedback Shift Register to multiply an element by {02}

[Figure: eight-cell shift register with cells labeled x0 through x7.]
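In software, one step of such a shift register might look like the sketch below: shift left by one bit, and if a bit falls out of the top cell, XOR in the feedback taps. The tap pattern is an assumption (polynomial 0x11D), since the slide does not say which polynomial is used, and `mul2` is a hypothetical name.

```python
# Sketch: multiply a GF(2^8) element by {02} with one shift-register step.
POLY = 0x11D  # assumed feedback polynomial x^8 + x^4 + x^3 + x^2 + 1


def mul2(a):
    """Return a * {02}: shift left, then reduce if bit 8 was set."""
    a <<= 1
    if a & 0x100:
        a ^= POLY  # feedback: fold the overflow bit back into the taps
    return a


doubled = mul2(37)       # 100101 becomes 1001010, no feedback needed
wrapped = mul2(0x80)     # top bit shifts out, so the taps are applied
```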

slide-9
SLIDE 9

Multiplication Example

  • Direct arbitrary multiplication requires distributing the product over the terms of one operand so that only addition (XOR) and multiplication by two occur.
    – {57} x {37}
    – {57} x ({02}^5 + {02}^2 + {02}^0)
    – {57} x {02}^5 + {57} x {02}^2 + {57} x {02}^0
  • Potentially dozens of elementary operations!
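The distribution above amounts to a shift-and-add loop: walk the bits of one operand, doubling the other operand at each step and XORing it into the result wherever a bit is set. The sketch below assumes the polynomial 0x11D and reads {57} and {37} as decimal values, following the 32 + 4 + 1 expansion of {37}; both are assumptions, and the function names are hypothetical.

```python
# Sketch: general GF(2^8) multiplication using only XOR and doubling.
POLY = 0x11D  # assumed field polynomial


def mul2(a):
    """Multiply by {02}: shift left and reduce on overflow."""
    a <<= 1
    if a & 0x100:
        a ^= POLY
    return a


def gf_mul(a, b):
    """Add a * {02}^k into the product for every set bit k of b."""
    product = 0
    while b:
        if b & 1:
            product ^= a   # this bit of b contributes the current a
        a = mul2(a)        # advance a to a * {02}^(k+1)
        b >>= 1
    return product


result = gf_mul(57, 37)    # three XOR terms: bits 5, 2, and 0 of 37
```

Each product costs up to eight doublings plus conditional XORs per byte, which is the per-element overhead that the slide's "dozens of elementary operations" refers to.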
slide-10
SLIDE 10

Optimization: Lookup Tables

  • Similar to the relationship that holds for real numbers: e^(log(x) + log(y)) = x * y
  • This relationship translates (almost) directly to finite field arithmetic, with lookup tables for the logarithm and exponentiation operators
  • Unfortunately, parallel table lookup capabilities aren’t common in commodity processors
    – Waiting patiently for SSE5
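A table-based multiply can be sketched as below. The tables are built once by repeatedly doubling from 1, so every multiplication afterwards is two logarithm lookups, one integer addition, and one exponentiation lookup. The polynomial 0x11D (for which {02} generates every nonzero element) and the names `EXP`, `LOG`, and `gf_mul_table` are assumptions made for illustration.

```python
# Sketch: GF(2^8) multiplication through logarithm and exponent tables.
POLY = 0x11D  # assumed field polynomial; {02} is a generator for it

EXP = [0] * 510   # doubled length so LOG[a] + LOG[b] never needs a modulo
LOG = [0] * 256   # LOG[0] is unused: zero has no logarithm

value = 1
for power in range(255):
    EXP[power] = value
    LOG[value] = power
    value <<= 1               # next power of {02}
    if value & 0x100:
        value ^= POLY
for power in range(255, 510):
    EXP[power] = EXP[power - 255]   # exponents repeat with period 255


def gf_mul_table(a, b):
    """Multiply using exp(log(a) + log(b)), with zero handled separately."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]
```

The two departures from the real-number identity are visible here: zero needs a special case, and exponents wrap around with period 255 instead of growing without bound.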

slide-11
SLIDE 11

NVIDIA GPU Architecture

  • GDDR3 Global Memory
  • 16-30 Multiprocessing Units
  • One shared 8 KB memory region per multiprocessing unit (16 banks)

  • Eight cores per multiprocessor
slide-12
SLIDE 12

Integrating the GPU

slide-13
SLIDE 13

3+3 Performance

[Figure: throughput (MB/s) versus data size (KB) for the 3+3 configuration.]

slide-14
SLIDE 14

29+3 Performance

[Figure: throughput (MB/s, axis from 1300 to 1500) versus data size (KB, 58 to 348) for the 29+3 configuration.]

slide-15
SLIDE 15

Neglecting PCI Traffic: 3+3

[Figure: throughput (MB/s, axis from 500 to 2500) versus data size (KB, 3 to 390) for the 3+3 configuration with no PCI traffic.]

slide-16
SLIDE 16

Conclusion

  • GPUs are an inexpensive way to increase the speed and reliability of software RAID
  • By pipelining requests through the GPU, N+3 (and greater) configurations are within reach
    – Requires minimal hardware investment
    – Provides greater reliability than available with current hardware solutions
    – Sustains high throughput compared to modern hard disks