Using GPUs to Enable Highly Reliable Embedded Storage Matthew Curry - PowerPoint PPT Presentation

Using GPUs to Enable Highly Reliable Embedded Storage
Matthew Curry (curryml@cis.uab.edu), Lee Ward (lee@sandia.gov), Anthony Skjellum (tony@cis.uab.edu), Ron Brightwell (rbbrigh@sandia.gov)
University of Alabama at Birmingham, 115A Campbell Hall, 1300 University Blvd., Birmingham, AL 35294-1170
Computer Science Research Institute, Sandia National Laboratories, PO Box 5800, Albuquerque, NM 87123-1319
High Performance Embedded Computing (HPEC) Workshop, 23-25 September 2008
Approved for public release; distribution is unlimited.

The Storage Reliability Problem
• Embedded environments are subject to harsh conditions where normal failure estimates may not apply
• Since many embedded systems are built for data collection, data integrity is a high priority
• Embedded systems often must contain as little hardware as possible (e.g., space applications)

Current Methods of Increasing Reliability
• RAID
  – RAID 1: Mirroring (two-disk configuration)
  – RAID 5: Single parity
  – RAID 6: Dual parity
• Nested RAID
  – RAID 1+0: Stripe over multiple RAID 1 sets
  – RAID 5+0: Stripe over multiple RAID 5 sets
  – RAID 6+0: Stripe over multiple RAID 6 sets

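Current Methods of Increasing Reliability
• RAID MTTDL (Mean Time to Data Loss) for an array of D disks:
  – RAID 1: $\mathrm{MTTF}^2 / (2 \cdot \mathrm{MTTR})$
  – RAID 5: $\mathrm{MTTF}^2 / (D(D-1) \cdot \mathrm{MTTR})$
  – RAID 6: $\mathrm{MTTF}^3 / (D(D-1)(D-2) \cdot \mathrm{MTTR}^2)$
• Nested RAID MTTDL, striping over N sets:
  – RAID 1+0: $\mathrm{MTTDL}_{\text{RAID 1}} / N$
  – RAID 5+0: $\mathrm{MTTDL}_{\text{RAID 5}} / N$
  – RAID 6+0: $\mathrm{MTTDL}_{\text{RAID 6}} / N$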
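The formulas above are easy to tabulate. Below is a minimal Python sketch, assuming independent, exponentially distributed disk failures with MTTR much smaller than MTTF (the usual conditions for these approximations). The single function `mttdl(mttf, mttr, d, m)` for an array with m parity disks is our own generalization of the pattern in the formulas, not a formula taken from the slides: m = 0 gives RAID 0 (MTTF/D), m = 1 gives RAID 5 (and RAID 1 when d = 2), m = 2 gives RAID 6, and m = 3 gives an N+3 array under the same assumptions.

```python
def mttdl(mttf: float, mttr: float, d: int, m: int) -> float:
    """Approximate mean time to data loss for d disks with m parity disks.

    Assumed model: data is lost only when m + 1 failures overlap, so the
    denominator is d * (d - 1) * ... * (d - m) times MTTR^m. Times in hours.
    """
    denom = 1.0
    for k in range(m + 1):
        denom *= d - k
    return mttf ** (m + 1) / (denom * mttr ** m)


def mttdl_nested(inner: float, n_sets: int) -> float:
    """Striping over n_sets independent arrays divides MTTDL by n_sets."""
    return inner / n_sets


MTTF, MTTR = 1e7, 24.0  # the values used in the reliability figure
for d in (4, 6, 8, 12):
    print(
        f"{d:2d} disks:",
        f"RAID 0 {mttdl(MTTF, MTTR, d, 0):.2e}",
        f"RAID 5 {mttdl(MTTF, MTTR, d, 1):.2e}",
        f"RAID 6 {mttdl(MTTF, MTTR, d, 2):.2e}",
        f"N+3 {mttdl(MTTF, MTTR, d, 3):.2e}",
    )
```

A nested array composes the two functions; for example, RAID 1+0 over N mirrored pairs is `mttdl_nested(mttdl(MTTF, MTTR, 2, 1), N)`.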

RAID Reliability (1e7 hours MTTF, 24 hours MTTR)
[Figure: MTTDL (log scale) versus number of disks (4 to 12) for RAID 0, RAID 5, RAID 5+0, RAID 1+0, RAID 6, RAID 6+0, and RAID N+3]

Why N+3 (or Higher) Isn’t Done
• Hardware RAID solutions largely don’t support it
  – Known exception: RAID-TP from Accusys uses three parity disks
• Software RAID doesn’t support it
  – Reed-Solomon coding is CPU-intensive and maps poorly onto CPU memory organization

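An Overview of Reed-Solomon Coding
• A general method of generating an arbitrary number of parity elements for n+m systems (n data elements, m parity elements)
• A vector of n data elements is multiplied by an n × m dispersal matrix, yielding m parity elements: $p_j = \sum_{i=1}^{n} d_i \, a_{ij}$ for $j = 1, \dots, m$
• All operations use finite field arithmetic, in which addition is XOR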
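A minimal sketch of the encoding step in Python. It assumes GF(2^8) with the field polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D, the polynomial used by Linux RAID-6); the slides do not name a polynomial. The dispersal matrix is a hypothetical choice with entries {02}^(i·j): its first column yields plain XOR parity and its second column yields the Q syndrome of Linux RAID-6. The sketch only illustrates the vector-matrix product; guaranteeing that every combination of lost disks is recoverable requires a carefully constructed matrix (for example, a Cauchy matrix), which is not attempted here.

```python
POLY = 0x11D  # assumed field polynomial x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements by shift-and-XOR."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # field addition is XOR
        a <<= 1
        if a & 0x100:
            a ^= POLY        # reduce modulo the field polynomial
        b >>= 1
    return result


def gf_pow(a: int, e: int) -> int:
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result


def dispersal_matrix(n: int, m: int, g: int = 2) -> list:
    """Hypothetical n x m matrix with entries g^(i*j)."""
    return [[gf_pow(g, i * j) for j in range(m)] for i in range(n)]


def encode(data: list, matrix: list) -> list:
    """Row vector of n data elements times an n x m matrix -> m parity elements."""
    m = len(matrix[0])
    parity = [0] * m
    for i, d in enumerate(data):
        for j in range(m):
            parity[j] ^= gf_mul(d, matrix[i][j])
    return parity


data = [0x01, 0x02, 0x04]           # n = 3 data elements
matrix = dispersal_matrix(3, 3)     # m = 3 parity elements (a 3+3 system)
print([hex(p) for p in encode(data, matrix)])  # ['0x7', '0x15', '0x49']
```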

Multiplication Example
• $\{37\} = 32 + 4 + 1 = 100101_2 = x^5 + x^2 + x^0$
• Use a Linear Feedback Shift Register (LFSR) over the bit positions $x^0, \dots, x^7$ to multiply an element by $\{02\}$

Multiplication Example
• Multiplying two arbitrary elements directly requires distributing the product so that only addition (XOR) and multiplication by $\{02\}$ occur:
  – $\{57\} \times \{37\}$
  – $= \{57\} \times (\{02\}^5 + \{02\}^2 + \{02\}^0)$
  – $= \{57\} \times \{02\}^5 + \{57\} \times \{02\}^2 + \{57\}$
• Potentially dozens of elementary operations!
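The distributive approach can be sketched in Python as follows, again assuming the field polynomial 0x11D (the slides do not specify one) and reading the braced values as decimal, as in {37} = 32 + 4 + 1. `xtime` is one LFSR step: shift left and, if a bit leaves position x^7, reduce by XORing with the polynomial. `mul_by_distribution` (a hypothetical helper name) accumulates a × {02}^k for every bit k set in b and counts the work it performs.

```python
POLY = 0x11D  # assumed field polynomial x^8 + x^4 + x^3 + x^2 + 1


def xtime(a: int) -> int:
    """Multiply a by {02}: one LFSR step (shift left, reduce on overflow)."""
    a <<= 1
    if a & 0x100:        # a bit was shifted out of the x^7 position
        a ^= POLY        # clear bit 8 and feed back x^4 + x^3 + x^2 + 1
    return a


def mul_by_distribution(a: int, b: int):
    """Compute a * b as the XOR of a * {02}^k over the set bits k of b.

    Returns (product, number of xtime steps, number of XORs).
    """
    product, steps, xors = 0, 0, 0
    term = a                 # a * {02}^0
    while b:
        if b & 1:
            product ^= term  # field addition is XOR
            xors += 1
        b >>= 1
        if b:                # advance only while higher bits remain
            term = xtime(term)
            steps += 1
    return product, steps, xors


print(mul_by_distribution(57, 37))  # (174, 5, 3)
```

For the slide's example this takes five LFSR steps and three XORs, and each LFSR step is itself a shift, a test, and a conditional XOR, which is how the operation count grows.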

Optimization: Lookup Tables
• A similar relationship holds for real numbers: $e^{\log x + \log y} = x \cdot y$
• This relationship translates (almost) directly to finite field arithmetic, using lookup tables for the logarithm and exponentiation operators; zero has no logarithm and must be handled separately
• Unfortunately, parallel table lookup capabilities aren’t common in commodity processors
  – Waiting patiently for SSE5
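A Python sketch of the lookup-table version, under the same assumed polynomial 0x11D, for which {02} generates every nonzero element of GF(2^8). The "(almost)" shows up in two places: zero has no logarithm, so it is special-cased, and exponents add modulo 255; doubling the exponent table to 512 entries avoids the modulo. Each multiply then costs two log lookups, one integer add, and one exp lookup, whatever the operands.

```python
POLY = 0x11D  # assumed field polynomial; {02} generates all 255 nonzero elements

EXP = [0] * 512   # EXP[i] = {02}^i, doubled so LOG[a] + LOG[b] needs no "mod 255"
LOG = [0] * 256   # LOG[{02}^i] = i; LOG[0] stays unused since 0 has no logarithm
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1              # multiply by {02} ...
    if x & 0x100:
        x ^= POLY        # ... and reduce, as in the LFSR
for i in range(255, 512):
    EXP[i] = EXP[i - 255]


def gf_mul_table(a: int, b: int) -> int:
    """a * b = exp(log a + log b), with zero handled separately."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


print(gf_mul_table(57, 37))  # 174
```

The tables total 768 one-byte entries, small enough to keep in fast on-chip memory, but every multiply becomes a data-dependent gather, which is exactly the operation that commodity SIMD units lack.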

NVIDIA GPU Architecture
• GDDR3 global memory
• 16-30 multiprocessing units
• One 16 KB shared memory region per multiprocessing unit (16 banks)
• Eight cores per multiprocessor

Integrating the GPU

3+3 Performance
[Figure: throughput (MB/s) versus data size (KB) for the 3+3 configuration]

29+3 Performance
[Figure: throughput (MB/s) versus data size (KB) for the 29+3 configuration]

Neglecting PCI Traffic: 3+3
[Figure: throughput (MB/s) versus data size (KB) for the 3+3 configuration with PCI traffic excluded]

Conclusion
• GPUs are an inexpensive way to increase the speed and reliability of software RAID
• By pipelining requests through the GPU, N+3 (and higher) configurations are within reach
  – Requires minimal hardware investment
  – Provides greater reliability than is available with current hardware solutions
  – Sustains high throughput compared to modern hard disks
