Performance Analysis of Commodity and Enterprise Class Flash Devices



SLIDE 1

Performance Analysis of Commodity and Enterprise Class Flash Devices

Neal M. Master, Matthew Andrews, Jason Hick, Shane Canon & Nicholas J. Wright

SLIDE 2

Data Trends at NERSC

SLIDE 3

Data Trends at NERSC cont.

SLIDE 4

Memory Capacity Trends

  • Technology trends:

– Memory density doubles every 3 years; processor logic doubles every 2 years
– Storage costs ($/MB) drop more gradually than logic costs

Source: David Turek, IBM
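
To make the gap between these two trends concrete, the sketch below compounds both doubling periods over the same time span. It assumes clean exponential growth with the 2-year and 3-year periods quoted above; the function names are illustrative and not part of the original slides.

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """Multiplicative growth after `years`, given a fixed doubling period."""
    return 2.0 ** (years / doubling_period_years)


def logic_to_memory_gap(years: float) -> float:
    """How much faster processor logic grows than memory density.

    Assumption: logic doubles every 2 years, memory density every 3 years.
    """
    return growth_factor(years, 2.0) / growth_factor(years, 3.0)


if __name__ == "__main__":
    for years in (6, 12, 18):
        print(
            f"after {years:2d} years: "
            f"logic x{growth_factor(years, 2.0):.0f}, "
            f"memory x{growth_factor(years, 3.0):.0f}, "
            f"gap x{logic_to_memory_gap(years):.0f}"
        )
```

Under these assumptions the gap itself doubles every 6 years, since 2^(t/2) / 2^(t/3) = 2^(t/6).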

Cost of Computation vs. Memory

Source: IBM

SLIDE 5

I/O Performance Challenges

Performance Crisis #1

  • Disks are outpaced by compute speed.
  • To achieve reasonable aggregate bandwidth, many spindles are needed

– 10^3 spindles = 1 PB but only ~0.1 TB/s!
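
The spindle estimate above can be checked with a few lines of arithmetic. The sketch below assumes about a thousand disks at 1 TB and 100 MB/s each; those per-disk figures are assumptions chosen to reproduce the slide's totals, not values stated in the talk.

```python
def aggregate(num_spindles: int,
              capacity_tb: float = 1.0,        # assumed per-disk capacity
              bandwidth_mb_s: float = 100.0):  # assumed per-disk streaming rate
    """Return (capacity in PB, bandwidth in TB/s, hours to read everything once)."""
    capacity_pb = num_spindles * capacity_tb / 1_000
    bandwidth_tb_s = num_spindles * bandwidth_mb_s / 1_000_000
    full_read_hours = (capacity_pb * 1_000) / bandwidth_tb_s / 3_600
    return capacity_pb, bandwidth_tb_s, full_read_hours


if __name__ == "__main__":
    cap, bw, hours = aggregate(1_000)
    print(f"{cap:.1f} PB, {bw:.2f} TB/s, {hours:.1f} h to read the full capacity")
```

With these inputs, streaming the entire capacity once takes hours even when every spindle runs in parallel, which is the imbalance the slide points at.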

Performance Crisis #2

  • Data motion on an exascale machine will be expensive, in terms of both energy and performance!

[Figure: energy per operation in picojoules (log scale, 1 to 10000), now versus 2018, for DP FLOP, register, 1mm on-chip, 5mm on-chip, off-chip/DRAM, local, and cross system.]

SLIDE 6

Flash Memory - Ubiquitous

SLIDE 7

Flash – What is it good for?

  • Fits nicely into the latency gap between spinning disk and memory

  • Lots of open questions:

– PCI vs SATA vs ?
– SLC vs MLC
– A write requires a block erase, so performance depends on the previous IO pattern
– Correct algorithm in software at all levels
– ...
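
The erase-before-write point can be illustrated with a toy model. The sketch below is not any vendor's firmware: it assumes a log-structured device that writes pages into an open block and, when no erased block is left, erases the block with the fewest live pages after rewriting those pages (greedy garbage collection). The geometry (64 blocks of 32 pages) and all names are hypothetical.

```python
import random


def write_amplification(fill, num_blocks=64, pages_per_block=32,
                        host_writes=20_000, seed=0):
    """Physical page writes per host page write under random overwrites.

    `fill` is the fraction of physical pages holding live data (0 < fill < 1).
    """
    if not 0.0 < fill < 1.0:
        raise ValueError("fill must be strictly between 0 and 1")
    rng = random.Random(seed)
    logical_pages = max(1, int(fill * num_blocks * pages_per_block))
    live = [set() for _ in range(num_blocks)]  # live logical pages per block
    used = [0] * num_blocks                    # pages programmed since last erase
    home = {}                                  # logical page -> block with live copy
    free_blocks = list(range(1, num_blocks))   # erased blocks; block 0 starts open
    counters = {"open": 0, "programmed": 0}

    def program(page):
        """Write one page into the open block and invalidate its old copy."""
        block = counters["open"]
        if page in home:
            live[home[page]].discard(page)
        live[block].add(page)
        home[page] = block
        used[block] += 1
        counters["programmed"] += 1

    def make_room():
        """Ensure the open block has a free page, erasing a block if needed."""
        if used[counters["open"]] < pages_per_block:
            return
        if free_blocks:
            counters["open"] = free_blocks.pop()
            return
        # Every block is full: pick the block with the fewest live pages,
        # erase it, and rewrite its live pages before accepting new data.
        victim = min(range(num_blocks), key=lambda b: len(live[b]))
        survivors = list(live[victim])
        live[victim].clear()
        used[victim] = 0
        counters["open"] = victim
        for page in survivors:
            program(page)

    for page in range(logical_pages):          # initial sequential fill
        make_room()
        program(page)
    counters["programmed"] = 0                 # measure only the random phase
    for _ in range(host_writes):
        make_room()
        program(rng.randrange(logical_pages))
    return counters["programmed"] / host_writes


if __name__ == "__main__":
    for fill in (0.3, 0.5, 0.7, 0.9):
        print(f"fill {fill:.0%}: write amplification {write_amplification(fill):.2f}")
```

In this model a fuller device leaves fewer stale pages per block, so each erase reclaims less space and more live data has to be rewritten per host write. Real devices add over-provisioning, wear leveling, and caching that the sketch ignores.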

SLIDE 8

Devices Evaluated

  • 3 PCI-e SLC

– Virident tachIOn 400GB 8x
– FusionIO ioDrive Duo 2x 160GB 4x
– Texas Memory Systems RamSan-20 450GB 4x

  • 2 SATA MLC

– Intel X25-M 160GB
– OCZ Colossus 250GB

SLIDE 9

IOZone Experiments

  • Bandwidth

– Vary block size: 2^n KB, n = 2-8 (4-256 KB)
– Vary concurrency: 2^n threads, n = 0-7 (1-128)
– Vary IO patterns: Sequential Write/Re-write, Sequential Read/Re-read, Random Write, Mixed Random Write/Read, Random Read

  • IOPS

– 4KB block size
– Vary concurrency: 2^n threads, n = 0-7 (1-128)
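
The parameter sweep described above could be scripted along the following lines. This is a sketch, not the authors' actual harness: it only builds IOzone argument lists and does not run them. The per-thread file size (`1g`) and the function names are assumptions; the flags used are IOzone's test selectors (`-i`), record size (`-r`), file size (`-s`), throughput-mode thread count (`-t`), and operations-per-second reporting (`-O`).

```python
from itertools import product

BLOCK_SIZES_KB = [2 ** n for n in range(2, 9)]  # 4, 8, ..., 256 KB
THREAD_COUNTS = [2 ** n for n in range(0, 8)]   # 1, 2, ..., 128 threads

# IOzone test selectors: 0 = write/re-write, 1 = read/re-read,
# 2 = random read/write, 8 = mixed random workload.
TEST_FLAGS = ["-i", "0", "-i", "1", "-i", "2", "-i", "8"]


def iozone_command(block_kb, threads, file_size="1g", report_ops=False):
    """Build one IOzone throughput-mode command as an argument list."""
    cmd = ["iozone", *TEST_FLAGS,
           "-r", f"{block_kb}k",   # record (block) size
           "-s", file_size,        # file size per thread (assumed value)
           "-t", str(threads)]     # number of concurrent threads
    if report_ops:
        cmd.append("-O")           # report operations per second
    return cmd


def bandwidth_sweep(file_size="1g"):
    """All block-size and thread-count combinations for the bandwidth runs."""
    return [iozone_command(kb, t, file_size)
            for kb, t in product(BLOCK_SIZES_KB, THREAD_COUNTS)]


def iops_sweep(file_size="1g"):
    """Fixed 4 KB blocks, varying concurrency, reported as operations/s."""
    return [iozone_command(4, t, file_size, report_ops=True)
            for t in THREAD_COUNTS]


if __name__ == "__main__":
    print(len(bandwidth_sweep()), "bandwidth runs,", len(iops_sweep()), "IOPS runs")
    print(" ".join(iops_sweep()[0]))
```

Each list can be passed to `subprocess.run` on a machine where IOzone is installed and the working directory sits on the device under test.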

SLIDE 10

SATA Bandwidths

[Figure: Intel X25-M read bandwidth (MB/s, in bands from 0 to 200) versus IO block size (4 to 256 KB) and number of threads (1 to 128).]

SLIDE 11

PCI-e Bandwidths

[Figure: TMS RamSan read bandwidth (MB/s, in bands from 0 to 800) versus IO block size (4 to 256 KB) and number of threads (1 to 128).]

SLIDE 12

PCI-e Bandwidths continued

[Figure: Virident tachIOn read bandwidth (MB/s, in bands from 0 to 1200) versus IO block size (4 to 256 KB) and number of threads (1 to 128).]

SLIDE 13

Bandwidth Summary

[Figure: measured and company-reported read and write bandwidth (MB/s, axis to 1400) for TMS RamSan 20 (450GB), Virident tachIOn (400GB), Fusion IO ioDrive Duo (single slot, 160GB), Intel X25-M (160GB), and OCZ Colossus (250GB).]

SLIDE 14

IOPS - READ

[Figure: read IO/s (thousands, axis to 180) versus number of threads (1 to 128) for Virident tachIOn (400GB), TMS RamSan 20 (450GB), Fusion IO ioDrive Duo (single slot, 160GB), Intel X25-M (160GB), and OCZ Colossus (250GB).]

SLIDE 15

IOPS - Write

[Figure: write IO/s (thousands, axis to 180) versus number of threads (1 to 128) for Virident tachIOn (400GB), TMS RamSan 20 (450GB), Fusion IO ioDrive Duo (single slot, 160GB), Intel X25-M (160GB), and OCZ Colossus (250GB).]

SLIDE 16

Flash Device Evaluation - IOPS

[Figure: peak read and peak write IOPS (thousands, axis to 180) for TMS RamSan 20 (450GB), Virident tachIOn (400GB), Fusion IO ioDrive Duo (single slot, 160GB), Intel X25-M (160GB), and OCZ Colossus (250GB).]

SLIDE 17

Degradation Experiment

  • Create a file that fills X% of the drive (X = 30, 50, 70, 90) using

– cat /dev/urandom | dd

  • Using FIO, randomly write to the file

– Using 4KB blocks to measure IOPS
– Using 128KB blocks to measure bandwidth (BW)
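
The two steps of the experiment might be scripted as in the sketch below, which only builds the command lines and does not execute anything. The slide does not give the `dd` or FIO arguments, so everything beyond `cat /dev/urandom | dd` and the block sizes is an assumption: the file path, the 1 MiB `dd` block size with `iflag=fullblock` (GNU dd), decimal gigabytes for the drive capacity, the one-hour runtime, direct I/O, and the `libaio` engine.

```python
import shlex

FILL_PERCENTS = (30, 50, 70, 90)


def fill_command(path, capacity_gb, percent):
    """Shell pipeline that writes random data covering `percent` of the drive."""
    total_bytes = capacity_gb * 10 ** 9 * percent // 100
    count = total_bytes // 2 ** 20  # number of whole 1 MiB blocks
    return (f"cat /dev/urandom | dd of={shlex.quote(path)} "
            f"bs=1M count={count} iflag=fullblock")


def fio_command(path, block_size, runtime_s=3600):
    """FIO argument list for time-based random writes to the existing file."""
    return ["fio", "--name=degradation",
            f"--filename={path}",
            "--rw=randwrite",
            f"--bs={block_size}",
            "--direct=1",           # assumed: bypass the page cache
            "--ioengine=libaio",    # assumed I/O engine
            "--time_based",
            f"--runtime={runtime_s}"]


if __name__ == "__main__":
    path = "/mnt/flash/fill.dat"    # hypothetical mount point and file name
    for percent in FILL_PERCENTS:
        print(fill_command(path, 160, percent))
    print(" ".join(fio_command(path, "4k")))    # IOPS run
    print(" ".join(fio_command(path, "128k")))  # bandwidth run
```

Because no `--size` is given, FIO works over the full size of the existing file, so the random writes stay inside the region created in the first step.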

SLIDE 18

Degradation - IOPS

[Figure: random-write IO/s (thousands, axis to 35) over time (minutes, axis to 60) for Virident tachIOn (400GB), TMS RamSan 20 (450GB), Fusion IO ioDrive Duo (single slot, 160GB), Intel X25-M (160GB), and OCZ Colossus (250GB).]

SLIDE 19

Degradation – IOPS Summary

[Figure: percentage of peak write IO/s (axis 0% to 120%) at 30%, 50%, 70%, and 90% fill for Virident tachIOn (400GB), TMS RamSan 20 (450GB), Fusion IO ioDrive Duo (single slot, 160GB), Intel X25-M (160GB), and OCZ Colossus (250GB).]

SLIDE 20

Degradation - Bandwidth

[Figure: random-write bandwidth (MB/s, axis to 1200) over time (minutes, axis to 50) for Virident tachIOn (400GB), TMS RamSan 20 (450GB), Fusion IO ioDrive Duo (single slot, 160GB), Intel X25-M (160GB), and OCZ Colossus (250GB).]

SLIDE 21

Degradation BW Summary

[Figure: percentage of peak write bandwidth (axis 0% to 90%) at 30%, 50%, 70%, and 90% capacity for Virident tachIOn (400GB), TMS RamSan 20 (450GB), Fusion IO ioDrive Duo (single slot, 160GB), Intel X25-M (160GB), and OCZ Colossus (250GB).]

SLIDE 22

Summary

  • PCI-e devices are much more capable than the SATA ones

  • For PCI-e devices, read ≈ write performance for both sequential I/O and IOPS

  • It is important to test each device with your own workload

  • The PCI-e devices especially can be difficult to use...

SLIDE 23

Future Work

  • Testing Flash with Hadoop
  • Evaluating various new storage technologies (PCM, etc.)
  • Explore other uses for flash

– Metadata storage

SLIDE 24

Combining Flash with Hadoop
