
Performance Analysis of Commodity and Enterprise Class Flash Devices



  1. Performance Analysis of Commodity and Enterprise Class Flash Devices
     Neal M. Master, Matthew Andrews, Jason Hick, Shane Canon & Nicholas J. Wright

  2. Data Trends at NERSC

  3. Data Trends at NERSC cont.

  4. Memory Capacity Trends
     • Technology trends:
       – Memory density doubles every 3 years; processor logic doubles every 2 years
       – Storage cost ($/MB) drops more gradually than logic cost
     [Figure: Cost of Computation vs. Memory. Source: David Turek, IBM]

  5. I/O Performance Challenges
     • Performance Crisis #1
       – Disks are outpaced by compute speed.
       – Achieving reasonable aggregate bandwidth requires many spindles: 10^3 spindles = 1 PB but only ~0.1 TB/s!
     • Performance Crisis #2
       – Data motion on an exascale machine will be expensive, in terms of both energy and performance!
     [Figure: energy in picojoules (log scale, 1 to 10000), "now" vs. 2018, for DP FLOP, Register, 1mm on-chip, 5mm on-chip, Off-chip/DRAM, local, and Cross system]
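
The spindle figures on this slide can be checked with a little arithmetic. The sketch below is not from the presentation; it assumes decimal units (1 PB = 1000 TB, 1 TB = 10^6 MB), and the function names are made up for illustration.

```python
# Back-of-the-envelope check of the "10^3 spindles" numbers on the slide.
# Assumption: decimal units (1 PB = 1000 TB, 1 TB = 1,000,000 MB).

SPINDLES = 10**3
TOTAL_CAPACITY_TB = 1000.0   # 1 PB, as stated on the slide
AGGREGATE_BW_TB_S = 0.1      # ~0.1 TB/s, as stated on the slide


def per_spindle_capacity_tb(total_tb, spindles):
    """Capacity each spindle contributes, in TB."""
    return total_tb / spindles


def per_spindle_bandwidth_mb_s(aggregate_tb_s, spindles):
    """Bandwidth each spindle contributes, in MB/s."""
    return aggregate_tb_s * 1_000_000 / spindles


def full_scan_seconds(total_tb, aggregate_tb_s):
    """Time to stream the whole capacity once at the aggregate bandwidth."""
    return total_tb / aggregate_tb_s


if __name__ == "__main__":
    print("capacity per spindle (TB):",
          per_spindle_capacity_tb(TOTAL_CAPACITY_TB, SPINDLES))
    print("bandwidth per spindle (MB/s):",
          per_spindle_bandwidth_mb_s(AGGREGATE_BW_TB_S, SPINDLES))
    print("full scan (hours):",
          full_scan_seconds(TOTAL_CAPACITY_TB, AGGREGATE_BW_TB_S) / 3600)
```

Under these assumptions each spindle holds about 1 TB but delivers only about 100 MB/s, so reading the full petabyte once takes roughly 10^4 seconds, a little under three hours.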

  6. Flash Memory - Ubiquitous

  7. Flash – What is it good for?
     • Fits nicely into the latency gap between spinning disk and memory
     • Lots of open questions:
       – PCI vs. SATA vs. ?
       – SLC vs. MLC
       – Write requires block erase, so performance depends on the previous IO pattern
       – Correct algorithm in software at all levels
       – ….

  8. Devices Evaluated
     • 3 PCI-e SLC
       – Virident tachIOn 400GB 8x
       – FusionIO ioDrive Duo 2x 160GB 4x
       – Texas Memory Systems RamSan-20 450GB 4x
     • 2 SATA MLC
       – Intel X-25M 160GB
       – OCZ Colossus 250GB

  9. IOZone Experiments
     • Bandwidth
       – Vary block size: 2^n KB, n = 2-8 (4-256 KB)
       – Vary concurrency: 2^n threads, n = 0-7 (1-128)
       – Vary IO patterns: Sequential Write/Re-write, Sequential Read/Re-read, Random Write, Mixed Random Write/Read, Random Read
     • IOPS
       – 4KB block size
       – Vary concurrency: 2^n threads, n = 0-7 (1-128)
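
The parameter sweep described on this slide can be enumerated as follows. This is a minimal sketch that only builds the test matrix; it does not invoke IOZone, and the dictionary keys and function names are illustrative assumptions rather than anything taken from the presentation.

```python
from itertools import product

# Values taken from the slide: block sizes 2^n KB for n = 2..8,
# thread counts 2^n for n = 0..7, and five IO patterns.
BLOCK_SIZES_KB = [2**n for n in range(2, 9)]   # 4, 8, ..., 256
THREAD_COUNTS = [2**n for n in range(0, 8)]    # 1, 2, ..., 128
IO_PATTERNS = [
    "sequential write/re-write",
    "sequential read/re-read",
    "random write",
    "mixed random write/read",
    "random read",
]
IOPS_BLOCK_SIZE_KB = 4


def bandwidth_runs():
    """One entry per (pattern, block size, thread count) combination."""
    return [
        {"pattern": p, "block_kb": b, "threads": t}
        for p, b, t in product(IO_PATTERNS, BLOCK_SIZES_KB, THREAD_COUNTS)
    ]


def iops_runs():
    """IOPS tests fix the block size at 4 KB and vary only the concurrency."""
    return [{"block_kb": IOPS_BLOCK_SIZE_KB, "threads": t} for t in THREAD_COUNTS]


if __name__ == "__main__":
    print(len(bandwidth_runs()), "bandwidth configurations per device")
    print(len(iops_runs()), "IOPS concurrency levels per device")
```

Enumerating the grid up front makes the cost of the sweep visible: 5 patterns x 7 block sizes x 8 thread counts is 280 bandwidth configurations for each device.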

  10. SATA Bandwidths
      [Figure: Intel X25-M read bandwidth (MB/s, 0-200) vs. IO block size (4-256 KB) and number of threads (1-128)]

  11. PCI-e Bandwidths
      [Figure: TMS RamSan read bandwidth (MB/s, 0-800) vs. IO block size (4-256 KB) and number of threads (1-128)]

  12. PCI-e Bandwidths continued
      [Figure: Virident tachIOn read bandwidth (MB/s, 0-1200) vs. IO block size (4-256 KB) and number of threads (1-128)]

  13. Bandwidth Summary
      [Figure: measured and company-reported read and write bandwidth (MB/s, 0-1400) for TMS RamSan 20 (450GB), Virident tachIOn (400GB), Fusion IO ioDrive Duo (Single Slot, 160GB), Intel X-25M (160GB), and OCZ Colossus (250GB)]

  14. IOPS - Read
      [Figure: read IO/s (thousands, 0-180) vs. number of threads (1-128) for Virident tachIOn (400GB), TMS RamSan 20 (450GB), Fusion IO ioDrive Duo (Single Slot, 160GB), Intel X-25M (160GB), and OCZ Colossus (250GB)]

  15. IOPS - Write
      [Figure: write IO/s (thousands, 0-180) vs. number of threads (1-128) for Virident tachIOn (400GB), TMS RamSan 20 (450GB), Fusion IO ioDrive Duo (Single Slot, 160GB), Intel X-25M (160GB), and OCZ Colossus (250GB)]

  16. Flash Device Evaluation - IOPS
      [Figure: peak read and peak write IOPS (thousands, 0-180) for TMS RamSan 20 (450GB), Virident tachIOn (400GB), Fusion IO ioDrive Duo (Single Slot, 160GB), Intel X-25M (160GB), and OCZ Colossus (250GB)]

  17. Degradation Experiment
      • Create a file, using cat /dev/urandom | dd, that fills X% of the drive, X = 30, 50, 70, 90
      • Using FIO, randomly write to the file
        – Using 4KB blocks to measure IOPS
        – Using 128KB blocks to measure bandwidth
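
The two steps above might be scripted roughly as follows. This is a sketch under stated assumptions, not the authors' actual procedure: the file path, the run time, the use of direct IO, and the decimal interpretation of "GB" are all hypothetical choices, and the helpers only build command strings without running anything.

```python
import shlex

FILL_PERCENTS = [30, 50, 70, 90]   # from the slide
MIB = 1024 * 1024                  # dd's "1M" block size in bytes


def fill_bytes(capacity_gb, percent):
    """Bytes needed to fill `percent` of a drive (assumes 1 GB = 10^9 bytes)."""
    return capacity_gb * 10**9 * percent // 100


def fill_command(path, capacity_gb, percent):
    """Shell pipeline that writes random data until the target size is reached."""
    count = fill_bytes(capacity_gb, percent) // MIB
    return (
        f"cat /dev/urandom | dd of={shlex.quote(path)} "
        f"bs=1M count={count} iflag=fullblock"
    )


def fio_command(path, block_size, runtime_s):
    """fio random-write job against the pre-filled file.

    --direct=1 and the run time are assumptions, not stated on the slide.
    """
    args = [
        "fio",
        "--name=degradation",
        f"--filename={path}",
        "--rw=randwrite",
        f"--bs={block_size}",
        "--direct=1",
        "--time_based",
        f"--runtime={runtime_s}",
    ]
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    path = "/mnt/flash/fill.dat"   # hypothetical mount point
    for pct in FILL_PERCENTS:
        print(fill_command(path, 160, pct))
    print(fio_command(path, "4k", 3600))     # 4KB blocks for IOPS
    print(fio_command(path, "128k", 3600))   # 128KB blocks for bandwidth
```

Generating the commands rather than executing them keeps the sketch safe to run anywhere; the printed lines can be reviewed before being pointed at a real device.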

  18. Degradation - IOPS
      [Figure: IO/s (thousands, 0-35) vs. time (0-60 minutes) for Virident tachIOn (400GB), TMS RamSan 20 (450GB), Fusion IO ioDrive Duo (Single Slot, 160GB), Intel X-25M (160GB), and OCZ Colossus (250GB)]

  19. Degradation – IOPS Summary
      [Figure: percentage of peak write IO/s at 30%, 50%, 70%, and 90% fill for Virident tachIOn (400GB), TMS RamSan 20 (450GB), Fusion IO ioDrive Duo (Single Slot, 160GB), Intel X-25M (160GB), and OCZ Colossus (250GB)]
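
The "percentage of peak" metric in this summary can be computed from a time series of IO/s samples. The sketch below shows one plausible way to do it. The slides do not say how the degraded rate was derived, so taking the mean of the last few samples is an assumption, and the numbers in the usage example are made up for illustration rather than measured.

```python
def steady_state(samples, tail=5):
    """Mean of the last `tail` samples (assumed definition of the degraded rate)."""
    if not samples:
        raise ValueError("samples must not be empty")
    window = samples[-tail:]
    return sum(window) / len(window)


def percent_of_peak(samples, peak, tail=5):
    """Degraded rate expressed as a percentage of the device's peak rate."""
    if peak <= 0:
        raise ValueError("peak must be positive")
    return 100.0 * steady_state(samples, tail) / peak


if __name__ == "__main__":
    # Illustrative values only, NOT measurements from the presentation.
    made_up_samples = [100, 80, 50, 50, 50, 50, 50]
    print(percent_of_peak(made_up_samples, peak=100))
```

The same function applies to the bandwidth series if the samples and the peak are both given in MB/s.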

  20. Degradation - Bandwidth
      [Figure: bandwidth (MB/s, 0-1200) vs. time (0-50 minutes) for Virident tachIOn (400GB), TMS RamSan 20 (450GB), Fusion IO ioDrive Duo (Single Slot, 160GB), Intel X-25M (160GB), and OCZ Colossus (250GB)]

  21. Degradation BW Summary
      [Figure: percentage of peak write bandwidth at 30%, 50%, 70%, and 90% capacity for Virident tachIOn (400GB), TMS RamSan 20 (450GB), Fusion IO ioDrive Duo (Single Slot, 160GB), Intel X-25M (160GB), and OCZ Colossus (250GB)]

  22. Summary
      • PCI devices are much more capable than the SATA ones
      • For the PCI devices, read and write performance are roughly equal for both sequential I/O and IOPS
      • It is important to test each device with your own workload
      • The PCI devices especially can be difficult to use...

  23. Future Work
      • Testing flash with Hadoop
      • Evaluating various new storage technologies (PCM, etc.)
      • Exploring other uses for flash
        – Metadata storage

  24. Combining Flash with Hadoop

