  1. Chanwoo Chung‡, Jinhyung Koo, Junsu Im, Arvind‡, and Sungjin Lee
     DGIST and MIT‡
     NVRAMOS '19, 2019.10.24
     DATA-INTENSIVE COMPUTING SYSTEMS LABORATORY

  2. Computation tier: application servers connected through a datacenter network (e.g., Ethernet, InfiniBand, …) to the storage tier.
     Storage tier: storage nodes 0 … N, each with Xeon CPUs, GBs of DRAM, and a disk array with RAID.
     It is not mere storage – it is another high-end server!!!
     ▪ Built from high-end Xeon CPUs, several GBs of DRAM, and an array of SSDs in a large form factor
     ▪ Power hungry (e.g., 1700 W)
     ▪ Expensive (e.g., $2~40,000 without SSDs)
     ▪ Large volume (e.g., 2-4U)
     ▪ High TCO (e.g., cooling)

  3. ▪ HDD is slow – requires large DRAM and an array of disks
       ▪ 10 ms latency & 100~300 MB/s throughput
     ▪ HDD is dumb – the host system makes it smarter
       ▪ Xeon CPUs with advanced algorithms
     Diagram: four 40GbE ports (aggregate network throughput = 20 GB/s) feed a storage host that runs host protocol translation (e.g., NFS, CIFS, …), caching/buffering, parity management, prefetching, dedup/compression, and a local file system (e.g., EXT4, WAFL, …) on Xeon CPUs with GBs of DRAM, in front of a RAID disk array of HDDs delivering 300 MB/s each.

  4. ▪ HDD is slow – requires large DRAM and an array of disks
       ▪ 10 ms latency & 100~300 MB/s throughput
       → SSDs are not a bottleneck; the network/CPU are the new bottlenecks
     ▪ HDD is dumb – the host system makes it smarter
       ▪ Xeon CPUs with advanced algorithms
     Diagram: the same storage host, now in front of an SSD array with RAID. Each SSD delivers 1~10 GB/s, so 10 SSDs provide an aggregate 10~100 GB/s, which can exceed the 20 GB/s of the four 40GbE ports (marked "Bottleneck!!!").
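The bottleneck argument in slides 3 and 4 is simple arithmetic: convert the Ethernet line rate from gigabits to gigabytes, then compare it with the summed device throughput. The sketch below reproduces the slides' numbers under stated assumptions (perfect striping across devices, no protocol or RAID overhead); the function names are illustrative only.

```python
import math


def network_gb_per_s(ports: int, gbit_per_port: float) -> float:
    """Aggregate NIC throughput in GB/s (8 bits per byte, protocol overhead ignored)."""
    return ports * gbit_per_port / 8


def devices_to_saturate(net_gb_per_s: float, dev_gb_per_s: float) -> int:
    """Fewest devices whose combined throughput reaches the network limit."""
    return math.ceil(net_gb_per_s / dev_gb_per_s)


def bottleneck(net_gb_per_s: float, n_devices: int, dev_gb_per_s: float) -> str:
    """Which side caps end-to-end throughput, assuming perfect striping."""
    return "network" if net_gb_per_s < n_devices * dev_gb_per_s else "devices"


net = network_gb_per_s(4, 40)          # four 40GbE ports
print(net)                             # 20.0
print(devices_to_saturate(net, 0.3))   # 67 HDDs at 300 MB/s
print(devices_to_saturate(net, 1.0))   # 20 SSDs at 1 GB/s
print(devices_to_saturate(net, 10.0))  # 2 SSDs at 10 GB/s
print(bottleneck(net, 10, 0.3))        # devices (10 HDDs give only 3 GB/s)
print(bottleneck(net, 10, 10.0))       # network (10 fast SSDs give 100 GB/s)
```

With HDDs, dozens of disks are needed before the network matters; with fast SSDs, a handful already saturate the host's ports.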

  5.                              EMC XtremIO      NetApp SolidFire  HPE 3PAR       Hynix AFA
     Capacity                     36~144 TB        46 TB             750 TB         522 TB
     # of SSDs                    18~72            12                120            576
     SSD array aggr. throughput*  18~72 GB/s       12 GB/s           120 GB/s       576 GB/s
     Ports                        4~8x 10Gb iSCSI  2x 25Gb iSCSI     4~12x 16Gb FC  3x Gen3 PCIe
     Network aggr. throughput     5~10 GB/s        6.25 GB/s         8~24 GB/s      48 GB/s
     * Aggregate SSD throughput was estimated assuming each SSD offers 1 GB/s.
     ▪ Supported by the latest works:
       ▪ K. Kourtis et al., “Reaping the performance of fast NVM storage with uDepot,” USENIX FAST '19
       ▪ J. Kim et al., “Alleviating Garbage Collection Interference through Spatial Separation in All Flash Arrays,” USENIX ATC '19
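Dividing each product's aggregate SSD throughput by its aggregate network throughput gives a rough measure of how much flash bandwidth the ports leave unused. This sketch keeps the table's 1 GB/s-per-SSD assumption; pairing XtremIO's smallest SSD count with its smallest port count (and largest with largest) is an assumption here, as is the 16 GB/s (128 Gb/s) per Gen3 PCIe port implied by the table's 48 GB/s for three ports.

```python
def ssd_to_network_ratio(n_ssds: int, ports: int, gbit_per_port: float,
                         ssd_gb_per_s: float = 1.0) -> float:
    """Ratio > 1 means the SSDs can deliver more than the network ports can carry."""
    ssd_aggr = n_ssds * ssd_gb_per_s       # GB/s
    net_aggr = ports * gbit_per_port / 8   # Gb/s -> GB/s
    return ssd_aggr / net_aggr


CONFIGS = [
    # (label, SSDs, ports, Gb/s per port); pairings are assumptions, see above
    ("EMC XtremIO, smallest", 18, 4, 10),
    ("EMC XtremIO, largest", 72, 8, 10),
    ("NetApp SolidFire", 12, 2, 25),
    ("HPE 3PAR, 4 ports", 120, 4, 16),
    ("HPE 3PAR, 12 ports", 120, 12, 16),
    ("Hynix AFA", 576, 3, 128),  # 3 Gen3 PCIe ports at 16 GB/s each
]

for label, ssds, ports, gbit in CONFIGS:
    print(f"{label:24s} {ssd_to_network_ratio(ssds, ports, gbit):5.2f}x")
```

Every configuration comes out above 1x (from about 1.9x for SolidFire to 15x for a 4-port 3PAR), so in each product the network ports, not the SSDs, cap throughput.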

  7. ▪ HDD is slow – requires large DRAM and an array of disks
       ▪ 10 ms latency & 100~300 MB/s throughput
       → SSDs are not a bottleneck; the network/CPU are the new bottlenecks
     ▪ HDD is dumb – the host system makes it smarter
       ▪ Xeon CPUs with advanced algorithms
       → SSDs are smart enough, supporting many features; duplicate storage management hurts performance
     Diagram: the same storage host (network marked "Bottleneck!!!", SSDs at 1~10 GB/s each), but now every SSD in the array is shown with its own controller (Ctrl).

  8. ▪ 4 embedded ARM CPUs running at 700 MHz to 1.4 GHz and 1~16 GB of DRAM, roughly what a desktop PC had 10 years ago
     ▪ Those resources are required for running the firmware (i.e., the FTL)
     Diagram of an SSD: a PCIe interface (1~10 GB/s) and host-to-PCIe controller; four ARM CPUs (max 1.4 GHz) handling block I/O-to-flash I/O interfacing, remapping, wear-leveling, cleaning, parity management, deduplication, compression, and RAID; DRAM (> 4 GB); and an array of NAND flash chips.
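To make "remapping" and "cleaning" concrete, here is a minimal page-level FTL sketch. It is a generic textbook scheme, not the firmware of any particular SSD: a write goes out of place to a free physical page, the logical-to-physical (L2P) table is updated, and the old page is marked invalid for later cleaning. The 4 KiB page and 4-byte map entry are assumptions; they also hint at why SSDs carry GBs of DRAM, since a flat page-level table costs about 1 GB of DRAM per 1 TB of flash.

```python
# Hypothetical parameters, not taken from any specific SSD.
PAGE_SIZE = 4096       # bytes per flash page
MAP_ENTRY_BYTES = 4    # bytes per logical-to-physical (L2P) entry


def mapping_table_bytes(capacity_bytes: int) -> int:
    """DRAM needed to hold a flat page-level L2P table for the given capacity."""
    return capacity_bytes // PAGE_SIZE * MAP_ENTRY_BYTES


class PageMappedFTL:
    """Toy page-level FTL: out-of-place writes plus L2P remapping.

    Overwriting a logical page never rewrites flash in place; the data goes to
    a fresh physical page and the old one is marked invalid so that cleaning
    (garbage collection) could reclaim it. Cleaning and wear-leveling
    themselves are not implemented here.
    """

    def __init__(self, num_physical_pages: int) -> None:
        self.l2p = {}                                # logical page -> physical page
        self.valid = [False] * num_physical_pages    # physical page holds live data?
        self.free = list(range(num_physical_pages))  # never-written physical pages
        self.flash = {}                              # stand-in for NAND page contents

    def write(self, lpn: int, data: bytes) -> int:
        """Write a logical page out of place; return the physical page used."""
        if not self.free:
            raise RuntimeError("out of free pages; a real FTL would run cleaning here")
        ppn = self.free.pop(0)
        old = self.l2p.get(lpn)
        if old is not None:
            self.valid[old] = False                  # previous copy becomes garbage
        self.flash[ppn] = data
        self.valid[ppn] = True
        self.l2p[lpn] = ppn                          # remap the logical page
        return ppn

    def read(self, lpn: int) -> bytes:
        return self.flash[self.l2p[lpn]]

    def invalid_pages(self) -> list:
        """Physical pages holding stale data: candidates for cleaning."""
        return [ppn for ppn in sorted(self.flash) if not self.valid[ppn]]


ftl = PageMappedFTL(num_physical_pages=4)
ftl.write(0, b"v1")          # logical page 0 -> physical page 0
ftl.write(0, b"v2")          # logical page 0 remapped -> physical page 1
print(ftl.read(0))           # b'v2'
print(ftl.invalid_pages())   # [0]
print(mapping_table_bytes(8 * 2**40) // 2**30)  # 8 (GiB of L2P table for 8 TiB)
```

Under these assumptions an 8 TiB drive needs an 8 GiB mapping table, in the same range as the 8 GB of DRAM per SSD assumed on slide 9.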

  9. Application servers reach storage nodes 0 … N over the datacenter network (e.g., Ethernet, InfiniBand, …); each node has Xeon CPUs, GBs of DRAM, and a disk array with RAID.
     Let's assume that this storage node has 72 SSDs of 8TB each (EMC XtremIO):
     ▪ # of ARM cores: 4 cores × 72 = 288 ARM cores
     ▪ Aggregate DRAM: 8 GB × 72 = 576 GB of DRAM
     All of this just for managing NAND flash.
     Q: Is this a storage node or a low-power microserver?

  10. ▪ Use simple SSDs?
        ▪ Software Defined Flash (ASPLOS '14)
        ▪ Application-Managed Flash (USENIX FAST '16)
        ▪ LightNVM (USENIX FAST '17)
        → The network/CPU are still the bottleneck
      ▪ Use a better SSD organization?
        ▪ SWAN (HotStorage '16; USENIX ATC '19)
        → Still relies on a power-hungry and expensive host
      ▪ Any other solution?

  11. ▪ Motivation
      ▪ Basic Idea
      ▪ LightStore Software
      ▪ LightStore Controller
      ▪ LightStore Adapters
      ▪ Experimental Results
      ▪ Conclusion

  12. ▪ Get rid of the space-consuming, expensive, power-hungry host server
      ▪ Put and run everything in SSDs
      ▪ Attach SSDs to a datacenter network
      ▪ Let application servers talk directly to SSDs
      Diagram: application servers → datacenter network → the host software stack to be removed (host protocol translation (e.g., NFS, CIFS, …), parity management, prefetching, caching/buffering, local file system (e.g., EXT4, WAFL, …)) → SSDs, each with its own controller.

  14. ▪ Inside each SSD: host-to-PCIe controller, host protocol translation, high-level flash management, low-level flash management, RAID, DRAM (2~4 GB), and NAND flash chips

  15. ▪ An Ethernet controller is added to each SSD

  16. ▪ Deliver flash's low latency & high throughput to network ports!

  17. ▪ An x86 storage server with N SSDs is replaced with N SSDs:
        ▪ Low power (e.g., 100 W / 10 SSDs)
        ▪ Cheap (e.g., zero server cost)
        ▪ Small volume (e.g., less than 1U)
        ▪ Low TCO (e.g., less cooling)
        ▪ Scalability (no network bottleneck)
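The scalability claim follows from where the network ports live. In this sketch, a host-based node is capped by a fixed 20 GB/s of NIC bandwidth (four 40GbE ports, as in slide 3), while in the network-attached design every SSD brings its own port. The 10 Gb/s port per SSD and the 1 GB/s per-SSD throughput are assumptions, and the datacenter switch fabric is assumed not to be the limit.

```python
def host_based(n_ssds: int, ssd_gb_per_s: float,
               host_net_gb_per_s: float = 20.0) -> float:
    """All I/O funnels through the storage host's fixed NIC bandwidth (GB/s)."""
    return min(host_net_gb_per_s, n_ssds * ssd_gb_per_s)


def network_attached(n_ssds: int, ssd_gb_per_s: float,
                     port_gbit_per_ssd: float = 10.0) -> float:
    """Each SSD has its own (hypothetical 10 Gb/s) port, so bandwidth grows with N."""
    return n_ssds * min(ssd_gb_per_s, port_gbit_per_ssd / 8)


for n in (10, 20, 40, 72):
    print(f"{n:3d} SSDs: host-based {host_based(n, 1.0):5.1f} GB/s, "
          f"network-attached {network_attached(n, 1.0):5.1f} GB/s")
```

Under these assumptions both designs deliver 10 GB/s with 10 SSDs, but at 72 SSDs the host-based node stays at 20 GB/s while the network-attached SSDs reach 72 GB/s.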

  18. ▪ Can we run complicated server software on wimpy ARM cores?
      ▪ How can we provide the same interface to application servers?
      ▪ How can we manage unreliable NAND without more ARM cores?
