A Scalable System Design for Data Reduction in Modern Storage - PowerPoint PPT Presentation

A Scalable System Design for Data Reduction in Modern Storage Servers Mohammadamin Ajdari Presentation at Dpt. Of Computer Engineering, Sharif Univ. of Tech. 2020/1/22

My Education Direct PhD in Computer Eng. Degree from POSTECH (South Korea) [2013 - 2019] BSc in Electrical Eng. (Electronics) Degree from Sharif Univ. of Tech. (Iran) [2008-2013] 2

Long-Term Research/Engineering Projects • Scalable data reduction architecture (main author) − CAL’17 , HPCA’19 (Best Paper Nominee), MICRO’19 − IEEE MICRO Top Pick’19 ( Honorable Mention ) PhD • Device centric server architecture (co-author) − MICRO’15 , ISCA’18 • CPU performance modeling (co-author) − TACO’18 • Design of a real computer system from scratch (main author) BSc − ICL’12, IJSTE’16 (Best BSc Project Award) 3 /

Long-Term Research/Engineering Projects • Scalable data reduction architecture (main author) − CAL’17 , HPCA’19 (Best Paper Nominee), MICRO’19 • Device centric server architecture (co-author) PhD − MICRO’15 , ISCA’18 • CPU performance modeling (co-author) − TACO’18 • Design of a real computer system from scratch (main author) BSc − ICL’12, IJSTE’16 (Best BSc Project Award) 4 /

Long-Term Research/Engineering Projects • Scalable data reduction architecture (main author) − CAL’17 , HPCA’19 (Best Paper Nominee), MICRO’19 • Device centric server architecture (co-author) − MICRO’15 , ISCA’18 PhD • CPU performance modeling (co-author) − TACO’18 • Design of a real computer system from scratch (main author) BSc * JE Jo, GH Lee, H Jang, J Lee, M Ajdari , J Kim, “ DiagSim : Systematically Diagnosing Simulators for Healthy Simulations” , TACO 2018 − ICL’12, IJSTE’16 (Best BSc Project Award) 5 /

Long-Term Research/Engineering Projects • Scalable data reduction architecture (main author) − CAL’17 , HPCA’19 (Best Paper Nominee), MICRO’19 • Device centric server architecture (co-author) PhD − MICRO’15 , ISCA’18 • CPU performance modeling (co-author) − TACO’18 • Design of a real computer system from scratch (main author) BSc * J Ahn, D Kwon, Y Kim, M Ajdari , J Lee, J Kim, “ DCS: A fast and scalable device- centric server architecture” , MICRO 2015 − ICL’12, IJSTE’16 (Best BSc Project Award) ** D Kwon, J Ahn, D Chae, M Ajdari , J Lee, S Bae, Y Kim, J Kim, “ DCS-ctrl: A fast and flexible device-control mechanism for device- centric server architecture” , 6 / ISCA 2018

Long-Term Research/Engineering Projects • Scalable data reduction architecture (main author) − CAL’17 , HPCA’19 (Best Paper Nominee), MICRO’19 − IEEE MICRO Top Pick’19 ( Honorable Mention ) PhD • Device centric server architecture (co-author) − MICRO’15 , ISCA’18 • CPU performance modeling (co-author) − TACO’18 • Design of a real computer system from scratch (main author) BSc * M Ajdari , P Park, D Kwon, J Kim, J Kim, “ A scalable HW- based inline deduplication for SSD arrays ” , IEEE CAL 2017 − ICL’12, IJSTE’16 (Best BSc Project Award) ** M Ajdari , P Park, J Kim, D Kwon, J Kim, “ CIDR: A cost-effective in-line data reduction system for terabit-per- second scale SSD arrays” , HPCA 2019 7 / *** M Ajdari , W Lee, P Park, J Kim, J Kim, “ FIDR: A scalable storage system for fine- grain inline data reduction with efficient memory handling” , MICRO 2019

Index • Background − Storage Systems and Trends − Basics of Data Reduction Techniques • Proposing New Data Reduction Architecture − Deduplication for slow SSD Arrays − Deduplication and Compression for fast SSD Arrays − Optimizing for Ultra-scalability & more Workload Support • Conclusion 8 /

Data Storage is Very Important 40 ZB 40 ZB Source: IDC DataAge 2025 whitepaper Annual Data size … 2 TB 2012 2014 2016 2018 2020 2010 9 / Year

Storage System Types ➢ Depends on type of HDD/SSD connection to a server 2 1 Indirectly attached Directly attached over a switched network to the server motherboard 10 /

Storage System #1: Direct-Attached ➢ Direct Attached Storage (DAS) ▪ Attach storage device (e.g., HDD) directly to the server ➢ Benefits ▪ Simple implementation ▪ Each server has fast access to Its local storage ➢ Problems ▪ Storage & computation resources cannot scale independently ▪ Slow data sharing across nodes 11 /

Storage System #2: Network Attached ➢ Storage over a switched network ▪ Storage system is almost a separate server on network (e.g., NAS) ➢ Benefits ▪ Independent storage scalability ▪ High reliability ▪ Fast data sharing across nodes ▪ Problems ▪ Complex implementation In this talk, this is our choice of storage system 12 /

Storage Device Trend SSD HDD Capacity : 2TB- 8 TB 1 TB - 32 TB Throughput: 200 MB/s 2 GB/s - 6.8 GB/s Latency : over 1 ms Over 20 µs Fast, high capacity SSDs are replacing HDDs 13 /

But Modern Storage is Very Expensive • Average SSD Price Compared to HDD − 3x-5x higher cost (MLC SSD vs. HDD) • Limited lifetime of SSD flash cells − Max 5K-10K writes (per cell) $$$ $$$ Source: IDC DataAge Annual Data size Capacity & Throughput 2025 whitepaper $$$ Cost (e.g., est. 50 SSDs with 800 GB/s, 500 TB Cap . [SmartIOPS Appliance]) # of SSDs 14 / 2012 2014 2016 2018 2020 2010

But Modern Storage is Very Expensive • Average SSD Price Compared to HDD − 3x-5x higher cost (MLC SSD vs. HDD) • Limited lifetime of SSD flash cells − Max 5K-10K writes (per cell) $$$ $$$ Source: IDC DataAge Annual Data size Capacity & Throughput 2025 whitepaper $$$ Cost (e.g., est. 50 SSDs with 800 GB/s, 500 TB Cap . [SmartIOPS Appliance]) # of SSDs 15 / 2012 2014 2016 2018 2020 2010

Data Reduction Overview Client data chunks Deduplication Client data (e.g., DB, VM Non-duplicate (Unique) chunks Image) Compression Compressed unique chunks … SSD array SSD SSD SSD SSD SSD Deduplication + Compression → 60%-90% data reduction 16 /

Data Deduplication Basic Flow ➢ Unique data write Mapping Tables SSDs Hash PBA 0x AABB 200 0x9D12 Hash 0x95CD 150 Data Search 0x67CA 1100 LBA PBA 5004 100 200 Logical Block Address 101 200 (LBA) 17 /

Data Deduplication Basic Flow ➢ Unique data write Mapping Tables SSDs Hash PBA 0x AABB 200 0x9D12 Hash 0x95CD 150 Data Search 0x67CA 1100 0x9D12 1101 LBA PBA 5004 Data PBA=1101 100 200 Logical Block Address Update LBA/PBA 101 200 (LBA) 5004 1101 18 /

Data Deduplication Basic Flow ➢ Duplicate data write Mapping Tables SSDs Hash PBA 0x AABB 200 0x9D12 Hash 0x95CD 150 Data Search 0x67CA 1100 0x9D12 1101 LBA PBA 5010 100 200 Logical Block Address 101 200 (LBA) 5004 1101 19 /

Data Deduplication Basic Flow ➢ Duplicate data write Mapping Tables SSDs Hash PBA 0x AABB 200 0x9D12 Hash 0x95CD 150 Data Search 0x67CA 1100 0x9D12 1101 LBA PBA 5010 100 200 Logical Block Address 101 200 (LBA) 5004 1101 20 /

Data Deduplication Basic Flow ➢ Duplicate data write Mapping Tables SSDs Hash PBA 0x AABB 200 0x9D12 Hash 0x95CD 150 Data Search 0x67CA 1100 0x9D12 1101 LBA PBA 5010 100 200 Logical Block Address Update LBA/PBA 101 200 (LBA) 5004 1101 No data write 5010 1101 21 /

Data Reduction Main Parameters ▪ Many parameters & design choices ▪ Granularity, hashing type, mapping table type, compression type, where/when to apply, dedup-compression or compression-dedup, how to reclaim unused spaces, … ▪ Various trade-offs ▪ data reduction effectiveness, system resource utilization, latency, throughput, power consumption, … Next few slides = 4 major parameters discussed 22 /

Parameter #1: Chunking Type Fixed sized Variable sized Data Dup Dup Dup Dup Dup Dup Dup Dup Dup Dup + Simple, easy to organize + sometimes detects more duplicates Pros/ Cons - sensitive to data alignment - Compute-intensive and complex - PureStorage servers - Solidfire servers Commercial Usage - Microsoft Clouds [ATC’12] - HPE 3PAR servers 23 /

Parameter #2: Chunking Granularity Small Chunks (1KB..8KB) Large Chunks (64KB..4MB) Data + Lightweight mapping tables Pros/ + High duplicate detection Cons - Less duplicates & RMW overheads - Heavy-weight mapping tables - Solidfire servers (4 KB) Commercial - Some Microsoft Clouds (64 KB) Usage - HPE 3PAR servers (16 KB) 24 /

Parameter #3: Hashing Algorithm Weak Hash (e.g., CRC) Strong Hash (e.g., SHA2) data1 data1 0xAAAA Hash 0xAAAA Hash No hash collision = = Hash collision ≠ = data2 data2 0xAAAA Hash 0xAAAA Hash + Fast calculation + No practical hash collision in PBs Pros/ Cons - Hash collision =data loss! (needs - Compute-intensive bit-by-bit data comparison) - Solidfire (SHA2 hash) Commercial - PureStorage servers Usage - Microsoft clouds (SHA1 hash) 25 /

Parameter #4: When to Do Data Reduction Offline Operation Inline Operation Dedup/ Client data Dedup/ Client data Compr Compr HDD/SSD HDD/SSD HDD/SSD Active time Idle time Active time + Improves SSD lifetime + No impact on active IOs Pros/ +No idle time required - Requires idle time Cons - Reduces SSD lifetime - Requires dedicated resources (CPU,…) Commercial - HDD-based systems - Most SSD-based systems Usage 26 /

A Scalable System Design for Data Reduction in Modern Storage - PowerPoint PPT Presentation

A Scalable System Design for Data Reduction in Modern Storage Servers Mohammadamin Ajdari Presentation at Dpt. Of Computer Engineering, Sharif Univ. of Tech. 2020/1/22 My Education Direct PhD in Computer Eng. Degree from POSTECH (South Korea)

Cache Coherence in Scalable Machines Scalable Cache Coherent Systems Scalable, distributed

Scalable String Matching on the Scalable String Matching on the Scalable String Matching on the

SAS data reduction Haydyn Mertens (EMBL-Hamburg) Data reduction steps Acquisition Reduction

Introduction to Harm Reduction Definition of Harm Reduction Harm reduction refers to policies,

Scalable Distributed Lineage Authentication Ashish Gehani Scalable Distributed Lineage

STAT 209 Dimensionality Reduction November 26, 2019 Colin Reimer Dawson 1 / 24 Dimensionality

The Scalable Commutativity Rule: Designing Scalable Software for Multicore Processors Austin T.

Dyninst Scalable Tools Workshop Granlibakken Resort Lake Tahoe, California Dyninst Scalable

Scalable Learning Technologies Scalable Learning Technologies for Big Data Mining for Big Data

ICU Restraint Reduction: ICU Restraint Reduction: ICU Restraint Reduction: Development of

Lattice Basis Reduction Part II: Algorithms Sanzheng Qiao Department of Computing and Software

Treewidth reduction and algorithmic applications Treewidth reduction and algorithmic applications

DSC 102 Systems for Scalable Analytics Arun Kumar Topic 3: Parallel and Scalable Data

Zarr - scalable storage of tensor Zarr - scalable storage of tensor data for parallel and

Ruijsenaars-Schneider system from reduction Quasi- quasi-Hamiltonian reduction Hamiltonian

Architectural Support for Parallel Reduction in Scalable Shared Memory Multiprocessors in

and tacos! Who? Originally from Toronto, Canada Lives in Tallinn, Estonia Drupal since 2008

Temporal Common Sense n Humans assume information when reading Not explicitly mentioned

Linearizability & CAP Announcements No hours this week. Announcements No hours this

An industrial case study of TACO Benjamin Lesage , Stephen Law, Iain Bate Icons courtesy of

MetaPost 1.207 (TEXLive 2009) EuroTEX 2009 SVG backend SVG backend SVG backend SVG backend A

On the approximate cohomology of quasi holomorphic line bundles Jean-Pierre Demailly Institut

Style Transfer from Non-Parallel Text by Cross-Alignment Shen et al 2017 Arxiv: 1705.09655

Fair CPU Time Accounting in CMP+SMT Processors Carlos Luque (UPC/BSC) Miquel Moreto

A Scalable System Design for Data Reduction in Modern Storage - PowerPoint PPT Presentation

A Scalable System Design for Data Reduction in Modern Storage Servers Mohammadamin Ajdari Presentation at Dpt. Of Computer Engineering, Sharif Univ. of Tech. 2020/1/22 My Education Direct PhD in Computer Eng. Degree from POSTECH (South Korea)

Cache Coherence in Scalable Machines Scalable Cache Coherent Systems Scalable, distributed

Scalable String Matching on the Scalable String Matching on the Scalable String Matching on the

SAS data reduction Haydyn Mertens (EMBL-Hamburg) Data reduction steps Acquisition Reduction

Introduction to Harm Reduction Definition of Harm Reduction Harm reduction refers to policies,

Scalable Distributed Lineage Authentication Ashish Gehani Scalable Distributed Lineage

STAT 209 Dimensionality Reduction November 26, 2019 Colin Reimer Dawson 1 / 24 Dimensionality

The Scalable Commutativity Rule: Designing Scalable Software for Multicore Processors Austin T.

Dyninst Scalable Tools Workshop Granlibakken Resort Lake Tahoe, California Dyninst Scalable

Scalable Learning Technologies Scalable Learning Technologies for Big Data Mining for Big Data

ICU Restraint Reduction: ICU Restraint Reduction: ICU Restraint Reduction: Development of

Lattice Basis Reduction Part II: Algorithms Sanzheng Qiao Department of Computing and Software

Treewidth reduction and algorithmic applications Treewidth reduction and algorithmic applications

DSC 102 Systems for Scalable Analytics Arun Kumar Topic 3: Parallel and Scalable Data

Zarr - scalable storage of tensor Zarr - scalable storage of tensor data for parallel and

Ruijsenaars-Schneider system from reduction Quasi- quasi-Hamiltonian reduction Hamiltonian

Architectural Support for Parallel Reduction in Scalable Shared Memory Multiprocessors in

and tacos! Who? Originally from Toronto, Canada Lives in Tallinn, Estonia Drupal since 2008

Temporal Common Sense n Humans assume information when reading Not explicitly mentioned

Linearizability &amp; CAP Announcements No hours this week. Announcements No hours this

An industrial case study of TACO Benjamin Lesage , Stephen Law, Iain Bate Icons courtesy of

MetaPost 1.207 (TEXLive 2009) EuroTEX 2009 SVG backend SVG backend SVG backend SVG backend A

On the approximate cohomology of quasi holomorphic line bundles Jean-Pierre Demailly Institut

Style Transfer from Non-Parallel Text by Cross-Alignment Shen et al 2017 Arxiv: 1705.09655

Fair CPU Time Accounting in CMP+SMT Processors Carlos Luque (UPC/BSC) Miquel Moreto

Linearizability & CAP Announcements No hours this week. Announcements No hours this