Azor: Using Two-level Block Selection to Improve SSD-based I/O caches



SLIDE 1

Introduction System Design Experimental Platform Evaluation Conclusions

Azor: Using Two-level Block Selection to Improve SSD-based I/O caches

Yannis Klonatos, Thanos Makatos, Manolis Marazakis, Michail D. Flouris, Angelos Bilas

{klonatos, makatos, maraz, flouris, bilas}@ics.forth.gr

Foundation for Research and Technology - Hellas (FORTH), Institute of Computer Science (ICS)

July 28, 2011

Yannis Klonatos et al. FORTH-ICS, Greece 1 / 29

SLIDE 2

Table of contents

1. Introduction
2. System Design
3. Experimental Platform
4. Evaluation
5. Conclusions


SLIDE 3

Background

Increased need for high-performance storage I/O

1. Larger file-set sizes ⇒ more I/O time
2. Server virtualization and consolidation ⇒ more I/O pressure

SSDs can mitigate I/O penalties:

                            SSD            HDD
  Throughput (R/W, MB/s)    277/202        100/90
  Response time (ms)        0.17           12.6
  IOPS (R/W)                30,000/3,500   150/150
  Price/capacity ($/GB)     $3             $0.3
  Capacity per device       32 – 120 GB    Up to 3 TB

Mixed SSD and HDD environments are necessary
Cost-effectiveness: deploy SSDs as HDD caches


SLIDE 4

Previous Work

Web servers as a secondary file cache [Kgil et al., 2006]

⊲ Requires application knowledge and intervention

ReadyBoost feature in Windows

⊲ Static file preloading
⊲ Requires user interaction

bcache module in the Linux Kernel

⊲ Has no admission control

NetApp’s Performance Acceleration Module

⊲ Needs specialized hardware


SLIDE 5

Our goal

Design Azor, a transparent SSD cache

⊲ Move SSD caching to the block level
⊲ Hide the address space of SSDs

Thorough analysis of design parameters

1. Dynamic differentiation of blocks
2. Cache associativity
3. I/O concurrency



SLIDE 7

Overall design space


SLIDE 8

Write-back Cache Design Issues

1. Requires synchronous metadata updates for write I/Os:
   ⊲ HDDs may not have the up-to-date blocks
   ⊲ Must know the location of each block in case of failure

2. Reduces system resilience to failures:
   ⊲ A failing SSD results in data loss
   ⊲ SSDs are hidden, so other layers can’t handle these failures

⊲ Our write-through design avoids these issues
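The write-through policy keeps the HDD authoritative on every write, so a failed SSD never loses data. A minimal sketch of the idea (hypothetical Python illustration, not Azor's kernel code; plain dicts stand in for the SSD cache and the backing HDD array):

```python
class WriteThroughCache:
    """Write-through SSD cache sketch: the HDD is updated synchronously
    on every write, so it always holds the up-to-date blocks."""

    def __init__(self):
        self.ssd = {}   # stands in for the SSD cache device
        self.hdd = {}   # stands in for the backing HDD array

    def write(self, block, data):
        self.hdd[block] = data        # HDD updated synchronously
        if block in self.ssd:         # keep any cached copy coherent
            self.ssd[block] = data

    def read(self, block):
        if block in self.ssd:         # cache hit: served from the SSD
            return self.ssd[block]
        data = self.hdd[block]        # miss: fetch from the HDD ...
        self.ssd[block] = data        # ... and admit into the cache
        return data
```

Because the SSD holds only clean copies, clearing it (a simulated failure) leaves reads correct, just slower.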



SLIDE 12

Dynamic block differentiation

Blocks are not equally important to performance

⊲ Makes sense to differentiate during admission to SSD cache

Introduce a 2-Level Block Selection scheme (2LBS)

First level: prioritize filesystem metadata over data
  ⊲ Many more small files → more FS metadata
  ⊲ Additional FS metadata introduced for data protection
  ⊲ Cannot rely on DRAM for effective metadata caching
  ⊲ Metadata requests represent 50% – 80% of total I/O accesses ⋆

Second level: prioritize between data blocks
  ⊲ Some data are accessed more frequently
  ⊲ Some data are used for faster accesses to other data

⋆ D. Roselli and T. E. Anderson, "A comparison of file system workloads", USENIX ATC 2000


SLIDE 13

Two-level Block Selection

Modify XFS filesystem to tag FS metadata requests

⊲ Transparent metadata detection also possible

Keep in DRAM an estimate of each HDD block’s accesses

⊲ Static allocation: 256 MB DRAM required per TB of HDDs
⊲ DRAM space required is amortized with better performance
⊲ Dynamic allocation of counters reduces DRAM footprint
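The two levels combine into a simple admission test: filesystem metadata is always admitted, and a data block enters the cache only if its access count exceeds that of the block it would evict. A hedged sketch with assumed names (`TwoLevelSelector`, `admit`), not Azor's actual code; one accounting consistent with the 256 MB-per-TB figure is a single one-byte counter per 4 KB HDD block (1 TB / 4 KB = 256 M counters):

```python
from collections import defaultdict

class TwoLevelSelector:
    """Sketch of two-level block selection (2LBS): level one always
    admits filesystem metadata; level two admits a data block only if
    it is 'hotter' than the cached block it would displace."""

    def __init__(self):
        self.accesses = defaultdict(int)  # per-HDD-block access counters

    def record_access(self, block):
        self.accesses[block] += 1

    def admit(self, block, is_fs_metadata, victim):
        # Level 1: prioritize filesystem metadata over data.
        if is_fs_metadata:
            return True
        # Level 2: among data blocks, compare access counts.
        if victim is None:            # free slot: nothing is displaced
            return True
        return self.accesses[block] > self.accesses[victim]
```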


SLIDE 14

Cache Associativity

Associativity: a performance vs. metadata-footprint tradeoff
Higher-way associativities need more DRAM space for metadata

Direct-Mapped cache
  ⊲ Minimizes metadata requirements
  ⊲ Suffers from conflict misses

Fully-Set-Associative cache
  ⊲ 4.7× more metadata than the direct-mapped cache
  ⊲ Proper choice of replacement policy is important
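The tradeoff shows up in how each organization maps an HDD block to candidate SSD slots. A toy sketch (function names are illustrative, not from Azor):

```python
def direct_mapped_slot(block, cache_blocks):
    # Exactly one candidate slot per block: minimal metadata, but two
    # hot blocks with the same index conflict-miss against each other.
    return block % cache_blocks

def set_assoc_slots(block, cache_blocks, ways):
    # The block may live in any of `ways` slots of its set, so the
    # replacement policy can keep both of two conflicting hot blocks,
    # at the cost of per-slot metadata (tags, LRU state).
    num_sets = cache_blocks // ways
    s = block % num_sets
    return [s * ways + w for w in range(ways)]
```

For example, blocks 2 and 10 collide in an 8-slot direct-mapped cache, but a 4-way organization places both in the same 4-slot set.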


SLIDE 15

Cache Associativity - Replacement policy

Large variety of replacement algorithms used in CPUs/DRAM
  ⊲ Prohibitively expensive in terms of metadata size
  ⊲ Assume knowledge of the workload I/O patterns
  ⊲ May cause up to 40% performance variance

We choose the LRU replacement policy
  ⊲ Good reference point for more sophisticated policies
  ⊲ Reasonable choice, since the buffer cache uses LRU
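LRU over a set needs only an ordered map as bookkeeping. A minimal sketch (the class name `LRULine` is assumed, not from Azor):

```python
from collections import OrderedDict

class LRULine:
    """Minimal LRU bookkeeping for one cache set: the least recently
    used block is evicted when the set is full."""

    def __init__(self, ways):
        self.ways = ways
        self.order = OrderedDict()   # blocks, oldest first

    def access(self, block):
        """Record a hit or insertion; return the evicted block, if any."""
        if block in self.order:
            self.order.move_to_end(block)   # hit: now most recently used
            return None
        victim = None
        if len(self.order) >= self.ways:
            victim, _ = self.order.popitem(last=False)  # evict the LRU
        self.order[block] = None
        return victim
```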


SLIDE 16

I/O Concurrency

A high degree of I/O concurrency:
  ⊲ Allows overlapping I/O with computation
  ⊲ Effectively hides I/O latency

1. Allow concurrent read accesses on the same cache line
   ⊲ Track only pending I/O requests
   ⊲ Reader-writer locks per cache line are prohibitively expensive

2. Hide SSD write I/Os of read misses
   ⊲ Copy the filled buffers to a new request
   ⊲ Introduces a memory copy
   ⊲ Must maintain state of pending I/Os
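Tracking only pending I/Os (point 1) can be sketched as follows: the first reader that misses on a block registers its in-flight fill, and later readers of the same block simply wait for that fill instead of taking a per-line reader-writer lock. Names are illustrative, not Azor's implementation:

```python
import threading

class PendingReads:
    """Track only in-flight read fills: one global lock protects a
    small map of pending blocks, instead of a lock per cache line."""

    def __init__(self):
        self.lock = threading.Lock()
        self.pending = {}   # block -> Event set when its fill completes

    def begin_read(self, block):
        with self.lock:
            ev = self.pending.get(block)
            if ev is None:
                self.pending[block] = threading.Event()
                return True          # caller must perform the fill
        ev.wait()                    # another reader is filling: wait
        return False                 # block is now available

    def complete(self, block):
        with self.lock:
            self.pending.pop(block).set()   # wake any waiting readers
```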



SLIDE 18

Experimental Setup

• Dual-socket, quad-core Intel Xeon 5400 (64-bit)
• Twelve 500 GB SATA-II disks with write-through caching
• Areca 1680D-IX-12 SAS/SATA RAID storage controller
• Four 32 GB Intel SLC SSDs (NAND Flash)
• HDDs and SSDs in a RAID-0 setup, 64 KB chunks
• CentOS 5.5, kernel version 2.6.18-194
• XFS filesystem
• 64 GB DRAM, varied by experiment


SLIDE 19

Benchmarks

I/O intensive workloads; each run takes between hours and days

            Type              Properties                           File set    RAM     SSD cache sizes (GB)
  TPC-H     Data warehouse    Read-only                            28 GB       4 GB    7, 14, 28
  SPECsfs   CIFS file server  Write-dominated, latency-sensitive   Up to 2 TB  32 GB   128
  TPC-C     OLTP workload     Highly concurrent                    155 GB      4 GB    77.5


SLIDE 20

Experimental Questions

• Which is the best static decision for handling I/O misses?
• Does dynamically differentiating blocks improve performance?
• How does cache associativity impact performance?
• Can our design options cope with a "black box" workload?



SLIDE 22

Static decision for I/O misses (SPECsfs2008)

[Figures: CIFS ops/sec per cache write policy, and latency (msec/op) vs. CIFS ops/sec, for Native-12HDDs, miss-allocate, and miss-noallocate]

11% to 66% better performance than HDDs
Huge file set, only 30% accessed

⊲ write-hdd-ssd policy evicts useful blocks

Up to 5000 CIFS ops/sec difference for the same latency


SLIDE 23

Differentiating filesystem metadata (SPECsfs2008)

FS metadata continuously increases during execution

[Figures: maximum sustained load (CIFS ops/sec) and number of metadata requests vs. RAM size (4–32 GB), and latency (msec/op) vs. CIFS ops/sec, for Native-12HDDs, Base Fully-Set-Associative, and 2LBS Fully-Set-Associative]

Metadata DRAM misses ⇒ up to 71% impact
DRAM data hit ratio less than 5%
3,000 more CIFS ops/sec between HDDs and Azor
∼23% latency reduction when using 2LBS in Azor


SLIDE 24

Differentiating filesystem data blocks (TPC-H)

Filesystem data such as indices are important for databases
Data differentiation improves performance

[Figures: speedup over native and hit ratio (%) vs. cache size (14, 28 GB), for the Base and 2LBS variants of the DM and FA caches]

1.95× and 1.53× improvement for the DM and FA caches, respectively
Medium-size DM is 20% better than large-size DM
  → with a 10% lower hit ratio


SLIDE 25

Importance of cache associativity (TPC-H)

[Figures: speedup over HDDs and hit ratio (%) vs. cache size (7, 14, 28 GB), for the DM and FA caches]

FA better than DM for all cache sizes
  ⊲ Large-size FA is 1.36× better than its DM counterpart
  ⊲ Up to 15% fewer conflict misses than DM
  ⊲ Medium-size FA is 32% better than large-size DM


SLIDE 26

A black box workload (TPC-C)

We choose the best parameters found so far

⊲ Fully-set-associative cache design
⊲ SSD cache size of half the workload size

[Figures: NOTPM and hit ratio (%) per cache policy: Native 12HDDs, Base, 2LBS]

Base cache: 55% improvement over native
2LBS cache: 34% additional improvement
Hit ratio remains the same in both versions
Disk utilization is 100%, SSD utilization under 7%



SLIDE 28

Conclusions

We use SSD-based I/O caches to increase storage performance

Performance is improved with higher-way associativities
  ⊲ At the cost of a 4.7× higher metadata footprint

We explore differentiation of HDD blocks

⊲ According to their expected importance for system performance
⊲ Design and evaluation of a two-level block selection scheme

Overall, our work shows that differentiation of blocks is a promising technique for improving SSD-based I/O caches

⊲ Reduces latency and improves throughput


SLIDE 29

Thank You!

Meet the real Azor! ☺
