

SLIDE 1

Fighting with Unknowns: Estimating the Performance of Scalable Distributed Storage Systems with Minimal Measurement Data

Moo-Ryong Ra and Hee Won Lee1

AT&T Labs Research

May 23, 2019

1Presenter at MSST 2019

SLIDE 2

Motivation

Goal

◮ To estimate the performance of scalable distributed storage systems (e.g., Ceph and Swift) that use consistent hashing to distribute the workload as evenly as possible across all available compute resources

Problem

◮ Mathematical modeling and black-box approaches require a significant amount of effort and extensive data collection

Our Approach

◮ We propose a simple, yet accurate performance estimation technique for scalable distributed storage systems

◮ Our technique aims to identify the maximum IOPS for an arbitrary read/write ratio with a minimal evaluation process

SLIDE 3

Our Model

Claim: If HW/SW/workload settings remain unchanged, the total processing capability (C) of a distributed storage system is invariant for a given IO size:

C = T_read + T_write · f_rw

We can acquire the f_rw value with just two data points:

f_rw = T_100%read / T_100%write

[Figure: the linear model, plotting write IOPS (T_write) against read IOPS (T_read): a straight line from (T_100%read, 0) to (0, T_100%write)]
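The invariance claim can be sanity-checked numerically. A minimal sketch, using made-up endpoint IOPS numbers (not measurements from the slides):

```python
# Sanity-check of the claim C = T_read + T_write * f_rw, using
# illustrative endpoint measurements (not values from this deck).

t_read_100 = 100_000   # T_100%read: measured IOPS at 100% reads
t_write_100 = 50_000   # T_100%write: measured IOPS at 100% writes

f_rw = t_read_100 / t_write_100   # f_rw = T_100%read / T_100%write = 2.0

# Both endpoints yield the same capability C = T_100%read:
c_read_end = t_read_100 + 0 * f_rw        # the 100% read point
c_write_end = 0 + t_write_100 * f_rw      # the 100% write point
assert c_read_end == c_write_end == t_read_100

# A hypothetical 50:50 point lying on the model's line
# (T_read = T_write = 33333 IOPS) satisfies the same invariant:
c_mixed = 33_333 + 33_333 * f_rw          # close to 100000
```

Any measured mixed-ratio point that falls well off this line would indicate that the model's assumptions (unchanged HW/SW/workload settings, even load distribution) do not hold.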

SLIDE 4

Our Model: arbitrary read/write ratio

Given that read/write ratio = R_read : R_write,

◮ read IOPS: T_read = k · R_read
◮ write IOPS: T_write = k · R_write

Substituting into C = T_read + T_write · f_rw:

k · R_read + k · R_write · f_rw = C

k = T_100%read / (R_read + {100 − R_read} · f_rw)

Once we get the value of k, it is trivial to obtain T_read and T_write.
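The derivation above can be sketched as a small function. The function name and the numbers in the comments are illustrative; only the formulas come from the slides.

```python
def estimate_iops(t_read_100, t_write_100, r_read):
    """Estimate (T_read, T_write) for a read percentage r_read in [0, 100],
    given the two endpoint measurements T_100%read and T_100%write."""
    f_rw = t_read_100 / t_write_100                      # f_rw = T_100%read / T_100%write
    k = t_read_100 / (r_read + (100 - r_read) * f_rw)    # from k*(R_read + R_write*f_rw) = C
    return k * r_read, k * (100 - r_read)                # T_read = k*R_read, T_write = k*R_write

# The two measured endpoints are reproduced exactly:
# estimate_iops(100000, 50000, 100) -> (100000.0, 0.0)
# estimate_iops(100000, 50000, 0)   -> (0.0, 50000.0)
```

Between the endpoints, every estimate lies on the line T_read + T_write · f_rw = C, which is exactly the invariance claim from the previous slide.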

SLIDE 5

Our Model: mixed IO sizes

Suppose that we have heterogeneous IO sizes S_1, S_2, …, S_N and know the proportion of each IO size to the total IOs, P_1, P_2, …, P_N, where Σ_{i=1}^N P_i = 1.

For each size S_i:

k̄_Si = P_i · T_100%read^Si / (R_read + {100 − R_read} · f_rw^Si) = P_i · k_Si

Total IOPS can be obtained by:

T_total = Σ_{i=1}^N {R_read^Si + R_write^Si} · k̄_Si = 100 · Σ_{i=1}^N P_i · k_Si
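The mixed-IO-size formula can be sketched as follows, assuming per-size endpoint measurements are available; the dictionary layout and the example numbers are illustrative, not measurements from the evaluated clusters.

```python
def estimate_total_iops(endpoints, proportions, r_read):
    """T_total = 100 * sum_i P_i * k_Si.

    endpoints:   {size: (T_100%read, T_100%write)} for each IO size S_i
    proportions: {size: P_i}, with the P_i summing to 1
    r_read:      read percentage R_read in [0, 100]
    """
    total = 0.0
    for size, p_i in proportions.items():
        t_r100, t_w100 = endpoints[size]
        f_rw = t_r100 / t_w100                            # per-size f_rw^Si
        k_i = t_r100 / (r_read + (100 - r_read) * f_rw)   # per-size k_Si
        total += p_i * k_i
    return 100 * total

# Example: a 50/50 mix of two hypothetical IO sizes at a 70% read ratio.
endpoints = {"16KB": (100_000, 50_000), "1MB": (10_000, 4_000)}
mix = {"16KB": 0.5, "1MB": 0.5}
t_total = estimate_total_iops(endpoints, mix, 70)
```

With a single IO size and P_1 = 1, this reduces to the single-size model of the previous slide.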

SLIDE 6

Evaluation

We set up two different distributed storage systems:

Ceph

◮ Block Storage, Strong Consistency, 3x Replication
◮ FIO: 104 OpenStack VMs, each running 8 FIO jobs

[Figure: Ceph cluster diagram. Converged hosts each run four OSDs on 1.6TB NVMe drives; Ceph-mon daemons run on Hosts 1, 2, and 3; 104 client VMs (vm-01 … vm-104) generate load alongside logging, monitoring, and alerting services; hosts are connected by 25 GbE network links]

Swift

◮ Object Storage, Eventual Consistency, 3x Replication
◮ COSBench: 32 workers

[Figure: Swift cluster diagram. Hosts 1 to 4 each run an Object Server with two 480GB SSDs; Host 5 runs the Proxy Server; Host 6 runs the COSBench client daemons; hosts are connected by 10 GbE network links]

SLIDE 7

Meaning of f_rw

[Our Model] Total processing capability (C) is invariant per IO size: C = T_read + T_write · f_rw, where f_rw = T_100%read / T_100%write.

[Figure: measured f_rw per IO size, from 4KB to 4MB, for Ceph (block size) and Swift (object size)]

Note:

◮ f_rw reflects the load difference between read and write operations
◮ The amount of work required for read and write operations can differ greatly depending on the storage system implementation and its configuration

SLIDE 8

Total Processing Capacity (C) per IO Size

[Our Model] Total processing capability (C) is invariant per IO size: C = T_read + T_write · f_rw, where f_rw = T_100%read / T_100%write.

[Figure: C value vs. read ratio (10% to 100%) for IO sizes from 4KB to 2MB: (a) Ceph, (b) Swift]

SLIDE 9

Performance Estimation

For object size S_i, when read/write ratio = R_read : R_write:

k_Si = T_100%read^Si / (R_read + {100 − R_read} · f_rw^Si)

For mixed object sizes (P_i = proportion of object size S_i to total objects):

T_total = 100 · Σ_{i=1}^N P_i · k_Si

[Figure: measured vs. estimated T_total (IOPS) as the ratio of 16KB objects varies from 10% to 90%: (a) 16KB Read + 1MB Read, (b) 16KB RW + 512KB RW. IO workloads with mixed object sizes on the Swift cluster]

SLIDE 10

Performance Estimation Error

The errors between estimated and measured total IOPS are less than 9%.

[Figure: estimation error (%) on the Swift cluster as the ratio of the first object size (16KB/64KB/16KB) varies from 10% to 90%, for three workloads: 16KB Reads + 1MB Reads, 64KB Writes + 1MB Writes, 16KB R/Ws + 512KB R/Ws]

SLIDE 11

Conclusion

1. We proposed a novel technique to accurately estimate the performance of an arbitrarily mixed workload, in terms of read/write ratio and IO size
2. Our simple technique requires only a few data points, i.e., the 100% read IOPS and 100% write IOPS for each IO size
3. Our technique is applicable to any distributed storage system that distributes the load evenly across the available hardware resources

SLIDE 12

Any Questions?

We are hiring a couple of systems researchers:

◮ Senior Inventive Scientist (for fresh PhDs)
◮ Principal Inventive Scientist (for mid-career professionals)

Contact: Hee Won Lee, PhD Email: knowpd@research.att.com Location: Bedminster, New Jersey