Cloud Storage from a Client’s Perspective - Binbing Hou, Feng Chen

Understanding I/O Performance Behaviors of Cloud Storage from a Client’s Perspective

Binbing Hou, Feng Chen (Louisiana State University)
Zhonghong Ou (Beijing Univ. of Posts & Telecomm.)
Ren Wang, Michael Mesnier (Intel Labs)

Storage in the Cloud Era
(Figure labels: Enterprise Cloud Storage, Personal Cloud Storage)
• Personal cloud storage subscriptions will reach 1.3 billion by 2017 [1]
• Public/private cloud storage market is predicted to be $65.41 billion by 2020 [2]

[1] https://technology.ihs.com/410084/subscriptions-tocloud-storage-services-to-reach-half-billion-level-this-year
[2] http://www.marketsandmarkets.com/Market-Reports/cloud-storage-market-902.html

Cloud Storage vs. Conventional Storage
(Diagram labels: Internet, PCs, SCSI, cloud storage cluster, servers, mobile devices)
• Cloud storage cluster
  - Massively parallelized
  - High throughput
• Connection
  - World-wide internet
  - HTTP-based protocol
• Clients
  - Highly diverse
  - Different capabilities
Is our past wisdom on storage still applicable to cloud storage?

Measurement Methodology
• Investigating cloud storage as storage services
  - “Black-box” testing
  - Adopting HTTP-based APIs rather than POSIX-like APIs
  - Purposely avoiding client-side optimization techniques
• Test workloads
  - Request type: PUT (upload), GET (download)
  - Parallelism degree: 1 – 64
  - Request size: 1KB – 16MB
  - Metrics: bandwidth and latency
• Platform to be tested
  - Cloud: Amazon S3 (the data center in Oregon)
  - Clients: five customized Amazon EC2 instances with varying capabilities
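The test matrix described on this slide could be driven by a harness along the following lines. This is a minimal sketch, not the authors' tool: the power-of-two stepping through the parallelism and size ranges is an assumption (the slide gives only the end points), and `send(op, size)` is a hypothetical caller-supplied function that performs one blocking HTTP request against the storage service.

```python
import time
from concurrent.futures import ThreadPoolExecutor

KB = 1024
MB = 1024 * KB


def powers_of_two(lo, hi):
    """Yield lo, 2*lo, 4*lo, ... up to and including hi."""
    value = lo
    while value <= hi:
        yield value
        value *= 2


def workload_grid():
    """All (request_type, parallelism, request_size) test points.

    The ranges come from the slide; the power-of-two step is an assumption.
    """
    return [
        (op, p, size)
        for op in ("PUT", "GET")
        for p in powers_of_two(1, 64)
        for size in powers_of_two(1 * KB, 16 * MB)
    ]


def run_point(send, op, parallelism, size, requests_per_worker=4):
    """Issue requests from `parallelism` threads.

    Returns (aggregate bandwidth in bytes/s, mean per-request latency in s).
    `send` is hypothetical: it must perform one request and block until done.
    """
    def timed_request(_):
        start = time.perf_counter()
        send(op, size)
        return time.perf_counter() - start

    total = parallelism * requests_per_worker
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        latencies = list(pool.map(timed_request, range(total)))
    wall = time.perf_counter() - wall_start
    return total * size / wall, sum(latencies) / total
```

Bandwidth is computed over wall-clock time for the whole batch, while latency is averaged per request, so the two metrics can diverge as parallelism grows.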

Outline
• Basic Observations
• Effect of Client’s Capability
• Effect of Geo-distance
• Case Study
• System Implications

Basic Observations: Effect of Parallelism
(Figure annotations: +18X, -10%)
• Effect of parallelism on bandwidth
  - Proper parallelization dramatically improves bandwidth (e.g., 18x speedup).
  - Over-parallelization may degrade performance (e.g., 10% degradation).
• Effect of parallelism on request latency
  - Proper parallelization does not significantly affect latency.
  - Over-parallelization may cause latency to increase linearly.

Basic Observations: Effect of Request Size
(Figure annotations: +29%, +770X, +55%, +210%, comparable)
• Effect of request size on bandwidth
  - Increasing the request size can significantly increase bandwidth (e.g., 770x speedup).
  - The benefit of increasing the request size is not unlimited (i.e., diminishing improvement).
• Effect of request size on request latency
  - Larger requests generally have higher request latencies.
  - Latency does not necessarily increase when the request size increases (e.g., 1KB-16KB).

Basic Observations: Parallelism vs. Request Size
• Both help to improve bandwidth, but both have limitations.
• Does an optimal combination exist?
  - e.g., Upload a 4MB object
  - Reasonable combinations: 4MB x 1, 1MB x 4, 256KB x 16, 64KB x 64
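The candidate combinations for a given object can be enumerated mechanically. The sketch below is illustrative only: the division factor `step=4` and the lower bound `min_chunk=64 KB` are assumptions chosen so that a 4MB object reproduces the four combinations listed on the slide.

```python
KB = 1024
MB = 1024 * KB


def combinations(object_size, min_chunk=64 * KB, step=4):
    """Return (chunk_size, parallelism) pairs whose product is object_size.

    Starts with one whole-object request and repeatedly divides the chunk
    size by `step` until it would drop below `min_chunk`. Both parameters
    are illustrative assumptions, not values prescribed by the slides.
    """
    pairs = []
    chunk = object_size
    while chunk >= min_chunk:
        pairs.append((chunk, object_size // chunk))
        if chunk % step:
            break  # cannot divide evenly any further
        chunk //= step
    return pairs


for chunk, parallelism in combinations(4 * MB):
    print(f"{chunk // KB} KB x {parallelism}")
```

Each pair moves the same number of bytes, so any difference in measured bandwidth comes from how the work is split between request size and parallelism.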

Basic Observations: Parallelism vs. Request Size (cont.)
• What if different combinations deliver comparable bandwidth?
  - e.g., Upload 16 objects of 1KB
  - Bandwidth: 16KB x 1 = 4KB x 4 = 1KB x 16
(Figure labels: 65%, 15%, 5%)
In such cases, larger requests consume fewer CPU resources.

Effect of Client’s Capability
• Experimental comparisons
  - Comparing the performance of the Baseline client with each of the other four clients separately
  - Client CPU: Baseline (2 CPUs) vs. CPU-plus (4 CPUs)
  - Client memory: Baseline (7.5GB) vs. MEM-minus (3.5GB)
  - Client storage: Baseline (Magnetic) vs. STOR-ssd (SSD)

Client       Instance    Location  vCPU  Memory  Storage
Baseline     m1.large    Oregon    2     7.5 GB  Magnetic (410 GB)
CPU-plus     c3.xlarge   Oregon    4     7.5 GB  Magnetic (410 GB)
MEM-minus    m1.large    Oregon    2     3.5 GB  Magnetic (410 GB)
STOR-ssd     m1.large    Oregon    2     7.5 GB  SSD (410 GB)
GEO-Sydney   m1.large    Sydney    2     7.5 GB  Magnetic (410 GB)

Client’s Capability: Effect of Client CPU
• Client CPU
  - The CPU is responsible for both sending/receiving data packets and client I/O.
  - Comparison: Baseline (2 CPUs) vs. CPU-plus (4 CPUs)
(Figure annotations: comparable, large gap)
• Client CPU has significant effects on small requests.
• Client CPU does not have significant effects on large requests.

Effect of Geo-distance
• Geo-distance
  - Comparison: Baseline (in Oregon) vs. GEO-Sydney (in Sydney)
  - RTT: 0.28ms (same data center in Oregon) vs. 176ms (from Sydney to Oregon)
• Effect of geo-distance on bandwidth
  - Geo-distance has a significant effect on the peak bandwidth of small requests
  - Geo-distance does not significantly affect the peak bandwidth of large requests
(Figure annotations: small gap, large gap)

Effect of Geo-distance (cont.)
• Geo-distance
  - Comparison: Baseline (in Oregon) vs. GEO-Sydney (in Sydney)
  - RTT: 0.28ms (same data center in Oregon) vs. 176ms (from Sydney to Oregon)
• Effect of geo-distance on latency
  - RTT plays a critical role but is not the only determining factor
(Figure annotations: large gap, flatter curve, converge, large gap)

Case Study: Client-side Caching
(Diagram labels: Client, Write, Read, Disk Cache, GET, PUT, Amazon S3)
• Client-side caching and chunking
  - Chunking is a key technique used in cloud storage
  - Chunking affects caching efficiency
  - A small chunk size may lead to a high cache miss ratio
  - A large chunk size risks loading unwanted data
• Experimental platform
  - Cloud storage service: Amazon S3 in Oregon
  - Client: a workstation in Louisiana
  - Emulator: converts POSIX operations to HTTP requests; supports a disk cache
• Trace
  - Converted from a segment of an NFS trace from the Harvard SOS project [3]
  - Workload size: 4.8 GB
  - Average file size: 12.9 MB

[3] https://www.eecs.harvard.edu/sos/traces.html

Case Study: Proper Chunk Size for Caching
• How can we determine a proper chunk size?
  - A bandwidth-based method can give some hints
  - Select a relatively small size that approximately reaches the peak bandwidth
• When the chunk size exceeds 4MB:
  - No significant additional benefit
  - High risk of loading unwanted data
Proper chunk size ≈ 4MB?
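The bandwidth-based selection rule can be sketched as follows. The 5% tolerance used to decide what "approximately reaches the peak" means is an assumption, and the bandwidth figures in the usage example are made-up placeholders, not measurements from the study.

```python
KB = 1024
MB = 1024 * KB


def pick_chunk_size(bandwidth_by_size, tolerance=0.05):
    """Return the smallest size whose bandwidth is within `tolerance` of the peak.

    `bandwidth_by_size` maps chunk size (bytes) to measured bandwidth in any
    consistent unit. `tolerance` is an illustrative assumption.
    """
    peak = max(bandwidth_by_size.values())
    threshold = (1.0 - tolerance) * peak
    return min(size for size, bw in bandwidth_by_size.items() if bw >= threshold)


# Hypothetical measurements (placeholder values, not from the slides).
measured = {
    64 * KB: 5.0,
    1 * MB: 30.0,
    4 * MB: 48.0,
    8 * MB: 49.5,
    16 * MB: 50.0,
}
print(pick_chunk_size(measured) // MB, "MB")
```

Preferring the smallest qualifying size reflects the trade-off on the slide: beyond that point bandwidth barely improves, while the amount of potentially unwanted data per fetch keeps growing.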

Case Study: Proper Chunk Size for Caching (cont.)
• Experimental comparison
  - Standard LRU, 200 MB disk cache, write back every 30s
  - Chunk sizes compared: 64KB, 1MB, 4MB, 8MB, 16MB
(Figure labels: 109.8ms, 95.2ms, 88.6ms, 66.8ms, 60.2ms, 51.4ms)
• 4MB leads to the lowest read/write latencies.
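A chunk-granularity LRU cache of the kind compared here can be emulated in a few lines. This is a simplified sketch rather than the emulator used in the study: the class and method names are hypothetical, it only counts hits and misses, and it omits the periodic write-back and the actual GET/PUT traffic.

```python
from collections import OrderedDict

MB = 1024 * 1024


class ChunkCache:
    """LRU cache of fixed-size chunks, keyed by (file_id, chunk_index)."""

    def __init__(self, capacity_bytes, chunk_size):
        self.max_chunks = capacity_bytes // chunk_size
        self.chunk_size = chunk_size
        self.chunks = OrderedDict()  # insertion order doubles as recency order
        self.hits = 0
        self.misses = 0

    def access(self, file_id, offset, length):
        """Touch every chunk overlapped by bytes [offset, offset + length)."""
        first = offset // self.chunk_size
        last = (offset + length - 1) // self.chunk_size
        for index in range(first, last + 1):
            key = (file_id, index)
            if key in self.chunks:
                self.hits += 1
                self.chunks.move_to_end(key)  # mark as most recently used
            else:
                self.misses += 1  # a real client would fetch the chunk here
                self.chunks[key] = True
                if len(self.chunks) > self.max_chunks:
                    self.chunks.popitem(last=False)  # evict least recently used

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = ChunkCache(capacity_bytes=200 * MB, chunk_size=4 * MB)
cache.access("file-a", 0, 4096)      # first touch of chunk 0: miss
cache.access("file-a", 4096, 4096)   # same chunk again: hit
print(cache.hit_ratio())
```

Replaying the same trace with different `chunk_size` values shows the effect discussed on the slide: larger chunks turn nearby accesses into hits, but fewer chunks fit in the 200 MB cache.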

Case Study: Proper Chunk Size for Caching (cont.)
(Figure labels: read hit ratio 77.8% at 64KB and 98.4% at 4MB; write hit ratio 88.9% at 64KB and 99.4% at 4MB)
• How does the chunking policy affect caching efficiency?
  - Increasing the request size significantly improves performance (i.e., hit ratio).
  - An excessively large request size causes performance degradation.
  - High cache miss penalty = high download latency (4s for 4MB, 14.2s for 16MB).
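The interplay between hit ratio and miss penalty can be made concrete with a back-of-the-envelope estimate. The sketch below assumes that every miss costs exactly one full chunk download and ignores hit latency and write-back, so it is a rough model rather than the study's measurement; it uses the 4MB read hit ratio (98.4%) and download latency (4s) quoted on the slide.

```python
def miss_cost_ms(hit_ratio, miss_penalty_s):
    """Average per-access latency contributed by cache misses, in milliseconds.

    Simplifying assumption: each miss costs one full chunk download and hits
    are free. Real latencies also include hit time and write-back effects.
    """
    return (1.0 - hit_ratio) * miss_penalty_s * 1000.0


# 4MB chunks: 98.4% read hit ratio, 4 s to download one chunk on a miss.
print(round(miss_cost_ms(0.984, 4.0), 1))  # 64.0
```

Under this model, a larger chunk only pays off if the gain in hit ratio outweighs the growth in miss penalty, which matches the degradation reported for excessively large chunks.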

System Implications
• Properly combining parallelism and request size
  - Reshaping the workloads: chunking/bundling, parallelizing
  - Optimal bandwidth can be achieved by a proper combination
• Client-aware optimization
  - Special attention should be paid to exploiting the client’s capability
  - Small requests (CPU) vs. large requests (memory, storage)
  - Client-aware optimization: e.g., smartphone (weak CPU)
• Geographical distance plays an important role
  - The negative effects can be offset by proper optimization
  - Latency-sensitive applications: e.g., file system, database
  - Bandwidth-sensitive applications: e.g., backup, video services

Conclusion
• We present a comprehensive measurement of cloud storage from the perspective of the client side.
• Our case studies demonstrate that user experiences can be better optimized by understanding cloud I/O behaviors.
• Based on our findings, we present a series of system implications.

Thank you!
{bhou, fchen}@csc.lsu.edu
zhonghong.ou@bupt.edu.cn
{ren.wang, michael.mesnier}@intel.com
