Redefining Data Locality for Cross-Data Center Storage Kwangsung - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Redefining Data Locality for Cross-Data Center Storage

Kwangsung Oh, Ajaykrishna Raghavan, Abhishek Chandra, and Jon Weissman
Department of Computer Science and Engineering
University of Minnesota Twin Cities

slide-2
SLIDE 2

Private Cloud

Background

slide-3
SLIDE 3

Network Storage Computation

Background

slide-4
SLIDE 4

ElastiCache

EBS

S3

App / Server

Background

slide-5
SLIDE 5

Data replication is unavoidable

slide-6
SLIDE 6

Questions

  • Where to store data?
      • Which data center: the local DC, a nearby DC, or a remote DC?
      • Which storage tier: a faster or a slower one?
  • When and where to replicate or move data?
      • Which data?
  • No single answer.
      • The answer should change with user requirements such as QoS (performance), consistency, expected workload, and cost.
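
One way to read these questions is as a constrained choice: among candidate (data center, tier) placements, pick the cheapest one that still meets the application's latency target. The sketch below only illustrates that reading. The `Placement` type, the `choose_placement` helper, and every latency and cost number are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Placement:
    dc: str             # hypothetical data center label
    tier: str           # e.g. "memory", "disk", "archival"
    latency_ms: float   # assumed end-to-end access latency
    cost_per_gb: float  # assumed monthly cost in dollars per GB


def choose_placement(candidates: List[Placement],
                     max_latency_ms: float) -> Optional[Placement]:
    """Return the cheapest candidate that meets the latency target, or None."""
    feasible = [c for c in candidates if c.latency_ms <= max_latency_ms]
    return min(feasible, key=lambda c: c.cost_per_gb, default=None)


# Made-up candidates, for illustration only.
CANDIDATES = [
    Placement("local", "memory", 1.0, 9.0),
    Placement("local", "disk", 12.0, 0.10),
    Placement("nearby", "memory", 3.0, 6.0),
    Placement("remote", "disk", 90.0, 0.05),
]

if __name__ == "__main__":
    print(choose_placement(CANDIDATES, max_latency_ms=5.0))
```

With a 5 ms target, the made-up numbers select memory in the nearby DC, because it is cheaper than local memory and still fast enough. A real policy would also have to weigh consistency and expected workload, which this sketch leaves out.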

slide-7
SLIDE 7

App / Server

Disk-locality in datacenter computing considered irrelevant

slide-8
SLIDE 8

Motivation

From http://www.datacentermap.com

slide-9
SLIDE 9

Azure Storage

EBS Disk Memcache

ElastiCache

S3

  • Multiple DCs are in the same region and close to each other.
  • Using a nearby DC extends data locality beyond the local DC.
  • Data can be stored in a non-local DC’s storage with little or no data locality concern.

Key observations

App / Server

slide-10
SLIDE 10

DC locations example

slide-11
SLIDE 11

Latency and bandwidth between DCs

Latency (ms) between DCs

          US West         US East         Europe West              Asia Southeast
          AWS     Azure   AWS     Azure   AWS     Azure   GC       AWS     Azure
AWS       -       3.84    -       1.97    -       17.58   16.33    -       1.84
Azure     3.62    -       1.99    -       18.67   -       16.02    1.98    -
GC        -       -       -       -       16.35   16.12   -        -       -

Bandwidth (MB/s) between DCs

          US West         US East         Europe West              Asia Southeast
          AWS     Azure   AWS     Azure   AWS     Azure   GC       AWS     Azure
AWS       -       48.75   -       48.13   -       48.38   48.63    -       48.88
Azure     21.62   -       23.63   -       45.25   -       53.5     24.38   -
GC        -       -       -       -       32.38   40.25   -        -       -
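
As a rough sanity check on what these numbers imply for small objects, the sketch below combines one latency figure with one bandwidth figure. It is a first-order model assumed here for illustration, not the measurement method used in the talk. It treats the listed latency as a fixed per-request cost, takes 100 KB as 0.1 MB, and ignores storage-tier access time and protocol overhead. The function name `estimate_retrieval_ms` is hypothetical.

```python
def estimate_retrieval_ms(latency_ms: float,
                          bandwidth_mb_s: float,
                          size_mb: float) -> float:
    """First-order estimate: fixed network latency plus size / bandwidth."""
    transfer_ms = size_mb / bandwidth_mb_s * 1000.0
    return latency_ms + transfer_ms


# AWS to Azure in US East, using the 1.97 ms and 48.13 MB/s figures above.
# Assumption: 100 KB is taken as 0.1 MB.
aws_to_azure_us_east = estimate_retrieval_ms(1.97, 48.13, 0.1)

if __name__ == "__main__":
    print(f"{aws_to_azure_us_east:.2f} ms")
```

Under these assumptions the network adds about 4 ms to a 100 KB read within US East, so the cost of crossing to a nearby DC is small compared with a slow local tier.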

slide-12
SLIDE 12

Data Retrieval Time (100KB)

slide-13
SLIDE 13

Data Retrieval time (100KB) in US East

slide-14
SLIDE 14

Disk performance of AWS and Azure

slide-15
SLIDE 15

Various Data Size

slide-16
SLIDE 16

Summary of experiments

  • Accessing data in memory in a nearby DC is faster than accessing a slower storage tier in the local DC.
  • Accessing data from disk (archival) storage in a nearby DC can be as fast as accessing disk (archival) storage in the local DC.
  • These trends hold for data sizes up to 1 MB (this limit can be increased), which covers many common Internet applications.

slide-17
SLIDE 17

Use cases

  • Simpler Consistency Policy
      • Using a faster (memory) tier can reduce the number of replicas.
      • Fewer replicas mean less network traffic for maintaining consistency.
  • Hot and Cold Data
      • Data can be placed in memory in DC A and on disk or archival storage in DC B, based on data access patterns.
  • Higher Availability
      • If DC A fails, the application can minimize the performance penalty by using DC B’s faster storage.
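
The hot and cold data case can be sketched as a simple threshold rule on access counts. This is a minimal illustration under assumed inputs. The `place_by_access` helper, the threshold, and the sample access log are hypothetical. Only the "DC A" and "DC B" labels follow the slide.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def place_by_access(access_log: Iterable[str],
                    hot_threshold: int) -> Dict[str, Tuple[str, str]]:
    """Map each key to a (dc, tier) pair based on how often it was accessed."""
    counts = Counter(access_log)
    return {
        key: ("DC A", "memory") if n >= hot_threshold else ("DC B", "disk")
        for key, n in counts.items()
    }


# Made-up access log: "item-1" is read three times, "item-2" once.
SAMPLE_LOG = ["item-1", "item-1", "item-2", "item-1"]

if __name__ == "__main__":
    print(place_by_access(SAMPLE_LOG, hot_threshold=3))
```

A real system would re-evaluate placements as access patterns drift. This sketch classifies a single snapshot of the log.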

slide-18
SLIDE 18

Use cases

  • Expanding the Memory Tier
      • A new VM instance must be spawned, which can be expensive.
      • A request to spawn a VM instance can be rejected because of provider policy, even when there is no outage.
slide-19
SLIDE 19

Use cases

  • Competitive Pricing
      • Each cloud provider has a different pricing policy for its services.

The cheapest VM instance for 3.5 GB of memory from each cloud provider in US East

Provider       Instance         Memory    Cores   Per hour   Per month   Per GB
AWS            T2.medium        4 GB      2       $0.052     $37.44      $9.36
Azure          A2-Basic Tier    3.5 GB    2       $0.088     $63.36      $18.10
Google Cloud   n1-standard-1    3.5 GB    1       $0.049     $35.29      $10.08

The cheapest VM instance for 25 GB of memory from each cloud provider in US East

Provider       Instance         Memory    Cores   Per hour   Per month   Per GB
AWS            r3.xlarge        30.5 GB   4       $0.350     $252        $8.26
Azure          D12              28 GB     4       $0.476     $342.72     $12.24
Google Cloud   n1-highmem-8     26 GB     4       $0.226     $162.72     $6.25
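
The monthly and per-GB figures follow from the hourly price. The AWS and Azure entries are consistent with a 720-hour (30-day) month, which the sketch below assumes. The helper names `monthly_cost` and `cost_per_gb` are illustrative, not from the talk.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month


def monthly_cost(hourly_usd: float) -> float:
    """Monthly price of an always-on instance, in dollars."""
    return hourly_usd * HOURS_PER_MONTH


def cost_per_gb(hourly_usd: float, memory_gb: float) -> float:
    """Monthly price per GB of instance memory, in dollars."""
    return monthly_cost(hourly_usd) / memory_gb


if __name__ == "__main__":
    # AWS T2.medium from the slide: $0.052 per hour, 4 GB of memory.
    print(round(monthly_cost(0.052), 2), round(cost_per_gb(0.052, 4), 2))
```

Comparing per-GB cost rather than per-instance cost is what makes the providers comparable, since the cheapest qualifying instances differ in memory size.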

slide-20
SLIDE 20

Web application case study (RUBiS)

  • eBay-like web application (Apache + MySQL)
  • 1,000,000 users and 1,000,000 items (about 2 GB of data)
  • Emulate 300 users (view, sell, bid, buy, comment ...)
  • Change the location where MySQL is running
      • Local node’s disk
          • with system buffer cache
          • with a limited-size buffer cache
      • Nearby DC node’s disk
      • Ramdisk (with system buffer cache)
slide-21
SLIDE 21

Benchmark (RUBiS)

slide-22
SLIDE 22

Challenges

  • Infrastructure Dynamics
      • Cloud services do not provide consistent performance over time.
      • Providers throttle performance based on VM instance size.
slide-23
SLIDE 23

Challenges

  • Application Dynamics
      • Data size and access patterns keep changing.
  • Simple Storage Abstraction
      • Different storage interfaces and pricing policies add complexity.
  • Discovering nearby DCs
      • Network performance between DCs is not determined by physical distance.
  • Cloud providers’ implementation and policies
      • The same storage tier performs differently across providers.
      • New VM instance types and pricing policies keep appearing.
      • Network cost must be included when optimizing total cost.
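
Because distance is a poor proxy for network performance, one way to discover nearby DCs is to rank candidates by measured latency. The sketch below assumes latency samples have already been collected. The `rank_by_latency` helper, the DC names, and the sample values are hypothetical. The median is used here so that a single slow probe does not dominate the ranking.

```python
from statistics import median
from typing import Dict, List


def rank_by_latency(samples_ms: Dict[str, List[float]]) -> List[str]:
    """Sort DC names by median measured latency, lowest first."""
    return sorted(samples_ms, key=lambda dc: median(samples_ms[dc]))


# Made-up probe results in milliseconds.
MEASUREMENTS = {
    "dc-close-but-slow": [9.8, 10.4, 35.0],
    "dc-far-but-fast": [4.1, 3.9, 4.0],
}

if __name__ == "__main__":
    print(rank_by_latency(MEASUREMENTS))
```

Since cloud performance varies over time, such a ranking would need to be refreshed periodically rather than computed once.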
slide-24
SLIDE 24

Conclusion

  • Data locality can be extended with denser data centers
      • Accessing data in a nearby DC can be faster than accessing slower local storage tiers.
      • Small data can be stored in a nearby DC with little or no locality concern.
  • Benefits from using multiple data centers
      • Better performance.
      • Reduced cost.
      • Better availability.
      • Durability.
  • Many challenges must be overcome to realize these benefits
slide-25
SLIDE 25

Thank you!

  • Questions?
