AdaptSize: Orchestrating the Hot Object Memory Cache in a CDN



slide-1
SLIDE 1

AdaptSize: Orchestrating the Hot Object Memory Cache in a CDN

USENIX NSDI. Boston, March 28, 2017.

Daniel S. Berger Mor Harchol-Balter Ramesh K. Sitaraman

slide-2
SLIDE 2

CDN Caching Architecture

[Figure: content providers, CDN (with DC and HOC), users; labels 100% and 1%]

slide-3
SLIDE 3

Optimizing CDN Caches

Two caching levels:

❏ Disk Cache (DC)
❏ Hot Object Cache (HOC)

HOC performance metric: object hit ratio (OHR)

OHR = (# reqs served by HOC) / (total # reqs)

Goal: maximize OHR

[Figure: DC and HOC; labels 100%, 40%, 1%]
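The OHR definition on this slide can be written down directly. The snippet below is a minimal sketch; the per-request log format (`served_by_hoc`, one boolean per request) is an assumption made for illustration and does not come from the talk.

```python
def object_hit_ratio(served_by_hoc):
    """Return OHR = (# requests served by HOC) / (total # requests).

    `served_by_hoc` is a hypothetical per-request log: True if the HOC
    served the request, False if it fell through to the DC or the origin.
    """
    total = len(served_by_hoc)
    if total == 0:
        return 0.0
    return sum(served_by_hoc) / total


# Five requests, two of them served from the HOC.
log = [True, False, False, True, False]
print(object_hit_ratio(log))  # 0.4
```

Every request counts equally here regardless of object size, which is what separates the OHR from a byte hit ratio.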

slide-4
SLIDE 4

Prior Approaches to Cache Management

Frequent decisions required: what to admit, what to evict

Approaches (historically):

❏ Today in practice, e.g., Nginx, Varnish
❏ 2000s in academia, e.g., Modha, Zhang, Kumar
❏ 2010s in academia, e.g., Kaminsky, Lim, Andersen

What to admit: everything, in all three cases
What to evict: LRU, mixtures of LRU/LFU, concurrent LRU

[Figure: DC and HOC; labels 500 GB per hour, a few GBs capacity]

slide-5
SLIDE 5

We Are Missing a Key Issue

Not all objects are the same: object sizes span 9 orders of magnitude

❏ Should we admit every object?
   (no, we should favor small objects)

❏ A few key companies know this
   (but don't know how to do it well)

❏ Academia has not been helpful
   (almost all theoretical work assumes equal-sized objects)

slide-6
SLIDE 6

What's Hard About Size-Aware Admission

Fixed Size Threshold: admit if size < Threshold c

How to pick c: pick c to maximize OHR

The best threshold changes with traffic mix

[Figure: OHR vs. Threshold c; the best c differs at 8am, 2pm, and 9pm]
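To make the fixed-threshold rule concrete, here is a small simulation sketch. It pairs the rule "admit if size < c" with a plain LRU cache and replays a made-up trace; the trace, the capacity, and the function name `simulate_threshold_lru` are all assumptions for illustration, not the setup used in the talk.

```python
from collections import OrderedDict


def simulate_threshold_lru(trace, capacity, threshold):
    """Replay `trace` (a list of (object_id, size) pairs) through an LRU
    cache of `capacity` bytes that admits only objects with
    size < threshold. Returns the object hit ratio."""
    cache = OrderedDict()  # object_id -> size, least recently used first
    used = 0
    hits = 0
    for obj_id, size in trace:
        if obj_id in cache:
            hits += 1
            cache.move_to_end(obj_id)  # mark as most recently used
            continue
        if size >= threshold or size > capacity:
            continue  # not admitted
        while used + size > capacity:
            _, evicted_size = cache.popitem(last=False)  # evict LRU object
            used -= evicted_size
        cache[obj_id] = size
        used += size
    return hits / len(trace) if trace else 0.0


# Toy trace: two small popular objects plus a large one-off object per round.
trace = []
for i in range(10):
    trace += [("a", 10), ("b", 10), (f"big{i}", 90)]

for threshold in (50, 1000):
    print(threshold, simulate_threshold_lru(trace, 100, threshold))
# prints "50 0.6" and then "1000 0.0"
```

On this toy trace, admitting everything lets each large object push out the small popular ones, so the OHR drops to zero. A different traffic mix would favor a different threshold, which is the difficulty the slide points at.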

slide-7
SLIDE 7

Probabilistic admission: can we avoid picking a threshold c?

[Figure: admission probability curve over object size, from high admission probability to low admission probability]

Unfortunately, there are many curves, and which curve we pick makes a big difference

example: exp(c) family

We need to adapt c
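A probabilistic admission rule from an exponential family might look like the sketch below. The slide names only an "exp(c) family"; the concrete form exp(-size / c) used here is an assumption, as are the function names.

```python
import math
import random


def admission_probability(size, c):
    """Assumed exp(c)-family curve: P(c, size) = exp(-size / c)."""
    return math.exp(-size / c)


def admit(size, c, rng=random):
    """Admit an object with probability P(c, size)."""
    return rng.random() < admission_probability(size, c)


# Compare how two values of c treat small, medium, and large objects.
for c in (1_000, 100_000):
    probs = [round(admission_probability(s, c), 3)
             for s in (100, 10_000, 1_000_000)]
    print(c, probs)
```

Under this assumed form, small objects are admitted almost always and large objects almost never, with c setting where the transition happens. That is why c still has to be adapted.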

slide-8
SLIDE 8

The AdaptSize Caching System

What to admit: adaptive size-aware (AdaptSize)
What to evict: concurrent LRU

First system that continuously adapts the parameter of size-aware admission
Incorporated into high-throughput production caching system (Varnish)

Adapt with traffic, adapt with time:

1) Take traffic measurements
2) Calculate the best c
3) Enforce admission control

slide-9
SLIDE 9

How to Find Best c Within Each Δ Interval

[Figure: timeline divided into Δ intervals]

Traditional approach: Hill climbing
Local optima on OHR-vs-c curve

AdaptSize approach: Markov model
Enables speedy global optimization
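The local-optimum problem can be illustrated with a toy OHR-vs-c curve. The numbers below are made up for illustration and are not measurements from the talk; `hill_climb` and `global_scan` are hypothetical helper names.

```python
def hill_climb(values, start):
    """Move to the better neighbor until no neighbor improves."""
    i = start
    while True:
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(values)]
        best = max(neighbors, key=lambda j: values[j])
        if values[best] <= values[i]:
            return i  # stuck: no neighbor is better
        i = best


def global_scan(values):
    """Evaluate every candidate and return the index of the best one."""
    return max(range(len(values)), key=lambda j: values[j])


# Hypothetical OHR values on a grid of candidate c values (made-up numbers).
candidates = [2 ** k for k in range(10, 20)]
ohr = [0.20, 0.31, 0.35, 0.30, 0.28, 0.33, 0.42, 0.55, 0.50, 0.41]

local = hill_climb(ohr, start=0)
best = global_scan(ohr)
print(candidates[local], ohr[local])  # 4096 0.35
print(candidates[best], ohr[best])    # 131072 0.55
```

Hill climbing stops at the first peak because both neighbors look worse. A global scan needs the whole curve, which is only practical if OHR(c) can be obtained cheaply for every c.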

slide-10
SLIDE 10

How AdaptSize Gets the OHR-vs-c curve

Markov chain

➢ track IN/OUT for each object

[Figure: Markov chain with states IN and OUT; transition labels: request, request, hit, miss]

Algorithm

For every Δ interval and for every value of c:
   use Markov chain to solve for OHR(c)
find c to maximize OHR

Why hasn't this been done?

Too slow: exponential state space

New technique: approximation with linear state space
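The outer loop of the algorithm on this slide can be sketched as follows. This is not the Markov model: `estimate_ohr` is a placeholder for whatever predicts OHR(c), and `toy_estimate` is a deliberately crude stand-in invented here so that the sketch runs. All names and the window format are assumptions.

```python
def tune_c(window, candidates, estimate_ohr):
    """Return the candidate c with the highest estimated OHR for one
    Δ interval. `window` holds the (object_id, size) requests seen in
    that interval; `estimate_ohr(window, c)` predicts OHR(c)."""
    return max(candidates, key=lambda c: estimate_ohr(window, c))


def adapt(intervals, candidates, estimate_ohr):
    """Yield one tuned c per Δ interval."""
    for window in intervals:
        yield tune_c(window, candidates, estimate_ohr)


def toy_estimate(window, c, capacity=100):
    """Crude stand-in (NOT the Markov chain): if all objects smaller
    than c fit in the cache, count every repeat request to them as a
    hit; otherwise estimate 0."""
    small = {oid: size for oid, size in window if size < c}
    if not window or sum(small.values()) > capacity:
        return 0.0
    repeats = sum(1 for oid, _ in window if oid in small) - len(small)
    return repeats / len(window)


intervals = [
    [("a", 10), ("a", 10), ("b", 60), ("b", 60), ("c", 80)],
    [("x", 90), ("x", 90), ("x", 90), ("y", 5)],
]
print(list(adapt(intervals, [20, 70, 100], toy_estimate)))  # [70, 100]
```

The chosen c changes between the two intervals because the traffic mix changes. Swapping `toy_estimate` for a model-based estimator would leave the loop itself unchanged.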

slide-11
SLIDE 11

Implementing AdaptSize

Incorporated into Varnish: highly concurrent HOC system, 40+ Gbit/s

AdaptSize:

1) Take traffic measurements
2) Calculate the best c
3) Enforce admission control

Goal: low overhead on request path

[Figure: DC and HOC with AdaptSize]
slide-12
SLIDE 12

Implementing AdaptSize

Step: Take traffic measurements

Challenges:

1) Concurrent write conflicts
2) Locks too slow [NSDI'13 & 14]

AdaptSize: producer/consumer + ring buffer

Lock-free implementation

[Figure: DC and HOC with AdaptSize; labels 40% requests, 1% objects]
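The producer/consumer structure named on this slide can be mimicked in a few lines. This is only a structural sketch: it relies on the thread-safe `append` and `popleft` of Python's `collections.deque` and is not a real lock-free ring buffer like the one a C implementation inside Varnish would need. The class name `MeasurementRing` is hypothetical.

```python
import threading
from collections import deque


class MeasurementRing:
    """Bounded buffer of (object_id, size) samples.

    deque(maxlen=...) discards the oldest sample when full, so record()
    never blocks the request path.
    """

    def __init__(self, capacity):
        self._buf = deque(maxlen=capacity)

    def record(self, obj_id, size):
        """Producer side: called by request-handling threads."""
        self._buf.append((obj_id, size))

    def drain(self):
        """Consumer side: called by the tuning thread."""
        samples = []
        while True:
            try:
                samples.append(self._buf.popleft())
            except IndexError:  # buffer is empty
                return samples


ring = MeasurementRing(capacity=10_000)


def worker(tid):
    for i in range(1_000):
        ring.record(f"obj-{tid}-{i}", 100 + i)


threads = [threading.Thread(target=worker, args=(t,)) for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(ring.drain()))  # 4000
```

Dropping old samples when the buffer is full trades measurement completeness for a request path that never waits on the tuner.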

slide-13
SLIDE 13

Implementing AdaptSize

Step: Enforce admission control

Admission is really simple: given c and the object size, admit with P(c, size)

AdaptSize: Enables lock-free & low-overhead implementation

[Figure: DC and HOC with AdaptSize]

slide-14
SLIDE 14

AdaptSize Evaluation Testbed

Clients: replay Akamai requests trace

Trace:
   440 million / 152 TB total requests
   55 million / 8.9 TB unique objects

HOC systems (1.2 GB, 16 threads):
   unmodified Varnish
   NGINX cache
   AdaptSize

DC: unmodified Varnish, 4x 1 TB / 7200 RPM

Origin: emulates 100s of web servers

[Figure: Clients, HOC with AdaptSize, DC, Origin; link labels 40 Gbit / 100ms RTT and 40 Gbit / 30ms RTT]

slide-15
SLIDE 15

Comparison to Production Systems

            what to admit         what to evict
Varnish     everything            concurrent LRU
Nginx       frequency filter      LRU
AdaptSize   adaptive size-aware   concurrent LRU

[Figure: OHR of Varnish, Nginx, and AdaptSize; annotations +92% and +48%]

slide-16
SLIDE 16

Comparison to Research-Based Systems

[Figure: OHR of AdaptSize vs. research-based systems (recency and frequency combinations), each with manually tuned parameters; annotation +67%]

slide-17
SLIDE 17

Robustness of AdaptSize

Size-Aware OPT: offline parameter tuning
AdaptSize: our Markovian tuning model
HillClimb: local-search using shadow queues

slide-18
SLIDE 18

Conclusion

Goal: maximize OHR of the Hot Object Cache

OHR = (# reqs served by HOC) / (total # reqs)

Approach: size-based admission control

slide-19
SLIDE 19

Conclusion

Goal: maximize OHR of the Hot Object Cache

OHR = (# reqs served by HOC) / (total # reqs)

Approach: size-based admission control
Key insight: need to adapt parameter c
AdaptSize: adapts c via a Markov chain
Result: 48-92% higher OHRs

slide-20
SLIDE 20

Conclusion

Goal: maximize OHR of the Hot Object Cache

OHR = (# reqs served by HOC) / (total # reqs)

Approach: size-based admission control
Key insight: need to adapt parameter c
AdaptSize: adapts c via a Markov chain
Result: 48-92% higher OHRs

In our paper:

❏ Throughput
❏ Disk utilization
❏ Byte hit ratio
❏ Request latency

/dasebe/AdaptSize