SLIDE 1

Analysis and design of list-based cache replacement policies [1]

Nicolas Gast (Inria)

Joint work with Benny Van Houdt (Univ. of Antwerp)

POLARIS / DataMove Seminar, Jan. 2016, Inria

[1] Mainly based on "Transient and Steady-state Regime of a Family of List-based Cache Replacement Algorithms", by Gast and Van Houdt. ACM SIGMETRICS 2015.

Nicolas Gast – 1 / 31

SLIDE 2

Caches are everywhere

[Diagram: user/application, cache (fast), data source (slow).]

Examples: processor, database, CDN.

Single cache / hierarchy of caches.

SLIDE 5

In this talk, I focus on a single cache.

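The question is: which item to replace?

[Diagram: the application sends requests to the cache. On a hit, the item is served from the cache. On a miss, it is fetched from the data source and one cached item is replaced.]

Classical cache replacement policies: RAND, FIFO, LRU, CLIMB.

Other approaches: time to live (TTL).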

SLIDE 6

The analysis of cache performance is attracting growing interest

Theoretical studies: started with [King 1971, Gelenbe 1973].

Nowadays:

◮ New applications: CDN / CON (replication [2]).
◮ New analysis techniques (Che approximation [3, 4]).

[2] [Borst et al. 2010] Distributed Caching Algorithms for Content Distribution Networks.
[3] [Che et al. 2002] Hierarchical web caching systems: modeling, design and experimental results.
[4] [Fricker et al. 2012] A versatile and accurate approximation for LRU cache performance.

SLIDE 10

Outline of the talk

1. What are the classical models?

2. We introduce a family of policies for which the cache is (virtually) divided into lists (generalization of FIFO/RANDOM).

   2.1 We can compute the steady-state distribution in polynomial time.
       ⋆ Disprove old conjectures.

   2.2 We develop a mean-field approximation and show that it is accurate.
       ⋆ Fast approximation of the steady-state distribution.
       ⋆ We can characterize the transient behavior.

[Figure: two panels, "simulation" and "ODE approximation". x-axis: number of requests (2000 to 10000). y-axis: probability in cache (0.00 to 0.40). Curves: 1 list (200) and 4 lists (50/50/50/50).]

3. We provide guidelines on how to tune the parameters, using IRM and trace-based simulation.

SLIDE 11

Outline

1. Performance models of caches

2. List-based cache replacement algorithms
   ◮ Steady-state performance under the IRM model
   ◮ Transient behavior via mean-field approximation

3. Parameters tuning and practical guidelines

4. Conclusion

SLIDE 13

Our performance metric will be the hit probability

hit probability = (number of items served from cache) / (total number of items served) = 1 − miss probability

Goal: find a policy to maximize the hit probability.

SLIDE 15

The offline problem is easy...

[Diagram: the application sends requests to a cache of size m. Hit: the item is served from the cache. Miss: the item is fetched from the data source and one cached item is replaced.]

If you know the sequence of requests:

MIN policy

At time t, if X_t is not in the cache, evict an item in the cache whose next request occurs furthest in the future.

Theorem (Mattson et al. 1970)

MIN is optimal.
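The MIN rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `min_policy_misses` is hypothetical, the whole request sequence is assumed to be known in advance, and every miss inserts the requested item (demand caching).

```python
def min_policy_misses(requests, m):
    """Count the misses of the offline MIN policy on a fully known request sequence."""
    # next_use[t] = index of the next request for requests[t] after time t (inf if none).
    next_use = [float("inf")] * len(requests)
    last_seen = {}
    for t in range(len(requests) - 1, -1, -1):
        next_use[t] = last_seen.get(requests[t], float("inf"))
        last_seen[requests[t]] = t

    cache = {}  # cached item -> time of its next request
    misses = 0
    for t, item in enumerate(requests):
        if item not in cache:
            misses += 1
            if len(cache) >= m:
                # Evict the cached item whose next request is furthest in the future.
                victim = max(cache, key=cache.get)
                del cache[victim]
        cache[item] = next_use[t]
    return misses


example_misses = min_policy_misses([1, 2, 3, 1, 2, 3], m=2)
```

On the short sequence used here, three misses are unavoidable (the first request of each item) and one more occurs because only two items fit in the cache.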

SLIDE 16

The offline problem is easy... but with unbounded competitive ratio

Theorem

No deterministic online algorithm for caching can achieve a better competitive ratio than m.

LRU has a competitive ratio of m.
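A classical way to see why LRU cannot beat a ratio of m is a cyclic request pattern over m + 1 items: LRU always evicts exactly the item that is requested next, so every request is a miss, whereas an offline policy would miss only about once every m requests. The sketch below illustrates the LRU side only. The function name `lru_misses` and the chosen sequence are illustrative assumptions, not taken from the talk.

```python
from collections import OrderedDict


def lru_misses(requests, m):
    """Count the misses of an LRU cache of size m on the given request sequence."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    misses = 0
    for item in requests:
        if item in cache:
            cache.move_to_end(item)  # a hit refreshes the item's recency
        else:
            misses += 1
            if len(cache) >= m:
                cache.popitem(last=False)  # evict the least recently used item
            cache[item] = True
    return misses


m = 3
cyclic_requests = [t % (m + 1) for t in range(40)]  # 0, 1, 2, 3, 0, 1, 2, 3, ...
cyclic_misses = lru_misses(cyclic_requests, m)
```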

SLIDE 18

To compare policies, we need more...

We can use trace-based simulations.

We can model requests as stochastic processes (started with [King 1971, Gelenbe 1973]).

Independent reference model (IRM)

At each time step, item i is requested with probability p_i. IRM is OK for web caching [5].

[5] L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker. Web caching and Zipf-like distributions: Evidence and implications. In INFOCOM’99, volume 1, pages 126-134. IEEE, 1999.
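As an illustration of the IRM, the following sketch draws independent requests from a fixed popularity vector. The Zipf-like shape (p_k proportional to 1/k^α) is an assumption suggested by the title of the cited reference, and the function names, the value α = 0.8, and the trace length are illustrative choices.

```python
import random


def zipf_popularities(n, alpha):
    """Return p_1..p_n with p_k proportional to 1 / k**alpha (assumed Zipf-like shape)."""
    weights = [1.0 / (k ** alpha) for k in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def irm_requests(p, length, seed=0):
    """Draw `length` independent requests; item i is chosen with probability p[i]."""
    rng = random.Random(seed)
    return rng.choices(range(len(p)), weights=p, k=length)


p = zipf_popularities(n=100, alpha=0.8)
trace = irm_requests(p, length=10_000)
```

Such a synthetic trace can be fed to any simulated replacement policy to estimate its hit probability under the IRM.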

SLIDE 20

Example: analysis of LRU: from King [71] to Che [2002]

[King 71]: Under the IRM model, in steady state, the probability of having a sequence of distinct items i_1 ... i_m in the cache is

$$P(i_1 \dots i_m) = p_{i_1} \, \frac{p_{i_2}}{1 - p_{i_1}} \cdots \frac{p_{i_m}}{1 - p_{i_1} - \dots - p_{i_{m-1}}}$$

Hit probability is:

$$\sum_{\text{distinct sequences } i_1 \dots i_m} (p_{i_1} + \dots + p_{i_m}) \, P(i_1 \dots i_m).$$

[Che approximation 2002]: an item spends approximately a time T in the cache.

$$P(\text{item } i \text{ in cache}) \approx 1 - e^{-p_i T}, \quad \text{where } T \text{ is such that} \quad \sum_{i=1}^{n} \left(1 - e^{-p_i T}\right) = m.$$
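The Che approximation reduces to a one-dimensional root-finding problem. The sketch below solves for T by bisection, assuming that the expected number of cached items must equal the cache size m, that all popularities are positive, and that m is smaller than the number of items. The function names are hypothetical.

```python
import math


def che_characteristic_time(p, m, tol=1e-12):
    """Solve sum_i (1 - exp(-p_i * T)) = m for T by bisection (requires m < len(p))."""
    if m >= len(p):
        raise ValueError("the cache must be smaller than the number of items")

    def expected_occupancy(T):
        return sum(1.0 - math.exp(-pi * T) for pi in p)

    lo, hi = 0.0, 1.0
    while expected_occupancy(hi) < m:  # grow the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if expected_occupancy(mid) < m:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def che_hit_probability(p, m):
    """Approximate LRU hit probability: sum_i p_i * (1 - exp(-p_i * T))."""
    T = che_characteristic_time(p, m)
    return sum(pi * (1.0 - math.exp(-pi * T)) for pi in p)


uniform = [0.1] * 10
T_uniform = che_characteristic_time(uniform, m=5)
hit_uniform = che_hit_probability(uniform, m=5)
```

With ten equally popular items and a cache of size 5, the equation gives T = 10 ln 2 and a hit probability of 1/2, which is a quick sanity check.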

SLIDE 22

Even when the popularity is constant, LFU is not optimal.

LFU is optimal under IRM (it maximizes the steady-state hit probability).

LFU is not optimal under general request distributions:

◮ e.g. the time between two requests of item 1 is 1 with probability 0.99 and 1000 with probability 0.01. The time between two requests of item 2 is 5. LRU outperforms LFU.

SLIDE 26

I consider a cache (virtually) divided into lists

[Diagram: the application sends requests to a cache divided into list 1, ..., list j, list j+1, ..., list h, backed by the data source. A miss enters at list 1. A hit in list j moves the item to list j+1.]

IRM: At each time step, item i is requested with probability p_i (IRM assumption [6]).

MISS: If item i is not in the cache, it is exchanged with an item from list 1 (FIFO or RAND).

HIT: If item i is in list j, it is exchanged with an item from list j + 1 (FIFO or RAND).

[6] L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker. Web caching and Zipf-like distributions: Evidence and implications. In INFOCOM’99, volume 1, pages 126-134. IEEE, 1999.
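The MISS and HIT rules can be simulated directly. The sketch below covers the RAND variant only and rests on a few assumptions that the slide does not spell out: the cache starts full with distinct `initial_items`, the exchanged item is chosen uniformly at random, and a hit in the last list leaves the cache unchanged. The function name is hypothetical.

```python
import random


def simulate_rand_lists(requests, sizes, initial_items, seed=0):
    """Simulate the RAND variant with list sizes `sizes`; return (hits, final lists)."""
    rng = random.Random(seed)
    filler = iter(initial_items)
    # lists[j] holds the items of list j+1; the cache starts full.
    lists = [[next(filler) for _ in range(size)] for size in sizes]
    where = {item: (j, pos) for j, lst in enumerate(lists) for pos, item in enumerate(lst)}

    hits = 0
    for item in requests:
        if item in where:
            hits += 1
            j, pos = where[item]
            if j + 1 == len(lists):
                continue  # assumption: a hit in the last list changes nothing
            # HIT in list j+1 (0-based j): exchange with a random item of the next list.
            other_pos = rng.randrange(len(lists[j + 1]))
            other = lists[j + 1][other_pos]
            lists[j][pos], lists[j + 1][other_pos] = other, item
            where[item], where[other] = (j + 1, other_pos), (j, pos)
        else:
            # MISS: the requested item replaces a random item of list 1.
            pos = rng.randrange(len(lists[0]))
            del where[lists[0][pos]]
            lists[0][pos] = item
            where[item] = (0, pos)
    return hits, lists


hits, final_lists = simulate_rand_lists([5, 5], sizes=(2, 2), initial_items=[0, 1, 2, 3])
```

In this tiny run the first request for item 5 is a miss (it enters list 1) and the second is a hit (it is promoted to list 2). The FIFO variant would replace the random choice with a first-in-first-out order inside each list.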

SLIDE 27

Items on higher lists are (supposedly) more popular.

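[Diagram: list 1, ..., list j, list j+1, ..., list h. Misses enter at list 1 and hits move items up. Lower lists hold less popular items, higher lists hold popular items.]

cache size = m = m_1 + · · · + m_h

These algorithms are referred to as RAND(m) and FIFO(m), where m = (m_1, ..., m_h) is the vector of list sizes.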

SLIDE 29

The steady-state is a product-form distribution

Same for RAND and FIFO.

Example of a cache of size 4 with 3 lists and m = (1, 2, 1):

[Diagram: item i in list 1, items j and k in list 2, item ℓ in list 3.]

The probability of (i, j, k, ℓ) is proportional to $p_i \, (p_j p_k)^2 \, (p_\ell)^3$.
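The example suggests the general pattern: the popularity of an item in list i is raised to the power i. Written out for a configuration c, with c(i, j) denoting the item at position j of list i (notation assumed here for illustration) and a normalizing constant E(m, n) summing the same weight over all configurations of n items, a hedged reconstruction reads:

```latex
\pi(c) \;=\; \frac{1}{E(m, n)} \prod_{i=1}^{h} \Bigg( \prod_{j=1}^{m_i} p_{c(i,j)} \Bigg)^{i}
```

For m = (1, 2, 1) this gives a weight of $p_i (p_j p_k)^2 (p_\ell)^3$, in agreement with the example.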

SLIDE 30

We can compute the miss probability by using a dynamic programming approach (generalization of [Fagin, Price] [8]).

We want to compute

$$M(m) = \sum_{c \in C_n(m)} \Bigg( \sum_{k \notin c} p_k \Bigg) \pi(c) = \frac{E(m + e_1, n)}{E(m, n)}, \quad \text{where} \quad E(r, k) = \sum_{c \in C_k(r)} \prod_{i=1}^{h} \Bigg( \prod_{j=1}^{r_i} p_{c(i,j)} \Bigg)^{i}.$$

We obtain a recursion formula on E(r, k): solvable in O(n × m_1 . . . m_h). The Dan and Towsley [7] approximation is not needed for polynomial time.

[7] A. Dan and D. Towsley. An approximate analysis of the LRU and FIFO buffer replacement schemes. SIGMETRICS Perform. Eval. Rev., 18(1):143-152, Apr. 1990.
[8] R. Fagin and T. G. Price. Efficient calculation of expected miss ratios in the independent reference model. SIAM J. Comput., 7:288-296, 1978.
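For very small instances, the identity M(m) = E(m + e_1, n) / E(m, n) can be checked by exhaustive enumeration. The sketch below is not the dynamic program of the talk: it enumerates every ordered configuration, so its cost is exponential, and it assumes that configurations are ordered tuples of distinct items. All function names are hypothetical.

```python
from itertools import permutations
from math import prod


def weighted_configs(p, sizes):
    """Yield (configuration, product-form weight) for every ordered configuration."""
    for config in permutations(range(len(p)), sum(sizes)):
        weight, start = 1.0, 0
        for i, size in enumerate(sizes, start=1):
            # Items of list i contribute their popularity raised to the power i.
            weight *= prod(p[k] for k in config[start:start + size]) ** i
            start += size
        yield config, weight


def E(p, sizes):
    """Normalizing constant: sum of the weights of all configurations."""
    return sum(weight for _, weight in weighted_configs(p, sizes))


def miss_probability_direct(p, sizes):
    """Definition: average over configurations of the popularity left outside the cache."""
    outside = sum(w * (sum(p) - sum(p[k] for k in c)) for c, w in weighted_configs(p, sizes))
    return outside / E(p, sizes)


def miss_probability_ratio(p, sizes):
    """Ratio form: E(m + e_1, n) / E(m, n)."""
    bigger = (sizes[0] + 1,) + tuple(sizes[1:])
    return E(p, bigger) / E(p, sizes)


p = (0.4, 0.3, 0.2, 0.1)
direct = miss_probability_direct(p, (1, 2))
ratio = miss_probability_ratio(p, (1, 2))
```

Both functions should agree up to rounding, because adding an uncached item k to list 1 multiplies the weight of a configuration by p_k.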

SLIDE 31

A larger cache size and more lists (usually) lead to a lower steady-state miss probability.

[Figure: miss probability (0.2 to 0.55) versus cache size m (300 to 1000), for n = 3000, α = 0.8. Curves: h = ∞; h = 2, m_2 = m − 1; h = 3, m_3 = m − 2; h = 5, m_5 = m − 4; h = 10, m_10 = m − 9; lower bounds. (h = ∞ corresponds to LFU.)]

SLIDE 32

Is increasing the number of lists always better [9]?

[Diagram: lists of sizes m_1, ..., m_j, m_{j+1}, ..., m_h. Hits move items from the less popular lists towards the popular ones.]

Six lists: m = (1, 1, 1, 1, 1, 1)   ?≥?   Three lists: m = (1, 1, 4).

[9] Conjectured in 1987! O. I. Aven, E. G. Coffman, Jr., and Y. A. Kogan. Stochastic Analysis of Computer Storage. Kluwer Academic Publishers, Norwell, MA, USA, 1987.

SLIDE 37

We want to study the speed at which the cache fills

[Figure: two panels. x-axis: number of requests (2000 to 10000). y-axis: probability in cache (0.00 to 0.40). First panel: simulation for 1 list (200) and 4 lists (50/50/50/50). Second panel: the same simulation curves overlaid with the ODE approximation for 1 list and 4 lists.]

Figure: Popularities of objects change every 2000 steps.

We develop an ODE approximation.

We show that it is accurate.

SLIDE 39

We construct an ODE by assuming independence

Let H_i(t) be the popularity in list i.

If x_{k,i}(t) is the probability that item k is in list i at time t, we approximately have:

[Equation not preserved in this transcript.]

This is similar to a TTL approximation.
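The sketch below shows what such an ODE might look like for the RAND variant. The dynamics are an assumption derived from the independence heuristic, not a transcription of the slide: item k enters list i when it is requested while in list i − 1, and leaves list i either when it is requested (promotion, if i < h) or when an item arriving from list i − 1 takes its slot, which happens at rate H_{i−1}(t) / m_i. Index 0 stands for "outside the cache". The function names, the explicit Euler scheme, the step size, and the uniform initial condition are illustrative choices.

```python
def ode_step(x, p, sizes, dt):
    """One explicit Euler step; x[k][i] approximates P(item k is in list i), i = 0 is outside."""
    h, n = len(sizes), len(p)
    # H[i]: total request rate of the items currently in list i.
    H = [sum(p[k] * x[k][i] for k in range(n)) for i in range(h + 1)]
    new_x = []
    for k in range(n):
        row = list(x[k])
        for i in range(1, h + 1):
            # Arrival from list i-1 on a request, minus displacement by other arriving items.
            dx = p[k] * x[k][i - 1] - H[i - 1] * x[k][i] / sizes[i - 1]
            if i < h:
                # Push-back from list i+1, minus promotion to list i+1 on a hit.
                dx += H[i] * x[k][i + 1] / sizes[i] - p[k] * x[k][i]
            row[i] = x[k][i] + dt * dx
        row[0] = 1.0 - sum(row[1:])  # probabilities of item k sum to one
        new_x.append(row)
    return new_x


def integrate(p, sizes, steps, dt=0.5):
    """Integrate from a cache filled uniformly at random: x[k][i] = m_i / n."""
    n, m = len(p), sum(sizes)
    x = [[1.0 - m / n] + [size / n for size in sizes] for _ in range(n)]
    for _ in range(steps):
        x = ode_step(x, p, sizes, dt)
    return x


weights = [1.0 / (k ** 0.8) for k in range(1, 21)]
p = [w / sum(weights) for w in weights]
x = integrate(p, sizes=(2, 2), steps=2000)
```

A useful consistency check is that the expected number of items in list i stays equal to m_i along the trajectory, and that more popular items end up with a higher probability of being cached.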

SLIDE 40

We show that this approximation is accurate, theoretically and by simulation

[Figure: probability in cache (0.00 to 0.40) versus number of requests (2000 to 10000). Simulation for 1 list (200) and 4 lists (50/50/50/50), overlaid with the ODE approximation for 1 list and 4 lists.]

SLIDE 41

This approximation can also be used to compute the stationary distribution

Very accurate.

The map is contracting: computation in O(nh), compared to O(n m_1 . . . m_h) for the exact computation.

SLIDE 43

Under the IRM model, a smaller first list (usually) means a higher hit probability but a longer time to fill the cache

[Figure: hit probability (0.05 to 0.4) versus number of requests (10^3 to 10^5, log scale). ODE and simulation curves for m = 200, m = (100,100), m = (50,150) and m = (20,180).]

SLIDE 44

Under the IRM model, the time to fill the cache mainly depends on the size of the first list.

[Figure: hit probability (0.1 to 0.4) versus number of requests (10^3 to 10^5, log scale). ODE and simulation curves for m = (40,160), m = (40,40,120) and m = (40,40,40,80).]

In a dynamic setting, a good choice seems to be m_1 ≥ m_2 ≥ · · · ≥ m_h with m_1 “large enough”.

SLIDE 45

We verified on a trace of YouTube videos [10] that reserving at least 30% of the cache for the first list seems important.

[Figure: hit probability (0.27 to 0.33) versus m − m_1 (1000 to 5000), for m = 5000. Curves: FIFO(m) with 2, 3 and 5 lists, and LRU(m) with 2, 3 and 5 lists. Labels FIFO and LRU mark the single-list policies.]

[10] M. Zink, K. Suh, Y. Gu, and J. Kurose. Characteristics of YouTube network traffic at a campus network - measurements, models, and implications. Comput. Netw., 53(4):501-514, Mar. 2009.

SLIDE 47

Recap

Unified framework for studying list-based replacement policies.

Steady-state miss probability in polynomial time.

Accurate ODE approximation.

Guidelines on how to use such a replacement algorithm: the size of the first list is important.

[Diagram: lists of sizes m_1, ..., m_j, m_{j+1}, ..., m_h.]

Two theoretical interests of this work:

◮ provides a unified framework and disproves old conjectures.
◮ ODE approximation.

Future work

Network of caches? Applications?

SLIDE 48

Thank you!

http://mescal.imag.fr/membres/nicolas.gast

nicolas.gast@inria.fr

Transient and Steady-state Regime of a Family of List-based Cache Replacement Algorithms. Gast, Van Houdt. ACM SIGMETRICS 2015.
