Efficient Miss Ratio Curve Computation for Heterogeneous Content Popularity



SLIDE 1

Efficient Miss Ratio Curve Computation for Heterogeneous Content Popularity

Giovanni Neglia
Université Côte d’Azur, Inria, Sophia Antipolis, France

Damiano Carra
Computer Science Dept., University of Verona, Verona, Italy

SLIDE 2

Context

• Caches are fundamental components of computing architectures
  – Fast access to the most used items
• Shared resource
  – Used by different processes or applications, with different requirements and access patterns
• Main issue
  – How to divide and dynamically assign cache portions to applications?

SLIDE 3

Miss Ratio Curves – MRCs

• Profiling
  – Hit ratio vs. amount of cache space for each application
• Use
  – Compute the MRC of each application over a given interval
  – Assign the cache space that maximizes some objective function
• Main concern: computational complexity
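As an illustration of the "Use" step, here is a minimal sketch of one possible allocation rule. It is an assumption, not something stated in the slides: the objective is the total number of misses, and cache units are handed out greedily to the application with the largest marginal reduction. The names `allocate_cache`, `mrcs`, and `rates` are hypothetical. Greedy assignment of this kind is only guaranteed to be optimal when every MRC is convex.

```python
def allocate_cache(mrcs, rates, total_units):
    """Greedy split of total_units cache units among applications.

    mrcs[a][c] is the miss ratio of application a with c cache units,
    rates[a] is its request rate. Both are hypothetical inputs.
    """
    alloc = [0] * len(mrcs)
    for _ in range(total_units):
        best_app, best_gain = None, 0.0
        for app, (mrc, rate) in enumerate(zip(mrcs, rates)):
            c = alloc[app]
            if c + 1 < len(mrc):
                # Misses avoided per unit time by giving one more unit to app.
                gain = rate * (mrc[c] - mrc[c + 1])
                if gain > best_gain:
                    best_app, best_gain = app, gain
        if best_app is None:
            break  # no application benefits from more space
        alloc[best_app] += 1
    return alloc
```

The MRCs must be recomputed at every interval for each application, which is why their computational cost is the main concern.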

SLIDE 4

Current approaches

• MRCs are computed from the trace
• Exact computation requires:
  – O(M) memory
  – O(log M) computational complexity
• Approximate computation
  – Trade precision for space and speed
    • O(1) memory and O(1) operations per request
  – Most of the solutions are based on sampling
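A minimal sketch of an exact computation for an LRU cache, based on stack (reuse) distances. It swaps in a Fenwick tree indexed by request time in place of a balanced search tree, so each request costs a logarithmic number of steps, but memory grows with the trace length rather than with M. The function names are hypothetical.

```python
from collections import Counter

def stack_distances(trace):
    """LRU stack distance of each request (None for a first access)."""
    n = len(trace)
    tree = [0] * (n + 1)  # Fenwick tree over 1-based request times

    def update(i, delta):
        while i <= n:
            tree[i] += delta
            i += i & -i

    def prefix(i):
        total = 0
        while i > 0:
            total += tree[i]
            i -= i & -i
        return total

    last = {}   # item -> time of its most recent access
    dists = []
    for t, item in enumerate(trace, start=1):
        p = last.get(item)
        if p is None:
            dists.append(None)
        else:
            # Distinct items accessed in [p, t-1], the item itself included.
            dists.append(prefix(t - 1) - prefix(p - 1))
            update(p, -1)
        update(t, 1)
        last[item] = t
    return dists

def miss_ratio_curve(trace, max_size):
    """Miss ratio for LRU cache sizes 1..max_size (in items)."""
    if not trace:
        return []
    hist = Counter(d for d in stack_distances(trace) if d is not None)
    hits, mrc = 0, []
    for size in range(1, max_size + 1):
        hits += hist[size]
        mrc.append(1 - hits / len(trace))
    return mrc
```

For the request sequence A B C D B, the second request for B has stack distance 3, so it is a hit only for caches of at least 3 items.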

SLIDE 5

Spatial sampling

• Approach:
  – Observe a randomly selected fraction R of the items
  – Build the MRC and scale the X-axis by 1/R
• R can be adaptive
  – This makes it possible to obtain O(1) memory and O(1) computational complexity
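A minimal sketch of fixed-rate spatial sampling, under some assumptions not taken from the slides: items are selected by hashing their identifier (SHA-1 is an arbitrary choice here), and the reuse distances of the sampled trace are computed with a simple list-based LRU stack. The names `is_sampled`, `lru_stack_distances`, and `sampled_mrc` are hypothetical.

```python
import hashlib

def is_sampled(item, rate, modulus=1 << 24):
    """Deterministic per-item decision: the same item is always in or out."""
    digest = hashlib.sha1(str(item).encode()).digest()
    return int.from_bytes(digest[:8], "big") % modulus < rate * modulus

def lru_stack_distances(trace):
    """Stack distances with a plain list (simple, not efficient)."""
    stack, dists = [], []
    for item in trace:
        if item in stack:
            pos = stack.index(item)
            dists.append(pos + 1)
            stack.pop(pos)
        else:
            dists.append(None)
        stack.insert(0, item)
    return dists

def sampled_mrc(trace, rate):
    """Approximate MRC as (estimated cache size, miss ratio) points."""
    sample = [x for x in trace if is_sampled(x, rate)]
    if not sample:
        return []
    hist = {}
    for d in lru_stack_distances(sample):
        if d is not None:
            hist[d] = hist.get(d, 0) + 1
    points, hits = [], 0
    for d in sorted(hist):
        hits += hist[d]
        # Scale the X-axis by 1/R to map sampled distances to full sizes.
        points.append((d / rate, 1 - hits / len(sample)))
    return points
```

Because the decision is made per item rather than per request, every request for a sampled item is observed, which is what lets the reuse distances be scaled by 1/R.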

SLIDE 6

Sampling bias

• Spatial sampling is biased if request rates are highly heterogeneous across objects
  – The sample may or may not include some objects that are crucial for the MRC
• Solutions to such bias focus on the MRC tail
  – A correction factor accounts for the difference between the expected and the actual average number of observed requests

SLIDE 7

Experiments on synthetic traces

• IRM traces, with Zipf-distributed object popularity
  – Various α (Zipf parameter)

[Figure: two panels, α = 0.8 and α = 1.2. Miss ratio vs. cache size (num. of items, log scale from 10^0 to 10^7) for the exact MRC and four sampled estimates (sample1 to sample4).]
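A minimal sketch of how such a synthetic trace could be generated, assuming that under the IRM each request is drawn independently and that the item of rank k has weight 1/k^α. The function name, parameters, and seed handling are hypothetical, not the authors' generator.

```python
import random

def zipf_irm_trace(num_items, alpha, num_requests, seed=0):
    """Independent requests over items 0..num_items-1 (item 0 most popular)."""
    rng = random.Random(seed)
    # Unnormalized Zipf weights; random.choices normalizes them internally.
    weights = [1.0 / (rank ** alpha) for rank in range(1, num_items + 1)]
    return rng.choices(range(num_items), weights=weights, k=num_requests)
```

A larger α concentrates more of the requests on the few top-ranked items, which is the heterogeneity the experiments vary.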

SLIDE 8

[Detour] How to measure the accuracy?

• Mean Absolute Error (MAE)
  – Average of the absolute difference between the exact and approximate MRC
• Main issue → All sizes have the same weight

[Figure: two panels for trace ms-ex, each showing the approximate and the exact MRC. Miss ratio vs. cache size in num. of items on a log scale (10^0 to 10^6), and vs. cache size in num. of items × 10^6 on a linear scale (0.5 to 2.5).]
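Written as a formula, assuming both curves are evaluated at the same N cache sizes c_1, ..., c_N (the symbols m, \hat{m}, c_i, and N are notation introduced here, not taken from the slides):

```latex
\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| m(c_i) - \hat{m}(c_i) \right|
```

Here m is the exact MRC and \hat{m} the approximate one. Every size c_i contributes with the same weight 1/N, so on a linear size axis the long flat part of the curve dominates the average.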

SLIDE 9

[Detour] MAE per Quantile → MAEQ

• Divide the Y-axis into intervals
  – And determine the corresponding intervals on the X-axis
• Compute the MAE_i within each interval i
• MAEQ = average MAE over all intervals
• In this example:
  – MAE = 0.025
  – MAEQ = 0.09

[Figure: trace ms-ex, approximate vs. exact MRC. Miss ratio vs. cache size (num. of items, log scale from 10^0 to 10^6).]
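A minimal sketch of the MAEQ computation, with several assumptions the slides do not spell out: the intervals are equal-width bands of the miss ratio in [0, 1], each cache size is assigned to a band according to the exact MRC, and empty bands are skipped. The function name `maeq` and the parameter `num_intervals` are hypothetical.

```python
def maeq(exact, approx, num_intervals=10):
    """Average of per-interval MAEs; exact and approx share the same sizes."""
    if not exact or len(exact) != len(approx):
        raise ValueError("curves must be non-empty and of equal length")
    buckets = [[] for _ in range(num_intervals)]
    for e, a in zip(exact, approx):
        # Band of the Y-axis that the exact miss ratio falls into.
        i = min(int(e * num_intervals), num_intervals - 1)
        buckets[i].append(abs(e - a))
    per_interval = [sum(b) / len(b) for b in buckets if b]
    return sum(per_interval) / len(per_interval)
```

For example, with `exact = [1.0, 1.0, 1.0, 1.0, 0.25]`, `approx = [1.0, 1.0, 1.0, 1.0, 0.75]`, and two intervals, the plain MAE is 0.1 while this sketch gives 0.25, because the single badly estimated point is no longer diluted by the four exact ones.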

SLIDE 10

Accuracy with synthetic traces

• As α increases (high variability in the popularities) → the error increases
• Does the error depend on the tail of the popularity distribution?

[Figure: average error (MAEQ, log scale from 10^-4 to 10^0) vs. Zipf parameter α (0.6, 0.8, 1.0, 1.2, and the case labeled "0.6p"), for R = 0.1, R = 0.01, and R = 0.001.]

SLIDE 11

The role of popular objects

• Case with α = 0.6 → We add 20 popular objects
• The results are confirmed by a model (see the paper)

[Figure: two panels. Request frequency vs. item ID (log-log) for the IRM traces Zipf 1.2, Zipf 0.6, and "0.6p". Miss ratio vs. cache size (num. of items, log scale) for α = 0.6 with popular items: exact MRC and sample1 to sample4.]

SLIDE 12

Our solution: Key idea

• The MRC at small cache sizes depends mostly on the highly popular items
• The MRC at large cache sizes can be built from samples
• Our approach → Combine exact and approximate MRCs
  – Can adopt a “constant sampling rate” or “constant complexity” approach

[Figure: diagram. A request sequence (A B C D B) feeds two reuse distance histograms (hits vs. size): an exact one for sizes from 1 to B, and one built from samples with the size axis scaled by 1/R. These give the exact MRC and the approximate MRC, which are joined at size B into the full MRC.]
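A minimal sketch of the key idea, as one way it could be realized rather than the authors' implementation (the details are in the paper). It assumes an LRU policy, tracks exact reuse distances only up to a boundary B with a stack truncated at B items, and takes the points beyond B from a sampled MRC whose sizes are already scaled by 1/R. The names `exact_head_mrc`, `combine_mrcs`, and `boundary` are hypothetical.

```python
def exact_head_mrc(trace, boundary):
    """Exact LRU miss ratios for sizes 1..boundary, using O(boundary) memory."""
    stack, hist = [], [0] * (boundary + 1)
    for item in trace:
        if item in stack:
            pos = stack.index(item)
            hist[pos + 1] += 1   # hit for every cache size >= pos + 1
            stack.pop(pos)
        stack.insert(0, item)
        del stack[boundary:]     # distances beyond the boundary are not tracked
    hits, head = 0, []
    for size in range(1, boundary + 1):
        hits += hist[size]
        head.append(1 - hits / len(trace))
    return head

def combine_mrcs(exact_head, sampled_points, boundary):
    """Full MRC: exact points up to the boundary, sampled points beyond it.

    sampled_points is a list of (estimated size, miss ratio) pairs.
    """
    full = [(size, exact_head[size - 1]) for size in range(1, boundary + 1)]
    full += [(s, m) for s, m in sorted(sampled_points) if s > boundary]
    return full
```

Truncating the stack is safe for LRU because the top B entries of the full stack never depend on what lies below them. Nothing in this sketch smooths the transition at the boundary, so any mismatch between the two curves shows up where they join.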

SLIDE 13

Experimental methodology

• Comparison with state-of-the-art solutions
  – We set the parameters aiming at a fair comparison
    • Use the same amount of memory
• CPU overhead
  – With our scheme, the CPU usage is on average 10% higher than with state-of-the-art solutions

SLIDE 14

Results: IRM

• Accurate for any size
  – MAEQ always below 1%

[Figure: three panels. Average error (MAEQ, log scale from 10^-4 to 10^0) vs. Zipf parameter α (0.6, 0.8, 1, 1.2, and "0.6p") for R = 0.1, R = 0.01, and R = 0.001. Miss ratio vs. cache size (num. of items, log scale) for α = 0.6 with popular items and for α = 1.2: exact MRC and sample1 to sample4.]

SLIDE 15

Results: IRM, sensitivity to B

• Error mainly where the curves join

[Figure: two panels. Average error (MAEQ, log scale from 10^-4 to 10^0) for Zipf parameter α = 1.2 and the "0.6p" case, with B = 0 (SHARDSadj), B = 32, 64, 125, 250, and 500. Miss ratio (0.7 to 1) vs. cache size (num. of items, log scale) for α = 0.6 with popular items: exact MRC and the curves for B = 32, 64, 125, 250, and 500.]

SLIDE 16

Real-world traces

• Traces from different sources, with different characteristics

[Figure: request frequency vs. item ID (log-log) for two traces: CDN (annotated with α = 0.7 and α = 1.3) and systor (annotated with α = 1.1).]

SLIDE 17

Real-world traces: results

[Figure: four panels (two for systor, two for CDN) showing miss ratio vs. cache size (num. of items, log scale from 10^0 to 10^6) for the exact MRC and sample1 to sample4, plus a chart of the average error (MAEQ, log scale from 10^-4 to 10^0) per trace (fiu, ms-ex, ms-dev, systor, CDN) comparing SHARDSadj with our mixed approach.]

SLIDE 18

Extension (1/2)

• “Non-stack” algorithms
  – Eviction policies that do not satisfy the inclusion property
  – Need to compute the MRC by points
  – Our approach:
    • Use a high R with small cache sizes, decrease R as the cache size increases

SLIDE 19

Extension (2/2)

• Heterogeneous item size
  – Web caches deal with items of heterogeneous size
  – Can we build the MRC in such a case?
    → Order statistics tree
  – Does sampling work in this case?

[Figure: two panels for the CDN trace with heterogeneous sizes. Miss ratio vs. cache size (MB, log scale from 10^0 to 10^6) for the exact MRC and sample1 to sample4.]
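A minimal sketch of a size-aware reuse distance, assuming the distance of a request is the total size of the distinct items accessed since the previous request for the same item, that item included. A Fenwick tree indexed by request time stands in for the order statistics tree named on the slide. The function name and the `(item, size)` trace format are hypothetical.

```python
def byte_stack_distances(trace):
    """Size-weighted LRU stack distances for a trace of (item, size) pairs."""
    n = len(trace)
    tree = [0] * (n + 1)  # Fenwick tree: size stored at each last-access time

    def update(i, delta):
        while i <= n:
            tree[i] += delta
            i += i & -i

    def prefix(i):
        total = 0
        while i > 0:
            total += tree[i]
            i -= i & -i
        return total

    last = {}   # item -> (time of last access, size recorded at that time)
    dists = []
    for t, (item, size) in enumerate(trace, start=1):
        prev = last.get(item)
        if prev is None:
            dists.append(None)
        else:
            p, old_size = prev
            # Total size of the distinct items accessed in [p, t-1].
            dists.append(prefix(t - 1) - prefix(p - 1))
            update(p, -old_size)
        update(t, size)
        last[item] = (t, size)
    return dists
```

With the trace `[("A", 10), ("B", 20), ("C", 5), ("A", 10)]`, the second request for A has distance 35, the combined size of A, B, and C.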

SLIDE 20

Conclusions and perspectives

• Building an MRC from samples requires a careful design
  – Highly popular items have a significant impact
  – The head of the distribution, rather than the tail, is important
• In our approach we combine an exact MRC with an approximate one
  – Improving the accuracy of the final result
• Future work
  – Online adaptation of the scheme parameters

SLIDE 21

Thanks!

For any question, you can reach me at