RIPQ: Advanced Photo Caching on Flash for Facebook, Linpeng Tang (PowerPoint PPT presentation)



SLIDE 1

RIPQ: Advanced Photo Caching on Flash for Facebook

Linpeng Tang (Princeton)

Qi Huang (Cornell & Facebook) Wyatt Lloyd (USC & Facebook) Sanjeev Kumar (Facebook) Kai Li (Princeton)


SLIDE 2

* Facebook 2014 Q4 Report

Photo Serving Stack: 2 Billion* Photos Shared Daily

Storage Backend

SLIDE 3

Photo Caches

  • Close to users: reduce backbone traffic
  • Co-located with backend: reduce backend IO

Flash

Storage Backend Edge Cache Origin Cache

Photo Serving Stack

SLIDE 4

Flash

Storage Backend Edge Cache Origin Cache

Photo Serving Stack

An Analysis of Facebook Photo Caching [Huang et al. SOSP’13]:

  • Segmented LRU-3: 10% less backbone traffic
  • Greedy-Dual-Size-Frequency-3: 23% fewer backend IOs

Advanced caching algorithms help!

SLIDE 5

Flash

FIFO was still used

  • No known way to implement advanced algorithms efficiently

Storage Backend Edge Cache Origin Cache

In Practice Photo Serving Stack

SLIDE 6

Theory: advanced caching helps

  • 23% fewer backend IOs
  • 10% less backbone traffic

Practice: difficult to implement on flash

  • FIFO still used

Restricted Insertion Priority Queue (RIPQ): efficiently implements advanced caching algorithms on flash

SLIDE 7

Outline

  • Why are advanced caching algorithms difficult to implement efficiently on flash?
  • How does RIPQ solve this problem?
    – Why use a priority queue?
    – How to implement one efficiently on flash?
  • Evaluation
    – 10% less backbone traffic
    – 23% fewer backend IOs

SLIDE 8

Outline

  • Why are advanced caching algorithms difficult to implement efficiently on flash?
    – Write pattern of FIFO and LRU
  • How does RIPQ solve this problem?
    – Why use a priority queue?
    – How to implement one efficiently on flash?
  • Evaluation
    – 10% less backbone traffic
    – 23% fewer backend IOs

SLIDE 9

FIFO Does Sequential Writes


Cache space of FIFO Head Tail

SLIDE 10

FIFO Does Sequential Writes


Cache space of FIFO Head Tail

Miss

SLIDE 11

FIFO Does Sequential Writes


Cache space of FIFO Head Tail

Hit

SLIDE 12

FIFO Does Sequential Writes


Cache space of FIFO Head Tail

Evicted

No random writes needed for FIFO

SLIDE 13

LRU Needs Random Writes


Cache space of LRU Head Tail

Hit

Locations on flash ≠ locations in LRU queue

SLIDE 14

LRU Needs Random Writes


Cache space of LRU: Head → Tail, non-contiguous on flash

Random writes needed to reuse space

SLIDE 15

Why Care About Random Writes?

  • Write-heavy workload
    – Long-tail access pattern, moderate hit ratio
    – Each miss triggers a write to the cache

  • Small random writes are harmful for flash
    – e.g., Min et al. FAST’12
    – High write amplification
    – Low write throughput
    – Short device lifetime
SLIDE 16

What write size do we need?

  • Large writes
    – High write throughput at high utilization
    – 16~32 MiB in Min et al. FAST’12
  • What’s the trend since then?
    – Random writes tested on 3 modern devices
    – 128~512 MiB needed now

100 MiB+ writes needed for efficiency

SLIDE 17

Outline

  • Why are advanced caching algorithms difficult to implement efficiently on flash?
  • How does RIPQ solve this problem?
  • Evaluation

SLIDE 18

RIPQ Architecture

(Restricted Insertion Priority Queue)


Architecture: Advanced Caching Policy (SLRU, GDSF, …) → RIPQ Priority Queue API → Approximate Priority Queue (RAM) + Flash-friendly Workloads (Flash)

Efficient caching on flash

Caching algorithms approximated as well

SLIDE 19

RIPQ Architecture

(Restricted Insertion Priority Queue)


Key techniques: restricted insertion, section merge/split, large writes, lazy updates

SLIDE 20

Priority Queue API

  • No single best caching policy
  • Segmented LRU [Karedla’94]
    – Reduces both backend IO and backbone traffic
    – SLRU-3: best algorithm for Edge so far
  • Greedy-Dual-Size-Frequency [Cherkasova’98]
    – Favors small objects
    – Further reduces backend IO
    – GDSF-3: best algorithm for Origin so far

SLIDE 21

Segmented LRU

  • Concatenation of K LRU caches


Cache space of SLRU-3: Head → L3, L2, L1 → Tail

Miss

SLIDE 22

Segmented LRU

  • Concatenation of K LRU caches


Cache space of SLRU-3: Head → L3, L2, L1 → Tail

Miss

SLIDE 23

Segmented LRU

  • Concatenation of K LRU caches


Cache space of SLRU-3: Head → L3, L2, L1 → Tail

Hit

SLIDE 24

Segmented LRU

  • Concatenation of K LRU caches


Cache space of SLRU-3: Head → L3, L2, L1 → Tail

Hit again
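The promotion dynamics illustrated in these frames can be sketched as a small in-memory model. This is an illustrative sketch, not a flash implementation: the segment count and per-object capacities are made-up parameters, and a real cache would track sizes in bytes.

```python
from collections import OrderedDict

class SLRU:
    """Sketch of Segmented LRU: K LRU segments chained tail-to-head.
    A miss inserts at the lowest segment (L1); a hit promotes the object
    one segment toward the head; overflow demotes toward the tail and
    eventually evicts from L1."""

    def __init__(self, k=3, seg_capacity=4):
        # segments[0] is L1 (tail side), segments[k-1] is the head segment
        self.segments = [OrderedDict() for _ in range(k)]
        self.cap = seg_capacity

    def _insert(self, level, key, value):
        seg = self.segments[level]
        seg[key] = value                       # newest entry sits at the MRU end
        if len(seg) > self.cap:
            victim, v = seg.popitem(last=False)  # pop the segment's LRU entry
            if level > 0:
                self._insert(level - 1, victim, v)  # demote toward the tail
            # overflow out of L1 (level 0) leaves the cache entirely

    def get(self, key):
        for level, seg in enumerate(self.segments):
            if key in seg:
                value = seg.pop(key)
                # hit: promote one segment toward the head (capped at the top)
                self._insert(min(level + 1, len(self.segments) - 1), key, value)
                return value
        return None  # miss

    def put(self, key, value):
        self._insert(0, key, value)  # miss: insert at the lowest segment
```

Repeated hits walk an object from L1 up to L3, which is exactly the "Hit"/"Hit again" sequence shown on these slides.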

SLIDE 25

Greedy-Dual-Size-Frequency

  • Favoring small objects


Cache space of GDSF-3 Head Tail

SLIDE 26

Greedy-Dual-Size-Frequency

  • Favoring small objects


Cache space of GDSF-3 Head Tail

Miss

SLIDE 27

Greedy-Dual-Size-Frequency

  • Favoring small objects


Cache space of GDSF-3 Head Tail

Miss

SLIDE 28

Greedy-Dual-Size-Frequency

  • Favoring small objects


Cache space of GDSF-3: Head → Tail

  • Write workload more random than LRU
  • Operations similar to a priority queue
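The priority rule behind these frames can be sketched with the standard GDSF formulation: priority = clock + frequency/size, with the clock advanced to each evicted object's priority so old entries age out. The class below is an illustrative RAM model, not the talk's implementation; the byte-based capacity and the lazy-deletion heap are sketch choices.

```python
import heapq

class GDSF:
    """Sketch of Greedy-Dual-Size-Frequency. Small, popular objects get
    high priority (frequency/size), so they stay near the queue head."""

    def __init__(self, capacity):
        self.capacity = capacity  # bytes
        self.used = 0
        self.clock = 0.0
        self.entries = {}   # key -> (priority, frequency, size)
        self.heap = []      # (priority, key); may contain stale records

    def _touch(self, key, size):
        _, freq, _ = self.entries.get(key, (0.0, 0, size))
        freq += 1
        prio = self.clock + freq / size
        self.entries[key] = (prio, freq, size)
        heapq.heappush(self.heap, (prio, key))

    def access(self, key, size):
        if key not in self.entries:
            # miss: evict lowest-priority objects until the new one fits
            while self.used + size > self.capacity and self.heap:
                prio, victim = heapq.heappop(self.heap)
                entry = self.entries.get(victim)
                if entry is None or entry[0] != prio:
                    continue  # stale heap record, skip
                del self.entries[victim]
                self.used -= entry[2]
                self.clock = prio  # aging: raise the clock to the evicted priority
            self.used += size
        self._touch(key, size)
```

Because priority depends on object size, two same-frequency objects land at different queue positions, which is why the slide calls this write pattern "more random than LRU".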

SLIDE 29

Relative Priority Queue for Advanced Caching Algorithms


Cache space: Head (priority 1.0) → Tail (priority 0.0)

Miss object: insert(x, p)

SLIDE 30

Relative Priority Queue for Advanced Caching Algorithms


Cache space: Head (priority 1.0) → Tail (priority 0.0)

Hit object: increase(x, p’)

SLIDE 31

Relative Priority Queue for Advanced Caching Algorithms


Cache space: Head (priority 1.0) → Tail (priority 0.0)

Implicit demotion on insert/increase:

  • Objects with lower priorities move towards the tail

SLIDE 32

Relative Priority Queue for Advanced Caching Algorithms


Cache space: Head (priority 1.0) → Tail (priority 0.0)

Evict from queue tail

Evicted

Relative priority queue captures the dynamics of many caching algorithms!
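As a sketch, those semantics can be stated exactly with a small in-memory model. This is the interface RIPQ approximates on flash; the list-based implementation here is illustrative (O(n) per operation) and exists only to pin down the insert/increase/evict behavior.

```python
class RelativePriorityQueue:
    """Exact, RAM-only sketch of the relative priority queue API.
    Priorities are relative positions in [0, 1]: 1.0 is the head,
    0.0 the tail. Inserting near the head implicitly pushes everything
    below it toward the tail (the slides' "implicit demotion")."""

    def __init__(self):
        self.queue = []  # index 0 = head, last index = tail

    def _position(self, p):
        # relative priority p maps to a fractional position from the head
        return round((1.0 - p) * len(self.queue))

    def insert(self, x, p):
        self.queue.insert(self._position(p), x)

    def increase(self, x, p):
        # eager here for clarity; RIPQ defers this work via virtual blocks
        self.queue.remove(x)
        self.insert(x, p)

    def evict(self):
        return self.queue.pop()  # always evict from the tail
```

For example, SLRU-K maps a hit in segment i to `increase(x, head_of_segment(min(i+1, K)))`, so many algorithms reduce to choices of p.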

SLIDE 33

RIPQ Design: Large Writes


  • Need to buffer object writes (10s of KiB) into block writes
  • Once written, blocks are immutable!
  • 256 MiB block size, 90% utilization
    – Large caching capacity
    – High write throughput
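A minimal sketch of that buffering discipline: small object writes accumulate in RAM and reach flash only as one large, immutable block. The `flush` callback and class name are illustrative stand-ins for the actual flash I/O path.

```python
class BlockWriter:
    """Buffer small object writes (tens of KiB) in RAM and emit them
    as one large, block-aligned write. 256 MiB matches the talk's
    block size; after flush, the block is treated as immutable."""

    BLOCK_SIZE = 256 * 1024 * 1024  # 256 MiB

    def __init__(self, flush):
        self.flush = flush          # callback receiving one full block
        self.buffer = bytearray()   # RAM staging area

    def write_object(self, data: bytes):
        self.buffer.extend(data)
        while len(self.buffer) >= self.BLOCK_SIZE:
            # one large sequential write instead of many small random ones
            block = bytes(self.buffer[:self.BLOCK_SIZE])
            self.flush(block)
            del self.buffer[:self.BLOCK_SIZE]
```

This is what makes the workload flash-friendly: the device only ever sees writes at or above the 100 MiB+ size the earlier slides call for.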
SLIDE 34

RIPQ Design: Restricted Insertion Points


  • Exact priority queue
  • Insert to any block in the queue
  • Each block needs a separate buffer
  • Whole flash space buffered in RAM!
SLIDE 35

RIPQ Design: Restricted Insertion Points


Solution: restricted insertion points

SLIDE 36

Section is Unit for Insertion


Priority ranges: [1 .. 0.6] [0.6 .. 0.35] [0.35 .. 0]

Active block with RAM buffer; sealed blocks on flash

Head → Section | Section | Section → Tail

Each section has one insertion point

SLIDE 37

Section is Unit for Insertion


insert(x, 0.55)

Priority ranges shift: [1 .. 0.6] [0.6 .. 0.35] [0.35 .. 0] → [1 .. 0.62] [0.62 .. 0.33] [0.33 .. 0]

Insert procedure:

  • Find the corresponding section
  • Copy the data into its active block
  • Update the section priority range
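The first two steps of the procedure can be sketched as follows. The `Section` fields are illustrative, and the third step, the priority-range renormalization shown on the slide ([1..0.6] becoming [1..0.62]), is elided to a comment.

```python
class Section:
    """One RIPQ section: a priority range [lo, hi] plus a single
    RAM-buffered active block, which is the section's one insertion
    point. Blocks are sealed to flash when the buffer fills."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.active_block = []   # RAM buffer for this insertion point

def insert(sections, obj, p):
    """Find the section whose priority range covers p and copy the
    object into its active block. (RIPQ then shifts the section
    boundaries so ranges track section sizes; elided here.)"""
    for s in sections:           # ordered head (highest range) -> tail
        if s.lo <= p <= s.hi:    # boundary priorities match the first section
            s.active_block.append(obj)
            return s
    raise ValueError("priority out of range")
```

Restricting insertion to one point per section is what caps RAM usage: only one block per section needs a buffer, rather than every block in the queue.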

SLIDE 38

Priority ranges: [1 .. 0.62] [0.62 .. 0.33] [0.33 .. 0]

Section is Unit for Insertion


Active block with RAM buffer; sealed blocks on flash

Head → Section | Section | Section → Tail

Relative order within one section is not guaranteed!

SLIDE 39

Trade-off in Section Size


Section size controls approximation error

  • More sections → lower approximation error
  • More sections → larger RAM buffer

Head → Section | Section | Section → Tail; priority ranges: [1 .. 0.62] [0.62 .. 0.33] [0.33 .. 0]
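A rough back-of-envelope for this trade-off, assuming one 256 MiB RAM-buffered active block per section (the block size is from the talk; the section counts mirror the evaluation's insertion-point axis):

```python
BLOCK_MIB = 256  # block size from the talk; one RAM-buffered active block per section

for sections in (2, 4, 8, 16, 32):
    ram_gib = sections * BLOCK_MIB / 1024   # total RAM staging buffer
    span = 1.0 / sections                   # priority range per section (uniform split)
    print(f"{sections:2d} sections: {ram_gib:4.1f} GiB RAM buffer, "
          f"~{span:.3f} of the priority range per section")
```

At 8 sections this works out to 2 GiB of buffer, consistent with the "2 GiB" figure quoted later in the evaluation, while each section still spans only ~1/8 of the priority range.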

SLIDE 40

RIPQ Design: Lazy Update


increase(x, 0.9)

Naïve approach: copy x to the corresponding active block

Problem with the naïve approach:

  • Data copying/duplication on flash

SLIDE 41

RIPQ Design: Lazy Update


Solution: use a virtual block to track the updated location!

SLIDE 42

RIPQ Design: Lazy Update


Head → Section | Section | Section → Tail, with virtual blocks

Solution: use a virtual block to track the updated location!

SLIDE 43

Virtual Block Remembers Update Location


increase(x, 0.9)

No data is written during a virtual update

SLIDE 44

Actual Update During Eviction


x is now in the tail block.

SLIDE 45

Actual Update During Eviction


Copy the data to the active block

Always one copy of the data on flash
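The lazy-update flow across slides 40–45 can be sketched as follows. This is an illustrative RAM model with made-up names: `increase()` records only a virtual location, and data is copied exactly once, when the tail block is evicted.

```python
class LazyRIPQ:
    """Sketch of RIPQ's lazy update. increase() performs no flash
    write; it just records the object's new target in RAM (the role
    of a virtual block). When the tail block is evicted, objects with
    a pending update are reinserted near their target, so flash always
    holds exactly one copy of each object."""

    def __init__(self):
        self.tail_blocks = []        # blocks on "flash"; oldest at index 0
        self.virtual_location = {}   # obj -> target priority (RAM only)

    def increase(self, obj, p):
        self.virtual_location[obj] = p   # virtual update: no data written

    def evict_tail_block(self, reinsert):
        block = self.tail_blocks.pop(0)
        evicted = []
        for obj in block:
            target = self.virtual_location.pop(obj, None)
            if target is None:
                evicted.append(obj)       # no pending update: truly evicted
            else:
                reinsert(obj, target)     # the one actual copy happens now
        return evicted
```

Deferring the copy to eviction time is what removes the duplication problem of the naïve approach: a hit costs only a RAM update, and each object occupies flash space exactly once.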

SLIDE 46

RIPQ Design

  • Relative priority queue API
  • RIPQ design points
    – Large writes
    – Restricted insertion points
    – Lazy update
    – Section merge/split (balances section sizes and RAM buffer usage)
  • Static caching
    – Photos are static

SLIDE 47

Outline

  • Why are advanced caching algorithms difficult to implement efficiently on flash?
  • How does RIPQ solve this problem?
  • Evaluation

SLIDE 48

Evaluation Questions

  • How much RAM buffer needed?
  • How good is RIPQ’s approximation?
  • What’s the throughput of RIPQ?
SLIDE 49

Evaluation Approach

  • Real-world Facebook workloads
    – Origin
    – Edge
  • 670 GiB flash card
    – 256 MiB block size
    – 90% utilization
  • Baselines
    – FIFO
    – SIPQ: Single Insertion Priority Queue

SLIDE 50

RIPQ Needs Small Number of Insertion Points

Chart: object-wise hit-ratio (%) vs. number of insertion points (2, 4, 8, 16, 32) for Exact GDSF-3, GDSF-3, Exact SLRU-3, SLRU-3, and FIFO; gains of +6% and +16% hit-ratio over FIFO.

SLIDE 51

RIPQ Needs Small Number of Insertion Points


SLIDE 52

RIPQ Needs Small Number of Insertion Points


You don’t need much RAM buffer (2GiB)!


SLIDE 53

RIPQ Has High Fidelity

Chart: object-wise hit-ratio (%) for SLRU-1/2/3 and GDSF-1/2/3 under FIFO, Exact, and RIPQ.

SLIDE 54

RIPQ Has High Fidelity


SLIDE 55

RIPQ Has High Fidelity


RIPQ achieves ≤0.5% difference for all algorithms

SLIDE 56

RIPQ Has High Fidelity


+16% hit-ratio → 23% fewer backend IOs

SLIDE 57

RIPQ Has High Throughput

Chart: throughput (req./sec) for SLRU-1/2/3 and GDSF-1/2/3 under RIPQ and FIFO.

RIPQ throughput comparable to FIFO (≤10% diff.)

SLIDE 58

Related Work

RAM-based advanced caching: SLRU (Karedla’94), GDSF (Young’94, Cao’97, Cherkasova’01), SIZE (Abrams’96), LFU (Maffeis’93), LIRS (Jiang’02), … RIPQ enables their use on flash.

Flash-based caching solutions: Facebook FlashCache, Janus (Albrecht’13), Nitro (Li’13), OP-FCL (Oh’12), FlashTier (Saxena’12), Hec (Yang’13), … RIPQ supports advanced algorithms.

Flash performance: Stoica’09, Chen’09, Bouganim’09, Min’12, … The trend continues for modern flash cards.

SLIDE 59

RIPQ

  • First framework for advanced caching on flash
    – Relative priority queue interface
    – Large writes
    – Restricted insertion points
    – Lazy update
    – Section merge/split
  • Enables SLRU-3 & GDSF-3 for Facebook photos
    – 10% less backbone traffic
    – 23% fewer backend IOs