Faster Slab Reassignment in memcached



slide-1
SLIDE 1

Faster Slab Reassignment in memcached

Daniel Byrne

djbyrne@mtu.edu

Nilufer Onder

nilufer@mtu.edu

Zhenlin Wang

zlwang@mtu.edu

Department of Computer Science Michigan Technological University

MEMSYS 2019

1/29 Byrne, Onder, Wang; MEMSYS 2019

slide-2
SLIDE 2

◮ Background
◮ memcached
◮ Faster Slab Reassignment
◮ Experimental Evaluation
◮ Conclusion

slide-3
SLIDE 3

Cache Data from Backend Systems

[Diagram: web, image, and msg services read through memcached (100 µs per access), which caches data from the backend systems DB, RecSys, and AdSrv (10,000 µs per access).]

slide-5
SLIDE 5

Cache Miss Ratio Drives Performance

                      Access Time (µs)
System   Miss Ratio   Cache   Backend   End-to-End
A        2%           100     10,000    298
B        1%           100     10,000    199

2% → 1% in miss ratio → 33% decrease in end-to-end latency!
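
The end-to-end figures in the table are consistent with a simple expected-value model in which each request pays either the cache time or the backend time. A minimal sketch, assuming that model (the function name is illustrative and does not come from the talk):

```python
def end_to_end_latency_us(miss_ratio, cache_us=100.0, backend_us=10_000.0):
    """Expected access time when hits cost cache_us and misses cost backend_us."""
    return (1.0 - miss_ratio) * cache_us + miss_ratio * backend_us


system_a = end_to_end_latency_us(0.02)        # about 298 µs
system_b = end_to_end_latency_us(0.01)        # about 199 µs
decrease = (system_a - system_b) / system_a   # about one third

print(f"A={system_a:.0f} µs, B={system_b:.0f} µs, decrease={decrease:.1%}")
```

Because the backend is 100 times slower than the cache, the miss term dominates the sum, which is why halving a small miss ratio removes about a third of the latency.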

slide-7
SLIDE 7

memcached

◮ memcached resides in the server’s main memory
◮ Data is stored as (key, value) pairs
◮ To retrieve an item: GET key
◮ To store/update an item: SET key value
◮ Deployed in many large-scale datacenters
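
On the wire, memcached's text protocol expresses these two operations as short ASCII commands: a `get` line, and a `set` header line followed by the data block. A minimal sketch of the request framing only; the helper names and the example key are hypothetical, and no network connection is made:

```python
def format_get(key: str) -> bytes:
    """Build a text-protocol GET request for one key."""
    return f"get {key}\r\n".encode("ascii")


def format_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Build a text-protocol SET request: header line, then the data block."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


print(format_set("user:42", b"alice"))
print(format_get("user:42"))
```

The byte count in the `set` header is what determines the item's size, and therefore which slab class the item lands in.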

slide-8
SLIDE 8

memcached Memory Organization

[Diagram: classes 1 through N, each holding slabs 1 through N and its own LRU queue running from LRU HEAD to LRU TAIL.]

◮ A class is a collection of slabs that contain items
◮ Each class corresponds to items of a given size range
◮ Each class maintains its own LRU queue
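
The organization above can be sketched as a toy data structure. This is an illustrative model, not memcached's implementation: the class sizes (96 and 192 bytes), the slot counts, and all names are assumptions, and an `OrderedDict` stands in for the per-class LRU queue.

```python
from collections import OrderedDict


class SlabClass:
    """One size class: a set of slabs plus its own LRU queue (toy model)."""

    def __init__(self, max_item_size, slabs, items_per_slab):
        self.max_item_size = max_item_size
        self.capacity = slabs * items_per_slab   # item slots across all slabs
        self.lru = OrderedDict()                 # first entry = LRU tail, last = LRU head

    def get(self, key):
        if key not in self.lru:
            return None
        self.lru.move_to_end(key)                # a hit moves the item to the LRU head
        return self.lru[key]

    def set(self, key, value):
        if key not in self.lru and len(self.lru) >= self.capacity:
            self.lru.popitem(last=False)         # evict from this class's LRU tail
        self.lru[key] = value
        self.lru.move_to_end(key)


def pick_class(classes, item_size):
    """Return the smallest class whose size range fits the item."""
    for cls in sorted(classes, key=lambda c: c.max_item_size):
        if item_size <= cls.max_item_size:
            return cls
    raise ValueError("item larger than the largest class")


classes = [SlabClass(96, slabs=1, items_per_slab=2),
           SlabClass(192, slabs=1, items_per_slab=2)]
small = pick_class(classes, item_size=80)
small.set("a", 1)
small.set("b", 2)
small.get("a")       # "a" becomes the most recently used item
small.set("c", 3)    # the class is full, so "b" (the LRU tail) is evicted
```

Because eviction happens per class, a class with too few slabs evicts useful items even while another class has memory to spare, which is the situation slab reassignment addresses.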

slide-9
SLIDE 9

Why Reassign Slabs Among Classes?

◮ Adapt to changes in an application’s workload
  ◮ Working set sizes can change over time
  ◮ An application may change its item size distribution
◮ Dynamically reassign slabs when new applications enter the cache
◮ Miss ratio curves can be used to find the optimal allocation among classes
  ◮ LAMA, ATC ’15
  ◮ mPart, ISMM ’18
◮ Our work focuses on the process of reassigning slabs from one class to another

slide-10
SLIDE 10

Adapting to a New Class of Items

◮ Two-Phase Workload
◮ Fix the total memory size at 384MB

phase   % class 1   % class 2   reqs
1       100         0           13 million
2       33          67          87 million
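
A workload with this shape can be sketched as a generator that emits the class targeted by each request. Only the request counts and the class mix come from the table; how keys were actually chosen is not stated on the slide, so the random draw, the seed, and the `scale_down` parameter are assumptions made for illustration.

```python
import random

PHASES = [
    # (number of requests, probability that a request targets class 1)
    (13_000_000, 1.00),   # phase 1: 100% class 1
    (87_000_000, 0.33),   # phase 2: 33% class 1, 67% class 2
]


def two_phase_classes(scale_down=1, seed=0):
    """Yield the class (1 or 2) targeted by each request, phase by phase."""
    rng = random.Random(seed)
    for requests, p_class1 in PHASES:
        for _ in range(requests // scale_down):
            yield 1 if rng.random() < p_class1 else 2


# Scaled down by 10,000 so the example finishes instantly.
sample = list(two_phase_classes(scale_down=10_000))
print(len(sample), sample[:1300].count(1), sample[1300:].count(2))
```

At the phase boundary, class 2 suddenly needs memory that class 1 still holds, which is what triggers slab reassignment.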

slide-11
SLIDE 11

Default Slab Reassignment

[Plot: slabs held by class1 and class2 (y-axis: Slabs) versus GET Requests (Millions) across phase 1 and phase 2, for the default slab speed.]

◮ 65 million requests to reassign over 200 slabs

slide-12
SLIDE 12

Impact on Overall Miss Ratio

[Plot: Miss Ratio versus GET Requests (Millions) for the default slab speed.]

◮ 60 million requests to reach steady-state miss ratio

slide-13
SLIDE 13

Current Reassignment Algorithm

◮ Goal: Reassign a slab from class 1 to class 2

[Diagram: Class 1 and Class 2 each hold Slab 1 and Slab 2; the slab reassignment thread works on Slab 1 - Class 1.]

slide-15
SLIDE 15

Current Reassignment Algorithm

  1. Acquire item lock
  2. Check that no other threads reference this item
  3. Unlink the item from the LRU queue
  4. Free the item
  5. Mark the item as was_busy and wait ~1ms
  6. Repeat for next item
  7. Return to head
  8. Remove item slot from class freelist
  9. Assign to class 2

[Diagram: the thread walks the items of Slab 1 - Class 1 one step at a time; after step 9 the slab appears as Slab 3 - Class 2.]

slide-24
SLIDE 24

What Slows Down Reassignment?

◮ Each was_busy causes the thread to sleep
◮ Slab reassign thread detected that the item was in use, so it cannot be cut from the class’s freelist at this moment
◮ During thread sleep, an item can be allocated to the item slot

[Diagram: while the thread sleeps, a new item gets allocated to a freed slot in Slab 1 - Class 1. Have to free and unlink again!]
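
The effect can be illustrated with a deterministic toy model of the default loop. It is a sketch under stated assumptions, not memcached's code: locks and reference counts are omitted, the thread sleeps once per pass rather than once per item, and the `on_sleep` callback with its refill budget is a hypothetical stand-in for worker threads that allocate while the reassignment thread sleeps.

```python
def default_reassign(slots, freelist, on_sleep):
    """Toy model of the default loop; returns how many passes the slab needed.

    slots:    dict slot_id -> item, or None when the slot is empty
    freelist: set of slot ids the source class may still allocate from
    on_sleep: callback standing in for worker threads that run during the sleep
    """
    passes = 0
    while True:
        passes += 1
        was_busy = False
        for slot_id in list(slots):
            if slots[slot_id] is None:
                continue
            slots[slot_id] = None      # steps 3-4: unlink from the LRU, free the item
            freelist.add(slot_id)      # the freed slot goes back on the class freelist
            was_busy = True            # step 5: this pass found work, so sleep and retry
        if not was_busy:
            break                      # a clean pass: nothing left in the slab
        on_sleep(slots, freelist)      # the ~1 ms sleep; workers may refill slots
    for slot_id in slots:
        freelist.discard(slot_id)      # step 8: only now cut the slots from the freelist
    return passes


def refill_once_per_sleep(budget):
    """Hypothetical worker model: during each sleep, one freed slot is reused."""
    left = [budget]

    def on_sleep(slots, freelist):
        candidates = sorted(s for s in freelist if s in slots)
        if left[0] > 0 and candidates:
            slot_id = candidates[0]
            freelist.discard(slot_id)
            slots[slot_id] = "new item"
            left[0] -= 1

    return on_sleep


slab = {0: "a", 1: "b", 2: "c", 3: "d"}
passes = default_reassign(slab, set(), refill_once_per_sleep(3))
print(passes)
```

In this model every allocation that lands in a freed slot costs one extra pass and one extra sleep, because the slot stayed on the freelist while the thread was waiting.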

slide-26
SLIDE 26

Sleeping for a Shorter Period

algorithm   sleep interval (µs)   slabs/s
default     1                     36.93
default     10                    35.5
default     100                   26.55
default     1000                  4.12

◮ Moving hundreds of slabs still requires several seconds of waiting on the slab reassignment thread to complete
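
The "several seconds" follows directly from the move rates in the table. A small worked calculation, assuming a move of 200 slabs (the order of magnitude quoted earlier for the two-phase workload) and a constant move rate:

```python
# sleep interval (µs) -> measured move rate (slabs/s), copied from the table
MOVE_RATES = {1: 36.93, 10: 35.5, 100: 26.55, 1000: 4.12}


def seconds_to_move(slabs, slabs_per_second):
    """Time to move a number of slabs at a constant move rate."""
    return slabs / slabs_per_second


for sleep_us, rate in MOVE_RATES.items():
    print(f"sleep {sleep_us:>4} µs: {seconds_to_move(200, rate):5.1f} s for 200 slabs")
```

Even with the shortest sleep interval, 200 slabs take more than five seconds, so shortening the sleep alone does not remove the bottleneck.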

slide-28
SLIDE 28

Faster Slab Reassignment

◮ Remove the items immediately from the class’s freelist
  ◮ Removes was_busy waiting on recently freed items
  ◮ Stops items from being allocated to recently freed slots

slide-29
SLIDE 29

Faster Slab Reassignment Algorithm

  1. Acquire item lock
  2. Check that no other threads reference this item
  3. Unlink the item from the LRU queue
  4. Free the item
  5. Remove item slot from class freelist
  6. Repeat for next item
  7. Assign to class 2

[Diagram: the thread walks the items of Slab 1 - Class 1 one step at a time; after step 7 the slab appears as Slab 3 - Class 2.]
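
The steps above can be sketched as a toy function. As with any simplified model, this is an illustration under assumptions rather than the submitted patch: locks and reference counts (steps 1 and 2) are omitted, and the slab is represented as a plain dictionary of slots.

```python
def fast_reassign(slots, freelist):
    """Toy model of the fast loop: each freed slot leaves the freelist at once.

    slots:    dict slot_id -> item, or None when the slot is empty
    freelist: set of slot ids the source class may still allocate from
    Returns True when the slab is empty and none of its slots can be reused.
    """
    for slot_id in list(slots):
        if slots[slot_id] is not None:
            slots[slot_id] = None      # steps 3-4: unlink from the LRU, free the item
        freelist.discard(slot_id)      # step 5: the class can no longer allocate here
    # step 7 precondition: every slot is empty and off the freelist
    return all(item is None for item in slots.values()) and freelist.isdisjoint(slots)


slab = {0: "a", 1: None, 2: "c"}
class_freelist = {1, 99}               # slot 99 belongs to some other slab
ready = fast_reassign(slab, class_freelist)
print(ready, class_freelist)
```

Since a slot is never visible on the freelist after it is freed, no allocation can land in it, and a single pass over the slab is enough in this model.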

slide-34
SLIDE 34

Experimental Setup

◮ Implemented fast slab reassignment algorithm in memcached
◮ Use miss ratio curve partitioning to assign memory among classes
◮ 2 different workloads (see paper for multi-tenant evaluation)
  ◮ Two-Phase
  ◮ Time-Varying
◮ Record the overall miss ratio and slab assignments over entire trace

slide-35
SLIDE 35

Slab Movement in Two-Phase Workload

[Plot: slabs held by class1 and class2 (y-axis: Slabs) versus GET Requests (Millions) across phase 1 and phase 2, comparing the default and fast slab speeds.]

◮ Over 95% reduction in the time needed to reallocate slabs

slide-36
SLIDE 36

Miss Ratio in Two-Phase Workload

[Plot: Miss Ratio versus GET Requests (Millions), comparing the default and fast slab speeds.]

◮ Over 95% reduction in time to steady state
◮ 11.5% improvement in the mean miss ratio

slide-37
SLIDE 37

CPU Usage vs. Slab Move Rate in Two-Phase Workload

algorithm   sleep interval (µs)   cpu usage %   slabs/s
default     1                     13.8          36.93
default     10                    10.6          35.5
default     100                   8.0           26.55
default     1000                  3.1           4.12
fast        1000                  97.0          252.31

◮ Tradeoff: Increased CPU usage for significantly faster reassignment
◮ Allows the reassignment thread to complete execution sooner
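
The tradeoff can be quantified from the table by dividing the fast row by each default row. A short calculation using only the reported numbers (the helper name is illustrative):

```python
ROWS = [
    # (algorithm, sleep interval in µs, CPU usage %, slabs/s)
    ("default", 1, 13.8, 36.93),
    ("default", 10, 10.6, 35.5),
    ("default", 100, 8.0, 26.55),
    ("default", 1000, 3.1, 4.12),
    ("fast", 1000, 97.0, 252.31),
]
FAST = ROWS[-1]


def compare(default_row, fast_row=FAST):
    """Return (move-rate ratio, CPU-usage ratio) of the fast row over a default row."""
    return fast_row[3] / default_row[3], fast_row[2] / default_row[2]


for row in ROWS[:-1]:
    rate_x, cpu_x = compare(row)
    print(f"fast vs default at {row[1]:>4} µs sleep: "
          f"{rate_x:5.1f}x move rate, {cpu_x:5.1f}x CPU usage")
```

Against the default algorithm at the same 1000 µs sleep interval, the fast algorithm moves slabs roughly 61 times faster for roughly 31 times the CPU usage while the thread runs.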

slide-38
SLIDE 38

Time-Varying Workload

◮ Simulates two applications over an entire 24-hour period, following the request rate distribution at Facebook
◮ Each application sends 2.16B requests in total, drawing from 7 million unique items
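
For a sense of scale, the totals on this slide translate into a mean request rate and a mean reuse count per item. These are averages only; the actual rate varies over the day, and the per-item access distribution is not given on the slide.

```python
requests_per_app = 2_160_000_000      # 2.16B requests per application
unique_items = 7_000_000              # unique items per application
seconds_per_day = 24 * 60 * 60

mean_rate = requests_per_app / seconds_per_day        # requests per second
mean_accesses_per_item = requests_per_app / unique_items

print(f"{mean_rate:,.0f} requests/s on average per application")
print(f"{mean_accesses_per_item:.0f} accesses per item on average")
```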

slide-39
SLIDE 39

Slab Move Rate in Time-Varying Workload

algorithm   Move Rate (GETs)   Min   Max
default     19,454,304         17    26,333,587
fast        77.62              4     220

◮ 99.99% decrease in the average time required to reassign a slab
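
The reported decrease can be checked from the two averages in the table, assuming the "Move Rate (GETs)" column is the mean number of GET requests that elapse per slab move:

```python
default_gets_per_move = 19_454_304
fast_gets_per_move = 77.62

decrease = 1 - fast_gets_per_move / default_gets_per_move
print(f"{decrease:.4%} fewer GETs elapse per slab move")
```

The computed value is slightly above the 99.99% quoted on the slide, so the slide's figure is a conservative rounding.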

slide-40
SLIDE 40

Miss Ratio in Time-Varying Workload

[Plot: Miss Ratio versus GET Requests (Millions), comparing the default and fast slab speeds.]

◮ 3.42% improvement in overall miss ratio

slide-42
SLIDE 42

Conclusion

◮ Improved miss ratio in memcached as a result of faster slab reassignment among classes
◮ Orthogonal to other works that decide how many slabs to assign to each class
◮ Submitted our implementation as a patch to the current memcached source code

slide-43
SLIDE 43

Thank you, questions?

Faster Slab Reassignment in memcached

Daniel Byrne

djbyrne@mtu.edu

Nilufer Onder

nilufer@mtu.edu

Zhenlin Wang

zlwang@mtu.edu

Department of Computer Science Michigan Technological University

MEMSYS 2019
