Faster Slab Reassignment in memcached
Daniel Byrne (djbyrne@mtu.edu), Nilufer Onder (nilufer@mtu.edu), Zhenlin Wang (zlwang@mtu.edu)
Department of Computer Science, Michigan Technological University
MEMSYS 2019

Outline
◮ Background
◮ memcached
◮ Faster Slab Reassignment
◮ Experimental Evaluation
◮ Conclusion

Cache Data from Backend Systems
[Figure: web, image, and msg front ends read from memcached (100 µs access time), which caches data from the DB, RecSys, and AdSrv backends (10,000 µs access time)]

Cache Miss Ratio Drives Performance

                     Access Time (µs)
System  Miss Ratio   Cache   Backend   End-to-End
A       2%           100     10,000    298
B       1%           100     10,000    199

2% → 1% in miss ratio → 33% decrease in end-to-end latency!
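The end-to-end column is consistent with a simple expected-value model in which a hit costs the cache access time and a miss costs the backend access time. The sketch below reproduces the table under that assumption; the function name and default arguments are illustrative, not taken from the paper.

```python
def end_to_end_latency_us(miss_ratio, cache_us=100.0, backend_us=10_000.0):
    """Expected access time: hits pay the cache cost, misses pay the backend cost."""
    return (1.0 - miss_ratio) * cache_us + miss_ratio * backend_us


a = end_to_end_latency_us(0.02)   # system A, 2% miss ratio
b = end_to_end_latency_us(0.01)   # system B, 1% miss ratio
decrease = (a - b) / a            # relative drop in end-to-end latency

print(f"A: {a:.0f} us, B: {b:.0f} us, decrease: {decrease:.1%}")
# prints "A: 298 us, B: 199 us, decrease: 33.2%"
```

Because the backend is 100 times slower than the cache, the miss term dominates: halving a 2% miss ratio removes 100 µs of the 298 µs average.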

memcached
◮ memcached resides in the server’s main memory
◮ Data is stored as (key, value) pairs
◮ To retrieve an item: GET key
◮ To store/update an item: SET key value
◮ Deployed in many large-scale datacenters
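As a rough illustration of the GET and SET operations above, the sketch below frames the two requests in memcached's classic text protocol (`set <key> <flags> <exptime> <bytes>` followed by the data block, and `get <key>`). The helper names `build_set` and `build_get` are hypothetical, and no network I/O is performed.

```python
def build_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Frame a SET request: command line, then the data block, each ended by CRLF."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def build_get(key: str) -> bytes:
    """Frame a GET request for a single key."""
    return f"get {key}\r\n".encode("ascii")


request = build_set("user:42", b"hello")
print(request)             # b'set user:42 0 0 5\r\nhello\r\n'
print(build_get("user:42"))  # b'get user:42\r\n'
```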

memcached Memory Organization
[Figure: Class 1 through Class N, each holding Slab 1, Slab 2, ..., Slab N, with items ordered from LRU HEAD to LRU TAIL]
◮ A class is a collection of slabs that contain items
◮ Each class corresponds to items of a given size range
◮ Each class maintains its own LRU queue
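The three properties above can be sketched as a small data structure. This is a simplified model, not memcached's implementation: `SlabClass`, `pick_class`, and the `SLAB_BYTES` value are assumptions for illustration, and each class is reduced to a slot count plus an ordered dictionary acting as its LRU queue.

```python
from collections import OrderedDict
from dataclasses import dataclass, field

SLAB_BYTES = 1024 * 1024  # assumed slab size, for illustration only


@dataclass
class SlabClass:
    max_item_bytes: int          # upper bound of this class's item size range
    slabs: int = 0               # slabs currently assigned to this class
    lru: OrderedDict = field(default_factory=OrderedDict)  # oldest entry first

    def capacity(self) -> int:
        """Item slots available across all slabs of this class."""
        return self.slabs * (SLAB_BYTES // self.max_item_bytes)

    def set(self, key, value) -> bool:
        if self.capacity() == 0:
            return False                      # no slab assigned, nowhere to store
        if key in self.lru:
            del self.lru[key]
        elif len(self.lru) >= self.capacity():
            self.lru.popitem(last=False)      # evict this class's least recently used item
        self.lru[key] = value
        return True

    def get(self, key):
        if key not in self.lru:
            return None                       # miss
        self.lru.move_to_end(key)             # mark as most recently used
        return self.lru[key]


def pick_class(classes, item_bytes):
    """Return the first class whose size range can hold the item."""
    for cls in classes:
        if item_bytes <= cls.max_item_bytes:
            return cls
    raise ValueError("item too large for any class")


classes = [SlabClass(max_item_bytes=512 * 1024, slabs=1),   # 2 slots
           SlabClass(max_item_bytes=1024 * 1024, slabs=1)]  # 1 slot
small = pick_class(classes, 300 * 1024)
small.set("a", 1)
small.set("b", 2)
small.get("a")        # "a" becomes most recently used
small.set("c", 3)     # evicts "b", the LRU item of this class only
```

Because eviction is per class, a class that holds too few slabs keeps evicting even while another class has spare slots, which is the imbalance that slab reassignment corrects.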

Why Reassign Slabs Among Classes?
◮ Adapt to changes in an application’s workload
  ◮ Working set sizes can change over time
  ◮ An application may change its item size distribution
◮ Dynamically reassign slabs when new applications enter the cache
◮ Miss ratio curves can be used to find optimal allocation among classes
  ◮ LAMA, ATC ’15
  ◮ mPart, ISMM ’18
◮ Our work focuses on the process of reassigning slabs from one class to another
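To make the miss ratio curve idea concrete, the sketch below picks a slab allocation that minimizes total misses with a generic dynamic program. It is not necessarily the method used by LAMA, mPart, or this work; `best_allocation` is a hypothetical name and the two curves are made-up numbers chosen only to exercise the code.

```python
def best_allocation(miss_curves, total_slabs):
    """miss_curves[c][s] is the expected miss count of class c when it holds s slabs.

    Returns (fewest total misses, slabs per class) using at most total_slabs.
    """
    # best[k]: cheapest way to hand out exactly k slabs to the classes seen so far
    best = {0: (0.0, [])}
    for curve in miss_curves:
        nxt = {}
        for used, (misses, alloc) in best.items():
            for s in range(min(len(curve) - 1, total_slabs - used) + 1):
                cand = (misses + curve[s], alloc + [s])
                if used + s not in nxt or cand[0] < nxt[used + s][0]:
                    nxt[used + s] = cand
        best = nxt
    return min(best.values(), key=lambda entry: entry[0])


# Hypothetical curves: misses for 0, 1, 2, 3, 4 slabs.
class1 = [100, 60, 40, 30, 25]
class2 = [200, 120, 70, 40, 30]
result = best_allocation([class1, class2], total_slabs=4)
print(result)  # (100.0, [1, 3])
```

A partitioner of this kind only decides the target allocation; the rest of the talk concerns how quickly slabs can actually be moved to reach it.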

Adapting to a New Class of Items
◮ Two-Phase Workload
◮ Fix the total memory size at 384MB

phase   % class 1   % class 2   reqs
1       100         0           13 million
2       33          67          87 million

Default Slab Reassignment
[Figure: slabs assigned to class1 and class2 (0 to 400) versus GET requests (millions, 0 to 100) at the default reassignment speed, with phase 1 and phase 2 marked]
◮ 65 million requests to reassign over 200 slabs

Impact on Overall Miss Ratio
[Figure: miss ratio (0.2 to 0.4) versus GET requests (millions, 0 to 100) at the default slab reassignment speed]
◮ 60 million requests to reach steady-state miss ratio

Current Reassignment Algorithm
◮ Goal: Reassign a slab from class 1 to class 2
[Figure: Class 1 and Class 2 each hold Slab 1 and Slab 2; the slab reassignment thread works on Slab 1 of Class 1]

Current Reassignment Algorithm
1. Acquire item lock
2. Check that no other threads reference this item
3. Unlink the item from the LRU queue
4. Free the item
5. Mark the item as was_busy and wait ~1ms
6. Repeat for next item
7. Return to head
8. Remove item slot from class freelist
9. Assign to class 2
[Figure: Slab 1 - Class 1 becomes Slab 3 - Class 2]
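The steps above can be approximated with a toy simulation. This is a deliberately simplified model and not memcached's code: the reference check of step 2 is omitted, each freed item triggers one sleep, and `refill_prob` is an assumed probability that a concurrent SET reuses a freed slot during that sleep. The function name `default_reassign` is hypothetical.

```python
import random


def default_reassign(num_slots, refill_prob, rng):
    """Toy model of the default algorithm; returns how many ~1 ms sleeps occur."""
    live = [True] * num_slots          # True: the slot currently holds an item
    sleeps = 0
    while True:
        was_busy = False
        for slot in range(num_slots):
            if not live[slot]:
                continue
            live[slot] = False         # steps 1-4: lock, check refs, unlink, free
            was_busy = True            # step 5: mark was_busy ...
            sleeps += 1                # ... and wait ~1 ms
            # Freed slots are still on the class freelist, so a SET that
            # arrives during the sleep may be allocated into any of them.
            for other in range(num_slots):
                if not live[other] and rng.random() < refill_prob:
                    live[other] = True
        if not was_busy:               # step 7: a pass from the head found no live items
            return sleeps              # steps 8-9: cut slots from freelist, assign to class 2


quiet = default_reassign(num_slots=8, refill_prob=0.0, rng=random.Random(0))
busy = default_reassign(num_slots=8, refill_prob=0.1, rng=random.Random(0))
print(quiet, busy)  # quiet is exactly 8; busy is at least 8
```

With no concurrent allocations the thread sleeps once per item; any refill forces extra passes, so the sleep count can only grow as the class gets busier.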

What Slows Down Reassignment?
◮ Each was_busy causes the thread to sleep
  ◮ The slab reassignment thread detected that the item was in use, so its slot cannot be cut from the class’s freelist at this moment
◮ During the thread’s sleep, a new item can be allocated to the freed item slot
  ◮ The thread then has to free and unlink again!

Sleeping for a Shorter Period

algorithm   sleep interval (µs)   slabs/s
default     1                     36.93
default     10                    35.5
default     100                   26.55
default     1000                  4.12

◮ Moving hundreds of slabs still requires several seconds of waiting for the slab reassignment thread to complete
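A quick calculation shows what these rates mean in wall-clock time. The rates come from the table above; the figure of 200 slabs is an illustrative workload size in line with the two-phase experiment, which reassigned over 200 slabs.

```python
# sleep interval (us) -> measured reassignment rate (slabs/s), from the table
rates = {1: 36.93, 10: 35.5, 100: 26.55, 1000: 4.12}
SLABS_TO_MOVE = 200  # illustrative; the two-phase workload moved over 200 slabs

seconds = {interval: SLABS_TO_MOVE / rate for interval, rate in rates.items()}
for interval, secs in seconds.items():
    print(f"sleep {interval:>4} us: {secs:5.1f} s to move {SLABS_TO_MOVE} slabs")
```

Even at the shortest interval the move takes more than five seconds, and at the 1000 µs interval it takes roughly 48 seconds, so shrinking the sleep alone does not remove the bottleneck.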

Faster Slab Reassignment
◮ Remove the items immediately from the class’s freelist
◮ Removes was_busy waiting on recently freed items
◮ Stops items from being allocated to recently freed slots

Faster Slab Reassignment Algorithm
1. Acquire item lock
2. Check that no other threads reference this item
3. Unlink the item from the LRU queue
4. Free the item
5. Remove item slot from class freelist
6. Repeat for each item
7. Assign to class 2
[Figure: Slab 1 - Class 1 becomes Slab 3 - Class 2]
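The faster algorithm can be sketched in the same toy style. This is an assumption-laden model rather than the authors' patch: slots are plain integers, `lru` maps a slot to the key stored in it, `freelist` is the set of slots the source class may allocate from, and the reference check of step 2 is omitted. The names `fast_reassign` and `allocate` are hypothetical.

```python
def fast_reassign(slab_slots, lru, freelist):
    """Toy model: empty one slab in a single pass, with no sleeping."""
    for slot in slab_slots:
        lru.pop(slot, None)       # steps 1-4: lock, check refs, unlink, free
        freelist.discard(slot)    # step 5: the slot can no longer be handed out
    return set(slab_slots)        # step 7: the whole slab goes to class 2


def allocate(freelist, lru, key):
    """A concurrent SET in the source class takes any slot still on its freelist."""
    if not freelist:
        return None
    slot = freelist.pop()
    lru[slot] = key
    return slot


# Class 1 owns slab 1 (slots 0-3) and slab 2 (slots 4-7).
lru = {0: "a", 1: "b", 2: "c", 4: "d", 5: "e"}
freelist = {3, 6, 7}

moved = fast_reassign(range(4), lru, freelist)
new_slot = allocate(freelist, lru, "f")   # can only land in slab 2
print(sorted(moved), new_slot in {6, 7})  # [0, 1, 2, 3] True
```

Because each slot leaves the freelist in the same step that frees it, a later SET cannot land in the slab being moved, so there is nothing to free and unlink a second time.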

Experimental Setup
◮ Implemented fast slab reassignment algorithm in memcached
◮ Use miss ratio curve partitioning to assign memory among classes
◮ 2 different workloads (see paper for multi-tenant evaluation)
  ◮ Two-Phase
  ◮ Time-Varying
◮ Record the overall miss ratio and slab assignments over entire trace

Slab Movement in Two-Phase Workload
[Figure: slabs assigned to class1 and class2 (0 to 400) versus GET requests (millions, 0 to 100) for the default and fast reassignment speeds, with phase 1 and phase 2 marked]
◮ Over 95% reduction in the time needed to reallocate slabs

Miss Ratio in Two-Phase Workload
[Figure: miss ratio (0.2 to 0.4) versus GET requests (millions, 0 to 100) for the default and fast slab reassignment speeds]
◮ Over 95% reduction in time to steady state
◮ 11.5% improvement in the mean miss ratio
