Scaling Guest OS Critical Sections with eCS
Sanidhya Kashyap, Changwoo Min, Taesoo Kim

The physical and virtual CPU abstraction
● Mismatch between CPU abstractions
  ○ Software abstraction: a virtual machine runs apps on vCPU 1-4
  ○ Hardware abstraction: the physical machine (host) exposes pCPU 1-4 to the hypervisor
  ○ A vCPU can be preempted without notification
● VM consolidation
  ○ Multiple vCPUs (VM1-VM4) contend for each pCPU
  ○ Double scheduling issue

Double scheduling: Lock holder preemption (LHP)
[Figure: tasks A, B, and C on vCPUs 1-3 access a file; vCPU 1 is scheduled, then preempted. Legend: scheduled vCPU, preempted vCPU, running task in a VM]
● vCPU holding a lock is preempted
● Preemption hinders forward progress of the VM
● Can lead to application slowdown by 20-130%

Efforts to mitigate preemption issues
Research efforts
● Focused only on non-blocking locks
  ○ Acquire iff sufficient schedule time
● Hotplug vCPUs on the fly
  ○ May not scale to large vCPU VMs
● VM co-scheduling
  ○ Does not always alleviate the issue
Current practice
● Mostly address other preemption problem
  ○ Blocking locks
  ○ Unfair non-blocking locks
● Hardware features to mitigate preemptions
Prior approaches are mostly specialized

Double scheduling is still looming!
● LHP for blocking locks
  ○ mutex, rwsem
● Readers preemption (RP) in read-write locks
  ○ A reader is preempted while holding the lock
● Interrupt context preemption (ICP)
  ○ Preemption of a vCPU processing an interrupt
● Blocked-waiter wakeup (BWW)
  ○ Waking up a blocked thread on an idle vCPU is at least 10 times costlier
Semantic gap between virtual and physical CPU

Our approach to address the semantic gap
Insight: A vCPU may be running a critical task!
Approach: Avoid preempting a vCPU with a critical task
Design: Identify and mark/unmark a critical task

Identifying each critical section with eCS
[Figure: tasks A, B, and C on vCPUs 1-3 access a file. Legend: scheduled vCPU, preempted vCPU, enlightened vCPU, running task in a VM]
● Synchronization primitives protect critical sections → ensure OS progress
● Mark each critical section before entering it and unmark it after leaving
● Conservative, but effective approach to address each preemption problem
  ○ 60 LoC annotates 85K lock invocations in 13M LoC in Linux
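The mark/unmark idea can be illustrated with a small user-space sketch. This is not the Linux annotation itself: `ecs_count` and `ecs_lock` are hypothetical names, the counter is a plain dictionary rather than memory shared with a hypervisor, and the order (acquire, then mark) simply follows the step list on the later scheduling slide.

```python
import threading
from contextlib import contextmanager

# Hypothetical per-vCPU counter standing in for state a guest would share
# with the hypervisor; a real guest would not use a Python dict.
ecs_count = {0: 0}


@contextmanager
def ecs_lock(lock, cpu_id):
    """Hold `lock` and keep vCPU `cpu_id` marked as running a critical section."""
    lock.acquire()
    ecs_count[cpu_id] += 1      # mark: critical section has started
    try:
        yield
    finally:
        ecs_count[cpu_id] -= 1  # unmark: critical section is over
        lock.release()


file_lock = threading.Lock()
observed = []
with ecs_lock(file_lock, cpu_id=0):
    observed.append(ecs_count[0])  # value seen while the lock is held
observed.append(ecs_count[0])      # value seen after the lock is released
```

Wrapping the lock primitive once, instead of editing every call site, is presumably what lets a small annotation cover a large number of lock invocations.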

Sharing the state for efficient notification
● Each vCPU shares memory with the hypervisor
● vCPU updates information for critical sections
  ○ Notifies critical task to the hypervisor
● Hypervisor also updates scheduler context before/after scheduling out a vCPU
  ○ Enables vCPU to make efficient scheduling decisions
[Figure: each vCPU (A, B, C, ...) has eCS states shared between the VM and the hypervisor: non_preemptable_ecs_count, preemptable_ecs_count, pcpu_overloaded (0/1), vcpu_preempted (0/1)]

Lightweight para-virtualized APIs to update states
Hint API
● VM → Hypervisor (updated by each vCPU; read by the hypervisor)
  ○ activate_non_preemptable_ecs(cpu_id)
  ○ deactivate_non_preemptable_ecs(cpu_id)
  ○ activate_preemptable_ecs(cpu_id)
  ○ deactivate_preemptable_ecs(cpu_id)
● Hypervisor → VM (updated by the hypervisor; read by a vCPU)
  ○ is_vcpu_preempted(cpu_id)
  ○ is_pcpu_overloaded(cpu_id)
[Figure: per-vCPU eCS states (non_preemptable_ecs_count, preemptable_ecs_count, pcpu_overloaded (0/1), vcpu_preempted (0/1)) shared between the VM and the hypervisor]
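A minimal model of the hint API over the shared fields might look as follows. The function and field names come from the slides; everything else is an assumption for illustration: the `EcsState` class, the use of increment/decrement so that nested critical sections balance, and a Python list in place of real shared memory.

```python
from dataclasses import dataclass


@dataclass
class EcsState:
    """Per-vCPU eCS state (sketch; a real system would place this in shared memory)."""
    # Written by the vCPU, read by the hypervisor.
    non_preemptable_ecs_count: int = 0
    preemptable_ecs_count: int = 0
    # Written by the hypervisor, read by the vCPU.
    pcpu_overloaded: int = 0
    vcpu_preempted: int = 0


# One slot per vCPU; four vCPUs is an arbitrary choice for this sketch.
ecs_states = [EcsState() for _ in range(4)]


# VM -> hypervisor hints: each vCPU updates only its own counters.
def activate_non_preemptable_ecs(cpu_id):
    ecs_states[cpu_id].non_preemptable_ecs_count += 1


def deactivate_non_preemptable_ecs(cpu_id):
    ecs_states[cpu_id].non_preemptable_ecs_count -= 1


def activate_preemptable_ecs(cpu_id):
    ecs_states[cpu_id].preemptable_ecs_count += 1


def deactivate_preemptable_ecs(cpu_id):
    ecs_states[cpu_id].preemptable_ecs_count -= 1


# Hypervisor -> VM hints: the guest only reads these flags.
def is_vcpu_preempted(cpu_id):
    return ecs_states[cpu_id].vcpu_preempted == 1


def is_pcpu_overloaded(cpu_id):
    return ecs_states[cpu_id].pcpu_overloaded == 1


# Usage: vCPU 1 enters two nested critical sections and leaves one of them.
activate_non_preemptable_ecs(1)
activate_non_preemptable_ecs(1)
deactivate_non_preemptable_ecs(1)
# The hypervisor (simulated here by a direct write) preempts vCPU 2.
ecs_states[2].vcpu_preempted = 1
```

Because each field has a single writer, neither side needs a hypercall or a lock to publish a hint, which is one plausible reading of "lightweight" here.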

Hypervisor checks eCS state before scheduling out a vCPU
[Figure: timeline of vCPU 1 of VM1 and vCPU 1 of VM2 time-sharing pCPU 1, with per-vCPU eCS states (ecs_count 0, 1, 0). Legend: scheduled vCPU, preempted vCPU, enlightened vCPU, running task in a VM, extended schedule, penalized schedule]
➀ Running vCPU 1
➁ vCPU 1 acquires lock
➂ vCPU 1 updates eCS count
➃ Hypervisor checks states before vCPU 1 preemption
➄ Hypervisor lets vCPU 1 run for extra time
➅ vCPU 1 finishes and updates eCS count
➆ Hypervisor penalizes vCPU 1 later
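Steps ➂ to ➅ can be sketched as a single decision the hypervisor makes when a time slice expires. The function name `schedule_out_decision` and the `already_extended` guard (extend at most once per slice) are assumptions made for this sketch; the 1 ms extension is the value given on the fairness slide.

```python
EXTRA_TIME_MS = 1  # extension granted to avoid preempting a critical section


def schedule_out_decision(ecs_count, already_extended):
    """Decide what to do with a vCPU whose time slice has just expired.

    ecs_count:        the vCPU's shared critical-section counter
    already_extended: assumed guard so a vCPU gets at most one extension
    """
    if ecs_count > 0 and not already_extended:
        return ("extend", EXTRA_TIME_MS)
    return ("preempt", 0)


trace = []
ecs_count = 0
ecs_count += 1                                         # vCPU 1 takes a lock and marks it
trace.append(schedule_out_decision(ecs_count, False))  # slice expires inside the section
ecs_count -= 1                                         # vCPU 1 releases the lock and unmarks
trace.append(schedule_out_decision(ecs_count, True))   # extension expires
```

The guard keeps a guest that never unmarks from holding the pCPU indefinitely; the slides do not say how the real scheduler bounds extensions.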

The case for system eventual fairness
● Hypervisor accounts extra time and later penalizes the enlightened VM
  ○ Penalize the schedule of an enlightened VM
  ○ Extend the schedule of the very next VM
● Hypervisor optimistically extends time for an enlightened CS
  ○ Decision made just before scheduling out a vCPU
  ○ Extra time (schedule) to avoid preemption: 1 ms
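The accounting can be illustrated with a toy model in which extra time is recorded as debt and repaid on the next slice. The 30 ms slice length, the `VcpuAccount` class, and the repay-in-full policy are assumptions for this sketch; compensating the very next VM is left out for brevity.

```python
SLICE_MS = 30  # hypothetical default time slice
EXTRA_MS = 1   # extension granted to an enlightened critical section


class VcpuAccount:
    """Tracks extra time a vCPU received so it can be charged back later."""

    def __init__(self):
        self.debt_ms = 0

    def run_slice(self, in_critical_section):
        """Return how long the vCPU runs in this scheduling round."""
        length = SLICE_MS - self.debt_ms  # penalize: repay earlier extensions
        self.debt_ms = 0
        if in_critical_section:
            length += EXTRA_MS            # optimistic extension
            self.debt_ms = EXTRA_MS       # remember it for the next round
        return length


vcpu = VcpuAccount()
first = vcpu.run_slice(in_critical_section=True)    # extended slice
second = vcpu.run_slice(in_critical_section=False)  # penalized slice
```

Over the two rounds the vCPU receives exactly two default slices in total, which is the sense in which fairness is eventual rather than per-slice.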
