Scaling Guest OS Critical Sections with eCS
Sanidhya Kashyap, Changwoo Min, Taesoo Kim
The physical and virtual CPU abstraction
● Mismatch between CPU abstractions
  ○ Hardware abstraction: the physical machine (host) exposes pCPU 1-4
  ○ Software abstraction: the hypervisor exposes vCPU 1-4 to a virtual machine running apps
  ○ A vCPU can be preempted without notification
● VM consolidation
  ○ Contention on pCPUs: VM1-VM4, each with multiple vCPUs, share pCPU 1-4
  ○ Double scheduling issue: the guest OS schedules tasks on vCPUs while the hypervisor schedules vCPUs on pCPUs
Double scheduling: lock holder preemption (LHP)
[Figure: tasks A, B, and C of a VM running on vCPUs 1-3 and accessing a file; legend: running task, scheduled vCPU, preempted vCPU]
● A vCPU holding a lock is preempted
● Preemption hinders the forward progress of the VM
● Can slow applications down by 20-130%
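The cost of LHP can be made concrete with a toy model. The sketch below is only an illustration under assumed conditions (round-robin time sharing, a fixed time slice, the holder scheduled out right after acquiring the lock); `lhp_stall_us` and its parameters are hypothetical, and the model ignores everything else a real hypervisor scheduler does.

```c
/*
 * Toy model of lock-holder preemption (LHP). Assumptions, not from the
 * slides: round-robin time sharing, `vcpus_per_pcpu` vCPUs share the
 * holder's pCPU, each gets a fixed slice of `timeslice_us`, and the
 * holder is scheduled out right after acquiring the lock.
 */
static unsigned long lhp_stall_us(unsigned vcpus_per_pcpu,
                                  unsigned long timeslice_us)
{
    /* Without co-runners the holder is never descheduled. */
    if (vcpus_per_pcpu <= 1)
        return 0;
    /*
     * The holder runs again only after every other vCPU on its pCPU has
     * used one slice; waiters spinning on the lock make no progress
     * meanwhile.
     */
    return (unsigned long)(vcpus_per_pcpu - 1) * timeslice_us;
}
```

With four vCPUs per pCPU and a 1 ms slice (both hypothetical), a waiter can be stuck for about 3 ms behind a single preemption.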
Efforts to mitigate preemption issues
Research efforts
● Focused only on non-blocking locks
  ○ Acquire the lock iff the remaining schedule time is sufficient
  ○ Unfair non-blocking locks
● Hardware features to mitigate preemptions
  ○ Do not always alleviate the issue
Current practice
● Mostly addresses other preemption problems
  ○ Hotplug vCPUs on the fly
  ○ Blocking locks
  ○ VM co-scheduling
  ○ May not scale to VMs with many vCPUs
Prior approaches are mostly specialized.
Still, double scheduling is looming!
● LHP for blocking locks
  ○ mutex, rwsem
● Readers preemption (RP) in read-write locks
  ○ A reader is preempted while holding the lock
● Interrupt context preemption (ICP)
  ○ Preemption of a vCPU that is processing an interrupt
● Blocked-waiter wakeup (BWW)
  ○ Waking up a blocked thread on an idle vCPU is at least 10 times costlier
Root cause: the semantic gap between virtual and physical CPUs
Our approach to address the semantic gap
Insight: a vCPU may be running a critical task!
Approach: avoid preempting a vCPU that is running a critical task
Design: identify critical tasks, and mark/unmark them
Identifying each critical section with eCS
[Figure: tasks A, B, and C of a VM running on vCPUs 1-3 and accessing a file; legend: running task, scheduled vCPU, preempted vCPU, enlightened vCPU]
● Synchronization primitives protect critical sections → ensure OS progress
● Mark a critical section on entry and unmark it on exit
● Conservative, but effective approach that addresses each preemption problem
  ○ 60 LoC annotate 85K lock invocations in 13M LoC of Linux
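A minimal user-space sketch of marking a critical section around a lock. It assumes a per-vCPU counter and reuses the hint names `activate_non_preemptable_ecs`/`deactivate_non_preemptable_ecs` from the API slide, but the spinlock type, `ecs_spin_lock`/`ecs_spin_unlock`, and `NR_VCPUS` are hypothetical; this is not the Linux annotation itself. The mark is taken before spinning and dropped after release, mirroring how Linux's `spin_lock()` disables preemption before acquiring and `spin_unlock()` re-enables it after releasing.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_VCPUS 4  /* hypothetical VM size */

/*
 * Per-vCPU count of marked critical sections. In eCS this state lives in
 * memory shared with the hypervisor; a plain array stands in for it here.
 */
static atomic_int ecs_count[NR_VCPUS];

static void activate_non_preemptable_ecs(int cpu_id)
{
    atomic_fetch_add(&ecs_count[cpu_id], 1);
}

static void deactivate_non_preemptable_ecs(int cpu_id)
{
    int prev = atomic_fetch_sub(&ecs_count[cpu_id], 1);
    assert(prev > 0);   /* every unmark must pair with an earlier mark */
    (void)prev;         /* avoid an unused-variable warning under NDEBUG */
}

/* Minimal test-and-set spinlock standing in for the guest's lock. */
typedef struct { atomic_flag locked; } ecs_spinlock_t;
#define ECS_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

static void ecs_spin_lock(ecs_spinlock_t *l, int cpu_id)
{
    activate_non_preemptable_ecs(cpu_id);   /* mark before entering */
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        ;                                   /* spin until the lock is free */
}

static void ecs_spin_unlock(ecs_spinlock_t *l, int cpu_id)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
    deactivate_non_preemptable_ecs(cpu_id); /* unmark after leaving */
}
```

A counter rather than a boolean keeps nested critical sections correct: the vCPU stays marked until its outermost lock is released.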
Sharing the state for efficient notification
[Figure: vCPU(A), vCPU(B), and vCPU(C) of a VM, each with its own eCS state shared with the hypervisor]
● Each vCPU shares memory with the hypervisor
● A vCPU updates information about its critical sections
  ○ Notifies the hypervisor of a critical task
  ○ State: non_preemptable_ecs_count, preemptable_ecs_count
● The hypervisor also updates the scheduler context before/after scheduling out a vCPU
  ○ State: pcpu_overloaded (0/1), vcpu_preempted (0/1)
  ○ Enables a vCPU to make efficient scheduling decisions
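The hypervisor-written half of the shared state could be maintained with two scheduler hooks, sketched below. The hook names, the `int` field types, and the rule that a pCPU is overloaded when more than one vCPU is runnable on it are assumptions for illustration.

```c
/*
 * Hypervisor-written half of one vCPU's shared eCS slot. Field names are
 * from the slide; types, layout, and the "overloaded" rule are assumptions.
 * `volatile` because the guest reads these fields concurrently.
 */
struct ecs_host_view {
    volatile int pcpu_overloaded; /* 1 if other vCPUs also want this pCPU */
    volatile int vcpu_preempted;  /* 1 while this vCPU is scheduled out */
};

/*
 * Hypothetical hook run just before the scheduler switches the vCPU out.
 * `runnable_on_pcpu` counts all runnable vCPUs on the pCPU, this one included.
 */
static void ecs_before_schedule_out(struct ecs_host_view *s,
                                    int runnable_on_pcpu)
{
    s->vcpu_preempted  = 1;
    s->pcpu_overloaded = runnable_on_pcpu > 1;
}

/* Hypothetical hook run right after the vCPU is switched back in. */
static void ecs_after_schedule_in(struct ecs_host_view *s,
                                  int runnable_on_pcpu)
{
    s->vcpu_preempted  = 0;
    s->pcpu_overloaded = runnable_on_pcpu > 1;
}
```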
Lightweight para-virtualized APIs to update states
Hint API, VM → hypervisor (updated by each vCPU; read by the hypervisor)
● activate_non_preemptable_ecs(cpu_id)
● deactivate_non_preemptable_ecs(cpu_id)
● activate_preemptable_ecs(cpu_id)
● deactivate_preemptable_ecs(cpu_id)
Hypervisor → VM (updated by the hypervisor; read by a vCPU)
● is_vcpu_preempted(cpu_id)
● is_pcpu_overloaded(cpu_id)
Per-vCPU eCS state: non_preemptable_ecs_count and preemptable_ecs_count (VM → hypervisor); pcpu_overloaded (0/1) and vcpu_preempted (0/1) (hypervisor → VM)
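The six calls could be thin wrappers over the per-vCPU shared slot, as in this sketch. The struct layout, `ecs_shared`, `NR_VCPUS`, and the `pick_wakeup_vcpu` helper are hypothetical; the helper only illustrates how a guest might use `is_vcpu_preempted` to avoid waking a task on a preempted vCPU.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_VCPUS 4   /* hypothetical VM size */

/* One per-vCPU slot in guest/hypervisor shared memory (layout assumed). */
struct ecs_state {
    int non_preemptable_ecs_count; /* VM -> hypervisor */
    int preemptable_ecs_count;     /* VM -> hypervisor */
    int pcpu_overloaded;           /* hypervisor -> VM, 0/1 */
    int vcpu_preempted;            /* hypervisor -> VM, 0/1 */
};

static volatile struct ecs_state ecs_shared[NR_VCPUS];

/* VM -> hypervisor hints: only the owning vCPU writes its own counters. */
static void activate_non_preemptable_ecs(int cpu_id)
{ ecs_shared[cpu_id].non_preemptable_ecs_count++; }

static void deactivate_non_preemptable_ecs(int cpu_id)
{
    assert(ecs_shared[cpu_id].non_preemptable_ecs_count > 0);
    ecs_shared[cpu_id].non_preemptable_ecs_count--;
}

static void activate_preemptable_ecs(int cpu_id)
{ ecs_shared[cpu_id].preemptable_ecs_count++; }

static void deactivate_preemptable_ecs(int cpu_id)
{
    assert(ecs_shared[cpu_id].preemptable_ecs_count > 0);
    ecs_shared[cpu_id].preemptable_ecs_count--;
}

/* Hypervisor -> VM queries: plain reads of hypervisor-written fields. */
static bool is_vcpu_preempted(int cpu_id)  { return ecs_shared[cpu_id].vcpu_preempted; }
static bool is_pcpu_overloaded(int cpu_id) { return ecs_shared[cpu_id].pcpu_overloaded; }

/* Hypothetical use: prefer waking a task on a vCPU that is running. */
static int pick_wakeup_vcpu(int prefer)
{
    for (int i = 0; i < NR_VCPUS; i++) {
        int cpu = (prefer + i) % NR_VCPUS;
        if (!is_vcpu_preempted(cpu))
            return cpu;
    }
    return prefer;   /* all preempted: keep the original choice */
}
```

In this sketch none of the calls traps to the hypervisor; each is a load or store on shared memory.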
Hypervisor checks eCS state before scheduling out a vCPU
[Figure: VM1's vCPU(A), vCPU(B), and vCPU(C) with eCS states ecs_count (0), (1), (0); vCPU 1 of VM1 and vCPU 1 of VM2 time-share pCPU 1, showing an extended schedule followed by a penalized schedule]
➀ vCPU 1 is running
➁ vCPU 1 acquires a lock
➂ vCPU 1 updates its eCS count
➃ The hypervisor checks the eCS state before preempting vCPU 1
➄ The hypervisor lets vCPU 1 run for extra time
➅ vCPU 1 finishes the critical section and updates its eCS count
➆ The hypervisor penalizes vCPU 1 later
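Steps ➃ and ➄ can be sketched as a check the hypervisor runs when a vCPU's slice expires. The 1 ms extension comes from the fairness slide; `struct vcpu_sched`, `ecs_on_slice_expiry`, and the limit of one extension per scheduling round are assumptions. Step ➆, the later penalty, is not modeled here.

```c
#include <stdbool.h>
#include <stdint.h>

#define ECS_EXTRA_NS 1000000ULL  /* 1 ms, the extension quoted for eCS */

/* Hypervisor-side view of one vCPU (hypothetical). */
struct vcpu_sched {
    uint32_t ecs_count;  /* step 3: published by the guest */
    bool     extended;   /* already extended in this scheduling round? */
};

/*
 * Step 4: called when the vCPU's slice expires, just before it would be
 * scheduled out. Returns the extra nanoseconds to grant (step 5), or 0 to
 * preempt now. Allowing at most one extension per round is an assumption.
 */
static uint64_t ecs_on_slice_expiry(struct vcpu_sched *v)
{
    if (v->ecs_count > 0 && !v->extended) {
        v->extended = true;
        return ECS_EXTRA_NS;
    }
    v->extended = false;   /* preempt now; reset for the next round */
    return 0;
}
```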
The case for system eventual fairness
● The hypervisor accounts the extra time and later penalizes the enlightened VM
  ○ Penalize the schedule of an enlightened VM
  ○ Extend the schedule of the very next VM
● The hypervisor optimistically extends time for an enlightened critical section
  ○ Decision made just before scheduling out a vCPU
  ○ Extra time (schedule) to avoid preemption: 1 ms
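The eventual-fairness bookkeeping can be sketched as a per-VM ledger: the extra time an enlightened VM consumed is charged against its later slices and credited to the VM scheduled right after it. The ledger structure, the function names, and the choice to carry unpaid debt across slices are hypothetical.

```c
#include <stdint.h>

/* Per-VM fairness ledger (hypothetical). */
struct vm_acct {
    uint64_t debt_ns;   /* extra time this VM ran beyond its slice */
    uint64_t credit_ns; /* time owed to this VM by an extended predecessor */
};

/* Record `extra_ns` actually consumed by an enlightened VM, and owe the
 * same amount to the VM scheduled right after it. */
static void ecs_account_extension(struct vm_acct *enlightened,
                                  struct vm_acct *next_vm,
                                  uint64_t extra_ns)
{
    enlightened->debt_ns += extra_ns;  /* penalize its later schedule */
    next_vm->credit_ns   += extra_ns;  /* extend the very next VM */
}

/* Length of a VM's next slice after settling credit and debt.
 * Debt larger than one slice is carried over to later slices. */
static uint64_t ecs_next_slice_ns(struct vm_acct *vm, uint64_t base_ns)
{
    uint64_t slice = base_ns + vm->credit_ns;
    uint64_t pay = vm->debt_ns < slice ? vm->debt_ns : slice;

    vm->credit_ns = 0;
    vm->debt_ns  -= pay;
    return slice - pay;
}
```

Over time each VM receives its base share: whatever the enlightened VM borrowed is paid back to the neighbour it delayed.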