Scaling Guest OS Critical Sections with eCS
Sanidhya Kashyap, Changwoo Min, Taesoo Kim
The physical and virtual CPU abstraction
○ Mismatch between the physical and virtual CPU abstractions
○ Hardware abstraction: a physical machine (host) provides pCPU 1 to pCPU 4
○ Software abstraction: the hypervisor presents vCPU 1 to vCPU 4 to a virtual machine running apps
○ Multiple vCPUs: VM1 to VM4, each running apps, share pCPU 1 to pCPU 4 through the hypervisor
[Figure: host with four pCPUs, the hypervisor, and VM1 to VM4; vCPU 1, vCPU 3, vCPU 2, and vCPU 1 are shown on the pCPUs]
Double scheduling: Lock holder preemption (LHP)
[Figure: tasks A, B, and C running in a VM access a file; legend: vCPU scheduled, vCPU preempted]
Efforts to mitigate preemption issues
Research efforts | Current practice
○ Acquire iff sufficient schedule time
○ May not scale to large vCPU VMs
○ Does not always alleviate the issue
[...] problem
○ Blocking locks
○ Unfair non-blocking locks
[...] preemptions
Double scheduling is still looming!
○ mutex, rwsem
○ A reader is preempted while holding the lock
○ Preemption of a vCPU processing an interrupt
○ Waking up a blocked thread on an idle vCPU is at least 10 times costlier
Our approach to address the semantic gap
Identifying each critical section with eCS
○ 60 LoC annotate 85K lock invocations in 13M LoC in Linux
[Figure: tasks A, B, and C running in a VM access a file; legend: scheduled vCPU, preempted vCPU, enlightened vCPU]
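One way a few dozen lines can cover many lock invocations is to place the annotation inside the lock primitive rather than at each call site. The sketch below illustrates that idea only; `AnnotatedLock`, `ecs_count`, and `access_file` are hypothetical names, and the point at which the counter is updated relative to the lock operation is an assumption, not the authors' implementation.

```python
import threading
from contextlib import contextmanager

# Hypothetical per-vCPU counter standing in for eCS state visible to the hypervisor.
ecs_count = {0: 0}


class AnnotatedLock:
    """A lock whose acquire/release paths carry the critical-section annotation."""

    def __init__(self):
        self._lock = threading.Lock()

    def acquire(self, cpu_id):
        self._lock.acquire()
        ecs_count[cpu_id] += 1  # assumption: mark the vCPU once the lock is held

    def release(self, cpu_id):
        self._lock.release()
        ecs_count[cpu_id] -= 1  # unmark after the lock is dropped

    @contextmanager
    def held(self, cpu_id):
        self.acquire(cpu_id)
        try:
            yield
        finally:
            self.release(cpu_id)


def access_file(lock, cpu_id):
    # The call site only takes and drops the lock; the annotation comes for free.
    with lock.held(cpu_id):
        return ecs_count[cpu_id]  # counter value observed inside the critical section
```

Because every caller goes through `acquire` and `release`, adding the two counter updates there annotates all call sites at once.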
Sharing the state for efficient notification
[Figure: each vCPU (A, B, C) in the VM shares its eCS state with the hypervisor; per-vCPU fields: pcpu_overloaded (0/1), vcpu_preempted (0/1), non_preemptable_ecs_count, preemptable_ecs_count]
○ Notifies the hypervisor of a critical task
[...] before/after scheduling out a vCPU
○ Enables a vCPU to make efficient scheduling decisions
Lightweight para-virtualized APIs to update states
Hint API
VM → Hypervisor (state updated by each vCPU; read by the hypervisor)
○ activate_non_preemptable_ecs(cpu_id)
○ deactivate_non_preemptable_ecs(cpu_id)
○ activate_preemptable_ecs(cpu_id)
○ deactivate_preemptable_ecs(cpu_id)
○ State: non_preemptable_ecs_count, preemptable_ecs_count
Hypervisor → VM (state updated by the hypervisor; read by a vCPU)
○ is_vcpu_preempted(cpu_id)
○ is_pcpu_overloaded(cpu_id)
○ State: pcpu_overloaded (0/1), vcpu_preempted (0/1)
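The shared state and the hint API can be pictured as a small per-vCPU record with one writer per field. The following is a minimal sketch under that assumption: the field and function names come from the slide, while `VcpuState`, `shared`, and `NR_VCPUS` are hypothetical, and plain Python objects stand in for memory shared between the guest and the hypervisor.

```python
from dataclasses import dataclass


@dataclass
class VcpuState:
    # Updated by the vCPU, read by the hypervisor.
    non_preemptable_ecs_count: int = 0
    preemptable_ecs_count: int = 0
    # Updated by the hypervisor, read by the vCPU.
    vcpu_preempted: int = 0
    pcpu_overloaded: int = 0


NR_VCPUS = 3  # hypothetical VM size: vCPU(A), vCPU(B), vCPU(C)
shared = [VcpuState() for _ in range(NR_VCPUS)]


# VM -> Hypervisor hints: plain counter updates, no trap into the hypervisor.
def activate_non_preemptable_ecs(cpu_id):
    shared[cpu_id].non_preemptable_ecs_count += 1


def deactivate_non_preemptable_ecs(cpu_id):
    shared[cpu_id].non_preemptable_ecs_count -= 1


def activate_preemptable_ecs(cpu_id):
    shared[cpu_id].preemptable_ecs_count += 1


def deactivate_preemptable_ecs(cpu_id):
    shared[cpu_id].preemptable_ecs_count -= 1


# Hypervisor -> VM hints: the guest only reads these flags.
def is_vcpu_preempted(cpu_id):
    return shared[cpu_id].vcpu_preempted == 1


def is_pcpu_overloaded(cpu_id):
    return shared[cpu_id].pcpu_overloaded == 1
```

Keeping one writer per field means neither side needs a lock to publish its hints in this model.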
Hypervisor checks eCS state before scheduling out a vCPU
[Figure: tasks A, B, and C in VM1 access a file; vCPU 1 of VM1 and vCPU 1 of VM2 time-share pCPU 1; per-vCPU ecs_count values 0, 1, and 0; legend: scheduled vCPU, preempted vCPU, enlightened vCPU, running task in a VM; the timeline marks an extended schedule and a penalized schedule]
➀ Running vCPU 1
➁ vCPU 1 acquires lock
➂ vCPU 1 updates eCS count
➃ Hypervisor checks states before vCPU 1 preemption
➄ Hypervisor lets vCPU 1 run for extra time
➅ vCPU 1 finishes and updates eCS count
➆ Hypervisor penalizes vCPU 1 later
The case for system eventual fairness
○ Penalize the schedule of an enlightened VM
○ Extend the schedule of the very next VM
○ Decision made just before scheduling out a vCPU
○ Extra time (schedule) to avoid preemption: 1 ms
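The extend-now, penalize-later policy can be sketched as simple bookkeeping at the moment a time slice expires. This is an illustration under stated assumptions: `Vcpu`, `on_slice_expired`, `next_slice`, and `debt_ms` are hypothetical names, only one extension per slice is granted, and the 1 ms value is the extra time quoted on the slide.

```python
EXTRA_MS = 1  # extra schedule granted to avoid preemption (value from the slide)


class Vcpu:
    def __init__(self, name, slice_ms):
        self.name = name
        self.slice_ms = slice_ms  # nominal time slice
        self.ecs_count = 0        # updated by the guest while in a critical section
        self.debt_ms = 0          # extra time to pay back later


def on_slice_expired(vcpu, already_extended=False):
    """Return the extra time (ms) to grant just before scheduling out the vCPU."""
    if vcpu.ecs_count > 0 and not already_extended:
        vcpu.debt_ms += EXTRA_MS  # remember the extension for the later penalty
        return EXTRA_MS
    return 0


def next_slice(vcpu):
    """Return the vCPU's next slice length, shortened by any outstanding debt."""
    length = max(vcpu.slice_ms - vcpu.debt_ms, 0)
    vcpu.debt_ms = 0
    return length
```

In this model a vCPU that borrowed 1 ms runs 1 ms less on a later slice, so its total CPU time over both slices is unchanged.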
Even a vCPU can make efficient scheduling decisions
○ Lock waiters can avoid the blocked-waiter wakeup (BWW) problem
○ A lock waiter keeps spinning until it acquires the lock, as long as the pCPU is not overloaded
[Figure: the hypervisor shares pcpu_overloaded (0/1) with each vCPU (A, B, C) in the VM alongside the eCS states]
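The waiter-side decision can be sketched as a spin loop that consults the overload flag on every failed attempt. This is a minimal sketch, not the actual lock code: `wait_for_lock` and its three callbacks are hypothetical, with `is_pcpu_overloaded` standing in for the hypervisor-provided hint.

```python
def wait_for_lock(try_acquire, is_pcpu_overloaded, block):
    """Spin while the pCPU is not overloaded; otherwise fall back to blocking.

    Returns a (result, spins) pair, where result is "acquired" or "blocked".
    """
    spins = 0
    while not try_acquire():
        if is_pcpu_overloaded():
            # Other vCPUs need this pCPU: give it up and pay for a wake-up later.
            block()
            return ("blocked", spins)
        spins += 1  # pCPU is not contended: spinning avoids the costly wake-up
    return ("acquired", spins)
```

On an under-committed host the waiter never blocks in this model, so it never pays the wake-up cost; on an over-committed host it yields instead of burning a shared pCPU.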
Implementation
○ Rely on scheduler_tick() to avoid vCPU preemption
○ 60 LoC for annotating almost every lock-based critical section
Evaluation
Impact of eCS in over-committed scenario
[Figure: results for Apache web server and Psearchy; annotation: preemptions avoided]
Impact of eCS in under-committed scenario
[Figure: results for Apache web server and Psearchy]
System eventual fairness
○ Each run for equal time (4.95 seconds out of 10 seconds)
Discussion
○ Leverage steal_time_struct that exposes preempted method
○ Use VM → Hypervisor API to mark functions
○ Require composable scheduling abstraction to support user space
Conclusion