Scaling Guest OS Critical Sections with eCS (Sanidhya Kashyap, Changwoo Min, Taesoo Kim): PowerPoint PPT Presentation

slide-1
SLIDE 1

Scaling Guest OS Critical Sections with eCS

Sanidhya Kashyap, Changwoo Min, Taesoo Kim

slide-9
SLIDE 9

The physical and virtual CPU abstraction

  • Mismatch between CPU abstractions
  • VM consolidation
  • Contention on pCPU

[Figure, left (hardware abstraction vs. software abstraction): a physical machine (host) with pCPU 1 to pCPU 4 runs a hypervisor; a virtual machine on top of it exposes vCPU 1 to vCPU 4 to its apps.]

[Figure, right (multiple vCPUs): VM1 to VM4, each running apps, share pCPU 1 to pCPU 4 of one physical machine (host) through the hypervisor.]

  • A vCPU can be preempted without notification
  • Double scheduling issue

slide-10
SLIDE 10

Double scheduling: Lock holder preemption (LHP)

  • vCPU holding a lock is preempted
  • Preemption hinders forward progress of the VM
  • Can lead to application slowdown by 20-130%

[Figure: timeline of vCPU 1, vCPU 3, vCPU 2, and vCPU 1 again, marked as scheduled or preempted, while tasks A, B, and C running in a VM access a file.]
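A toy model may help show why a preempted lock holder hurts the whole VM. The sketch below is not from the slides: the function name, the tick-based time unit, and the assumption that every waiter spins for as long as the lock stays held are all illustrative simplifications.

```python
def wasted_spin_ticks(cs_ticks, preempt_at, preempt_len, waiters):
    """Ticks that waiting vCPUs burn spinning while one vCPU holds the lock.

    cs_ticks:    ticks of real work inside the critical section
    preempt_at:  tick inside the critical section at which the holder is
                 descheduled, or None if it is never preempted
    preempt_len: ticks the holder stays descheduled
    waiters:     number of other vCPUs spinning on the same lock
    """
    hold_time = cs_ticks
    if preempt_at is not None and 0 <= preempt_at < cs_ticks:
        # The lock stays held while its holder is off the pCPU.
        hold_time += preempt_len
    return hold_time * waiters


# Hypothetical numbers: a 2-tick critical section, 3 waiters, 10-tick preemption.
no_preemption = wasted_spin_ticks(2, None, 10, 3)
with_preemption = wasted_spin_ticks(2, 1, 10, 3)
```

In this model the critical section itself is unchanged; all of the extra cost comes from the time the holder spends descheduled.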

slide-12
SLIDE 12

Efforts to mitigate preemption issues

Research efforts

  • Focused only on non-blocking locks
    ○ Acquire iff sufficient schedule time
  • Hotplug vCPUs on the fly
    ○ May not scale to large vCPU VMs
  • VM co-scheduling
    ○ Does not always alleviate the issue

Current practice

  • Mostly address other preemption problems
    ○ Blocking locks
    ○ Unfair non-blocking locks
  • Hardware features to mitigate preemptions

Prior approaches are mostly specialized

slide-14
SLIDE 14

Double scheduling is still looming!

  • LHP for blocking locks
    ○ mutex, rwsem
  • Readers preemption (RP) in read-write locks
    ○ A reader is preempted while holding the lock
  • Interrupt context preemption (ICP)
    ○ Preemption of a vCPU processing an interrupt
  • Blocked-waiter wakeup (BWW)
    ○ Waking up a blocked thread on an idle vCPU is at least 10 times costlier

Semantic gap between virtual and physical CPU

slide-15
SLIDE 15

Our approach to address semantic gap

  • Insight: A vCPU may be running a critical task!
  • Approach: Avoid preempting a vCPU with a critical task
  • Design: Identify and mark/unmark a critical task

slide-18
SLIDE 18

Identifying each critical section with eCS

  • Synchronization primitives protect critical sections → ensure OS progress
  • Mark a critical section before entering it and unmark it after leaving it
  • Conservative, but effective approach to address each preemption problem
    ○ 60 LoC annotates 85K lock invocations in 13M LoC in Linux

[Figure: timeline of vCPU 1, vCPU 2, and vCPU 3, marked as scheduled, preempted, or enlightened, while tasks A, B, and C running in a VM access a file.]
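The reason a few lines of annotation can cover so many call sites is that the marking lives inside the lock primitive, not at each caller. The sketch below illustrates that idea in user-space Python; `AnnotatedLock`, `mark_critical`, `unmark_critical`, and the `ecs_count` dictionary are hypothetical names for illustration, not the kernel changes described in the talk.

```python
import threading
from contextlib import contextmanager

# Hypothetical per-vCPU counters standing in for state shared with the hypervisor.
ecs_count = {}


def mark_critical(vcpu_id):
    ecs_count[vcpu_id] = ecs_count.get(vcpu_id, 0) + 1


def unmark_critical(vcpu_id):
    ecs_count[vcpu_id] -= 1


class AnnotatedLock:
    """A lock whose acquire and release paths mark and unmark a critical section."""

    def __init__(self):
        self._lock = threading.Lock()

    @contextmanager
    def held(self, vcpu_id):
        self._lock.acquire()
        mark_critical(vcpu_id)          # the vCPU now runs a critical task
        try:
            yield
        finally:
            self._lock.release()
            unmark_critical(vcpu_id)    # unmark only once the lock is free


# Nested critical sections on vCPU 0: the counter tracks the nesting depth.
a, b = AnnotatedLock(), AnnotatedLock()
with a.held(0):
    with b.held(0):
        depth_inside = ecs_count[0]
depth_after = ecs_count[0]
```

Using a counter instead of a flag is one way to handle nested locks: the vCPU counts as critical until the outermost section is left.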

slide-19
SLIDE 19

Sharing the state for efficient notification

  • Each vCPU shares memory with the hypervisor
  • vCPU updates information for critical sections
    ○ Notifies critical task to the hypervisor
  • Hypervisor also updates scheduler context before/after scheduling out a vCPU
    ○ Enables vCPU to make efficient scheduling decisions

[Figure: each vCPU (A, B, C) in the VM shares its eCS states with the matching per-vCPU state in the hypervisor. Shared fields: pcpu_overloaded (0/1), vcpu_preempted (0/1), non_preemptable_ecs_count, preemptable_ecs_count.]

slide-20
SLIDE 20

Lightweight para-virtualized APIs to update states

Hint API

  • VM → Hypervisor
    ○ activate_non_preemptable_ecs(cpu_id)
    ○ deactivate_non_preemptable_ecs(cpu_id)
    ○ activate_preemptable_ecs(cpu_id)
    ○ deactivate_preemptable_ecs(cpu_id)
  • Hypervisor → VM
    ○ is_vcpu_preempted(cpu_id)
    ○ is_pcpu_overloaded(cpu_id)

Shared eCS states, per vCPU

  • non_preemptable_ecs_count, preemptable_ecs_count: updated by each vCPU; read by the hypervisor
  • pcpu_overloaded (0/1), vcpu_preempted (0/1): updated by the hypervisor; read by a vCPU
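As a rough illustration of how the six hint calls could map onto the shared fields, here is a minimal sketch. The field and function names come from the slide; the function bodies, the `ECSState` class, the `states` list, and `NR_VCPUS` are assumptions, since the slides do not show the implementation.

```python
from dataclasses import dataclass


@dataclass
class ECSState:
    # Written by the vCPU, read by the hypervisor.
    non_preemptable_ecs_count: int = 0
    preemptable_ecs_count: int = 0
    # Written by the hypervisor, read by the vCPU.
    pcpu_overloaded: int = 0
    vcpu_preempted: int = 0


NR_VCPUS = 4  # assumed VM size, for illustration only
states = [ECSState() for _ in range(NR_VCPUS)]


# VM -> hypervisor hints: assumed to be plain counter updates in shared memory.
def activate_non_preemptable_ecs(cpu_id):
    states[cpu_id].non_preemptable_ecs_count += 1


def deactivate_non_preemptable_ecs(cpu_id):
    states[cpu_id].non_preemptable_ecs_count -= 1


def activate_preemptable_ecs(cpu_id):
    states[cpu_id].preemptable_ecs_count += 1


def deactivate_preemptable_ecs(cpu_id):
    states[cpu_id].preemptable_ecs_count -= 1


# Hypervisor -> VM hints: assumed to be plain reads of the shared flags.
def is_vcpu_preempted(cpu_id):
    return states[cpu_id].vcpu_preempted == 1


def is_pcpu_overloaded(cpu_id):
    return states[cpu_id].pcpu_overloaded == 1
```

Because both directions are ordinary memory reads and writes in this sketch, neither side needs a trap or hypercall on the fast path, which fits the "lightweight" framing of the slide.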

slide-22
SLIDE 22

Hypervisor checks eCS state before scheduling out a vCPU

➀ Running vCPU 1
➁ vCPU 1 acquires lock
➂ vCPU 1 updates eCS count
➃ Hypervisor checks states before vCPU 1 preemption
➄ Hypervisor lets vCPU 1 run for extra time
➅ vCPU 1 finishes and updates eCS count
➆ Hypervisor penalizes vCPU 1 later

[Figure: tasks A, B, and C running in VM1 access a file; each vCPU (A, B, C) of VM1 exposes its eCS states, with ecs_count values of 0 or 1. On the time-shared pCPU 1, vCPU 1 of VM1 and vCPU 1 of VM2 alternate; the timeline marks scheduled, preempted, and enlightened vCPUs, steps ➀ to ➆, the extended schedule, and the penalized schedule.]
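The walk-through can be read as a small decision function that runs when a vCPU's time slice expires. The sketch below is an assumption-laden illustration: `VCpu`, `on_slice_expired`, and the `debt_ms` and `extended` fields are hypothetical names, and granting at most one extension per slice is a design choice made for this example, not something the slides state.

```python
from dataclasses import dataclass


@dataclass
class VCpu:
    ecs_count: int = 0       # written by the guest when entering or leaving a critical section
    debt_ms: int = 0         # extra time the hypervisor will charge back later
    extended: bool = False   # whether the current slice was already extended


def on_slice_expired(vcpu, extra_ms=1):
    """Decide what to do with a vCPU whose time slice just ran out."""
    if vcpu.ecs_count > 0 and not vcpu.extended:
        vcpu.extended = True
        vcpu.debt_ms += extra_ms   # remember the extension for the later penalty
        return "extend"
    vcpu.extended = False
    return "preempt"


vcpu1 = VCpu()
vcpu1.ecs_count += 1               # steps 2 and 3: lock acquired, eCS count updated
first = on_slice_expired(vcpu1)    # steps 4 and 5: hypervisor sees the count and extends
vcpu1.ecs_count -= 1               # step 6: critical section finished
second = on_slice_expired(vcpu1)   # nothing critical is running, so preempt
```

The accumulated `debt_ms` is what step ➆ would consume when the hypervisor later shortens the schedule of this vCPU.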

slide-23
SLIDE 23

The case for system eventual fairness

  • Hypervisor accounts for extra time and later penalizes the enlightened VM
    ○ Penalize the schedule of an enlightened VM
    ○ Extend the schedule of the very next VM
  • Hypervisor optimistically extends time for an enlightened CS
    ○ Decision made just before scheduling out a vCPU
    ○ Extra time (schedule) to avoid preemption: 1 ms
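The accounting behind eventual fairness can be sketched with two VMs alternating on one pCPU. This is a simplified model and not the hypervisor's actual scheduler: `run_schedule`, the 6 ms base slice, and the pattern of how often VM2 ends a slice inside a critical section are all assumptions; only the 1 ms extension comes from the slide.

```python
def run_schedule(rounds, slice_ms=6, extra_ms=1, vm2_in_cs_every=2):
    """Alternate VM1 and VM2 on one pCPU and return (time run per VM, open debt)."""
    ran = {"VM1": 0, "VM2": 0}
    debt = 0  # extra ms that the enlightened VM2 still owes
    for r in range(rounds):
        ran["VM1"] += slice_ms
        vm2_slice = slice_ms - debt    # penalize: pay back the earlier extension
        debt = 0
        if r % vm2_in_cs_every == 0:   # VM2 is in a critical section at slice end
            vm2_slice += extra_ms      # optimistic extension
            debt = extra_ms
        ran["VM2"] += vm2_slice
    return ran, debt


totals, open_debt = run_schedule(4)
```

In this model VM2 is ahead of VM1 by exactly the debt that has not been repaid yet, so the two totals converge whenever the debt reaches zero.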

slide-24
SLIDE 24

Even a vCPU can make efficient scheduling decisions

  • Share the hypervisor context with each VM
    ○ Lock waiters can avoid the BWW problem
  • Virtualized scheduling-aware spinning
    ○ Lock waiter keeps spinning until it acquires the lock, as long as the pCPU is not overloaded

[Figure: each vCPU (A, B, C) in the VM shares its eCS states with the matching per-vCPU state in the hypervisor; the hypervisor exposes pcpu_overloaded (0/1) to the VM.]
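Virtualized scheduling-aware spinning can be sketched as a wait loop that consults the hypervisor's hint before deciding to sleep. In the sketch below only `is_pcpu_overloaded` is a name from the slides; `wait_for_lock`, `try_acquire`, and `block` are hypothetical callbacks, and the real kernel lock slow path is more involved.

```python
def wait_for_lock(try_acquire, is_pcpu_overloaded, block, cpu_id):
    """Spin while the pCPU has spare capacity; block only when it is overloaded.

    Returns (spins, blocks) so callers can see which path was taken.
    """
    spins = blocks = 0
    while not try_acquire():
        if is_pcpu_overloaded(cpu_id):
            block()      # give up the pCPU and pay the wakeup cost later
            blocks += 1
        else:
            spins += 1   # pCPU is not shared right now, so spinning is cheap
    return spins, blocks


# Lock becomes free on the third attempt while the pCPU is not overloaded.
attempts = iter([False, False, True])
idle_result = wait_for_lock(lambda: next(attempts), lambda cpu: False, lambda: None, 0)

# Same wait on an overloaded pCPU: the waiter blocks once, then acquires.
attempts_busy = iter([False, True])
busy_result = wait_for_lock(lambda: next(attempts_busy), lambda cpu: True, lambda: None, 0)
```

Spinning avoids the blocked-waiter wakeup cost when no other vCPU needs the pCPU, while the fallback to blocking keeps the waiter from wasting a contended pCPU.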

slide-25
SLIDE 25

Implementation

  • Rely on paravirtualized VM
  • Extended scheduler’s preempt_notifier API to check eCS states
    ○ Rely on scheduler_tick() to avoid vCPU preemption
  • Overall implementation is 1000 LoC
    ○ 60 LoC for annotating almost every lock-based critical section

slide-26
SLIDE 26

Evaluation

  • Does eCS improve a VM’s performance?
  • Does the hypervisor maintain system eventual fairness?
  • Setup: 8-socket, 80-core NUMA machine

slide-27
SLIDE 27

Impact of eCS in over-committed scenario

  • Experiment: run two VMs, each running the same application
  • eCS improves application throughput by 1.2-2.3X
  • eCS avoids 85.8-100% of preemptions → an extra schedule tick is sufficient

[Figures: results for Apache web server and Psearchy, including preemptions avoided.]

slide-28
SLIDE 28

Impact of eCS in under-committed scenario

  • Experiment: run only one VM with an application
  • eCS improves application performance by 1.2-1.9X
  • Virtualized scheduling-aware spinning addresses BWW for blocking locks

[Figures: results for Apache web server and Psearchy.]

slide-29
SLIDE 29

System eventual fairness

  • Experiment: an application reading a file
  • Hypervisor’s scheduler (CFS) maintains eventual fairness
  • Both VMs get equal time even though VM2 (eCS) is granted extra schedules
  • CFS maintains eventual fairness by penalizing VM2
    ○ Each VM runs for an equal time (4.95 seconds out of 10 seconds)

slide-30
SLIDE 30

Discussion

  • Right approach for Linux adoption
    ○ Leverage steal_time_struct that exposes preempted method
  • Annotation
    ○ Use VM → Hypervisor API to mark functions
  • Extending the concept to the userspace
    ○ Requires a composable scheduling abstraction to support user space

slide-31
SLIDE 31

Conclusion

  • Double scheduling leads to several preemption problems
  • Six lightweight paravirtualized methods to annotate critical sections
  • Leverage hypervisor’s scheduler to mitigate vCPU preemptions
  • Allow vCPUs to make efficient scheduling decisions
  • A generic approach to mitigate all preemption problems!

Thank you!