RT-Xen: Real-Time Virtualization Sisu Xi, Meng Xu , Chenyang Lu, Linh - - PowerPoint PPT Presentation



slide-1
SLIDE 1

RT-Xen: Real-Time Virtualization

Sisu Xi, Meng Xu, Chenyang Lu, Linh T.X. Phan, Christopher D. Gill, Insup Lee, Oleg Sokolsky

slide-2
SLIDE 2

Real-Time Virtualization

  • Cars: Consolidate ~100 ECUs -> ~10 multicore processors

      - Infotainment on Linux or Android
      - Safety-critical control on AUTOSAR

  • Cloud Computing’s Killer App: Gaming [IEEE Spectrum]

      - Need to compute and stream 30 to 50 frames per second

Applications must meet real-time performance constraints on virtualized platforms!

slide-3
SLIDE 3

RT-Xen: Real-Time Virtualization

  • Real-time hypervisor scheduling framework in Xen

      - Implement a suite of real-time scheduling algorithms

  • Based on compositional scheduling theory

      - VMs specify resource interfaces
      - Real-time guarantees to tasks in VMs

  • Open source

      - RT-Xen patch submitted
      - https://sites.google.com/site/realtimexen/

slide-4
SLIDE 4

Xen Virtualization Architecture

  • Guest OS runs on VCPUs
  • Hypervisor schedules VCPUs on PCPUs
  • Credit scheduler

      - [Weight, Cap] per VM
      - Round robin

slide-5
SLIDE 5

RT-Xen Interface

  • VM resource interface

      - A set of VCPUs, each characterized by <period, budget>
      - Optional: use cpumask to specify VCPU affinity with PCPUs
      - Hide task-specific information

  • Real-Time scheduling algorithms

      - Ordering of VCPUs? Priority scheme
      - Placement of VCPUs? Global vs. partition
      - Resource isolation? Server mechanisms
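
The resource interface above can be pictured as a small data structure. The following is a minimal Python sketch, not RT-Xen code: the class names `VCPUInterface` and `VMInterface`, the microsecond units, and the `bandwidth` helper are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class VCPUInterface:
    """Hypothetical model of one VCPU's <period, budget> interface."""
    period_us: int
    budget_us: int
    # Optional affinity: the set of PCPU ids this VCPU may run on.
    # None means "no restriction" (an assumption of this sketch).
    cpumask: Optional[FrozenSet[int]] = None

    def __post_init__(self):
        # A budget larger than the period could never be delivered.
        if not 0 < self.budget_us <= self.period_us:
            raise ValueError("need 0 < budget <= period")

    @property
    def bandwidth(self) -> float:
        """Fraction of one PCPU reserved for this VCPU."""
        return self.budget_us / self.period_us


@dataclass
class VMInterface:
    """A VM exposes only its VCPUs; task details stay hidden in the guest."""
    name: str
    vcpus: List[VCPUInterface]

    @property
    def total_bandwidth(self) -> float:
        return sum(v.bandwidth for v in self.vcpus)


vm = VMInterface("vm1", [
    VCPUInterface(period_us=10_000, budget_us=4_000),
    VCPUInterface(period_us=5_000, budget_us=1_000, cpumask=frozenset({1, 2})),
])
print(f"{vm.name} reserves {vm.total_bandwidth:.2f} PCPUs in total")
```

Note that nothing about the guest's tasks appears in the interface, which matches the "hide task-specific information" point.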


slide-6
SLIDE 6

Real-Time Scheduling Policies

  • Priority scheme

      - Static priority: Deadline Monotonic (DM)
      - Dynamic priority: Earliest Deadline First (EDF)

  • Global scheduling

      - Schedule VCPUs based on global information
      - Allow VCPU migration across cores
      - Flexible use of multiple cores
      - Migration overhead and cache penalty

  • Partitioned scheduling

      - Assign and bind VCPUs to PCPUs
      - Schedule VCPUs on each core independently
      - May underutilize PCPUs
      - No migration overhead or associated cache penalty
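
A short Python sketch may help contrast the policies above. It is illustrative only: the `VCPU` class, the use of the period as the relative deadline, and the first-fit packing heuristic with a per-core capacity of 1.0 are assumptions of this sketch, not details taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class VCPU:
    name: str
    period: int        # assumed to double as the relative deadline
    budget: int
    release: int = 0   # start of the VCPU's current period

    @property
    def abs_deadline(self) -> int:
        return self.release + self.period


def dm_order(vcpus):
    """Static priority: shorter relative deadline first, fixed over time."""
    return sorted(vcpus, key=lambda v: (v.period, v.name))


def edf_order(vcpus):
    """Dynamic priority: earliest absolute deadline first, changes per period."""
    return sorted(vcpus, key=lambda v: (v.abs_deadline, v.name))


def first_fit(vcpus, n_pcpus):
    """Partitioned placement: bind each VCPU to the first PCPU with room.

    Returns {vcpu name: pcpu index or None if it does not fit anywhere}.
    """
    load = [0.0] * n_pcpus
    placement = {}
    for v in vcpus:
        bandwidth = v.budget / v.period
        placement[v.name] = None
        for p in range(n_pcpus):
            if load[p] + bandwidth <= 1.0 + 1e-9:
                load[p] += bandwidth
                placement[v.name] = p
                break
    return placement


a = VCPU("a", period=5, budget=2, release=10)   # absolute deadline 15
b = VCPU("b", period=20, budget=8, release=0)   # absolute deadline 20
c = VCPU("c", period=10, budget=3, release=0)   # absolute deadline 10
print([v.name for v in dm_order([a, b, c])])    # ['a', 'c', 'b']
print([v.name for v in edf_order([a, b, c])])   # ['c', 'a', 'b']

# Three VCPUs of bandwidth 0.6 need 1.8 PCPUs in total, yet no
# partition onto 2 PCPUs exists: the third VCPU cannot be bound.
heavy = [VCPU(n, period=10, budget=6) for n in ("x", "y", "z")]
print(first_fit(heavy, n_pcpus=2))              # {'x': 0, 'y': 1, 'z': None}
```

The last example shows the "may underutilize PCPUs" point: 0.2 PCPUs of spare capacity remain, split across two cores, and only migration (global scheduling) could use it.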


slide-7
SLIDE 7

Scheduling a VCPU as a Deferrable Server

[Figure: timeline from 0 to 15 showing tasks T1 (10, 3) and T2 (10, 3) and the budget of a Deferrable Server (5, 3)]

  • A VCPU receives budget µs of CPU resources every period µs

      - Budget is replenished at the start of every period
      - VCPU consumes budget when running, suspends when no budget left
      - Preserves budget when there is no task
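
The three rules above can be traced with a small discrete-time simulation. This is a sketch under simplifying assumptions: time advances in unit ticks rather than microseconds, task work is given as a hypothetical `arrivals` map from tick to units of work, and T1 (10, 3) and T2 (10, 3) are read as (period, execution time).

```python
def deferrable_server_trace(period, budget, arrivals, horizon):
    """Simulate one VCPU scheduled as a deferrable server.

    arrivals: {tick: units of task work released at that tick}
    Returns one character per tick:
      'R' = VCPU runs, '-' = work pending but no budget, '.' = no work.
    """
    remaining = 0   # budget left in the current period
    pending = 0     # task work waiting inside the VM
    trace = []
    for t in range(horizon):
        pending += arrivals.get(t, 0)
        if t % period == 0:
            remaining = budget          # replenish at the start of a period
        if pending > 0 and remaining > 0:
            pending -= 1                # running consumes budget
            remaining -= 1
            trace.append("R")
        elif pending > 0:
            trace.append("-")           # suspended: out of budget
        else:
            trace.append(".")           # idle: budget is preserved
    return "".join(trace)


# Server (5, 3) serving T1 and T2, each releasing 3 units at ticks 0 and 10.
print(deferrable_server_trace(5, 3, {0: 6, 10: 6}, 15))  # RRR--RRR..RRR--

# Work arriving mid-period still finds the preserved budget.
print(deferrable_server_trace(5, 3, {3: 2}, 8))          # ...RR...
```

In the second trace the VCPU is idle for three ticks but keeps its budget, so the late job runs immediately at tick 3 instead of waiting for the next period.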

slide-8
SLIDE 8

RT-Xen Investigation Roadmap

  • Single-core: RT-Xen 1.0
  • Single-core enhanced: RT-Xen 1.1
  • Multi-core: RT-Xen 2.0

[Figure: design space explored across versions. Dimensions: Global vs. Partitioned Scheduling; Fixed Priority (DM) vs. Dynamic Priority (EDF); server types Periodic, Deferrable, Sporadic, and Polling, plus Work Conserving Periodic and Capacity Reclaiming Periodic. Resulting multi-core schedulers: gDM, gEDF, pEDF, pDM]

slide-9
SLIDE 9

RT-Xen 2.0: Run Queues

  • A run queue

      - holds VCPUs that are runnable (have a task to run)
      - has two parts: VCPUs with budget and VCPUs out of budget
      - is sorted by priority (DM or EDF) within each part

  • rt-global: all cores share one run queue with a spinlock
  • rt-partition: one run queue per core
  • Patches for more efficient implementation on the way!

[Figure: RunQ, split into VCPUs with budget and VCPUs out of budget, each part sorted by priority]
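
The run queue layout can be sketched in a few lines of Python. This is an illustration of the two-part ordering only, assuming EDF priorities; the `VCPU` fields and the `build_runq` and `pick_next` helpers are hypothetical names, and the real implementation lives in the Xen hypervisor in C.

```python
from dataclasses import dataclass


@dataclass
class VCPU:
    name: str
    deadline: int        # current absolute deadline (EDF priority key)
    budget_left: int
    runnable: bool = True


def build_runq(vcpus):
    """Runnable VCPUs only: those with budget first, then depleted ones.

    Each part is sorted by priority (earliest deadline first here).
    """
    runnable = [v for v in vcpus if v.runnable]
    with_budget = sorted((v for v in runnable if v.budget_left > 0),
                         key=lambda v: v.deadline)
    out_of_budget = sorted((v for v in runnable if v.budget_left <= 0),
                           key=lambda v: v.deadline)
    return with_budget + out_of_budget


def pick_next(runq):
    """The head of the queue is schedulable only if it still has budget."""
    if runq and runq[0].budget_left > 0:
        return runq[0]
    return None


vcpus = [
    VCPU("a", deadline=30, budget_left=2),
    VCPU("b", deadline=10, budget_left=0),                  # depleted
    VCPU("c", deadline=20, budget_left=1),
    VCPU("d", deadline=5, budget_left=3, runnable=False),   # no task to run
]
runq = build_runq(vcpus)
print([v.name for v in runq])   # ['c', 'a', 'b']
print(pick_next(runq).name)     # c
```

Under rt-global a single such queue would be shared by all cores and protected by a lock; under rt-partition each core would keep its own.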

slide-10
SLIDE 10

Experimental Setup

  • Hardware: Intel i7 processor, six cores running at 3.33 GHz

      - Dedicate one PCPU to domain 0
      - All guest VMs use the remaining cores

  • Cache architecture

      - Each core has dedicated L1 cache (32 KB) and L2 cache (256 KB)
      - All six cores share L3 cache (12 MB)
      - Inclusive L3 cache: all data in L2 cache must also be in L3 cache

  • Software

      - Xen 4.3 patched with RT-Xen
      - Guest OS: Linux patched with LITMUS^RT

slide-11
SLIDE 11

RT-Xen 2.0: Scheduling Overhead

rt-global has extra overhead due to global lock
credit has high max overhead due to load balancing

slide-12
SLIDE 12

RT-Xen 2.0: Credit Scheduler

credit missed deadline at 22% CPU capacity
RT-Xen delivers real-time performance up to 78%

slide-13
SLIDE 13

RT-Xen 2.0: gEDF vs. pEDF


Global scheduling wins empirically!

gEDF + deferrable server -> best real-time performance

slide-14
SLIDE 14

Demo

  • YouTube: “RT-Xen Demonstration”

      - https://www.youtube.com/watch?v=wisxWn3mR5s

slide-15
SLIDE 15

Patch Status

  • Patch RFC v1 & v2: July 10th & July 29th

      - gEDF + deferrable server
      - cpupool support

  • Patch RFC v3: expected on Aug 24th

      - Scheduling trace support
      - Performance improvement (splitting RunQ)

  • Patch RFC v4: expected before Sep 10th

      - Performance improvement
          - Timer based budget replenishment
          - Improve the timing resolution of budget

slide-16
SLIDE 16

Conclusion

  • Diverse applications demand real-time virtualization

      - Real-time virtualization in embedded area
      - Cloud gaming

  • RT-Xen provides real-time performance

      - Efficient implementation of diverse real-time scheduling policies
      - Leverage compositional scheduling theory -> analytical guarantee
      - gEDF + deferrable server wins empirically

slide-17
SLIDE 17

Research Contributions

  • RT-Xen 1.0: S. Xi, J. Wilson, C. Lu, and C.D. Gill, RT-Xen: Towards Real-Time Hypervisor Scheduling in Xen, ACM International Conference on Embedded Software (EMSOFT), 2011

  • RT-Xen 1.1: J. Lee, S. Xi, S. Chen, L.T.X. Phan, C. Gill, I. Lee, C. Lu, and O. Sokolsky, Realizing Compositional Scheduling through Virtualization, IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), 2012

  • RT-Xen 2.0: S. Xi, M. Xu, C. Lu, L.T.X. Phan, C. Gill, I. Lee, and O. Sokolsky, Real-Time Multi-Core Virtual Machine Scheduling in Xen, ACM International Conference on Embedded Software (EMSOFT), 2014

  • RT-Xen 2.1 + RT-OpenStack: S. Xi, C. Li, C. Lu, C. Gill, M. Xu, L.T.X. Phan, I. Lee, and O. Sokolsky, RT-OpenStack: Co-Hosting RT VM with non RT VMs, in submission

  • RTCA: S. Xi, C. Li, C. Lu, and C. Gill, Prioritizing Local Inter-Domain Communication in Xen, ACM/IEEE International Symposium on Quality of Service (IWQoS), 2013

  • RT-Xen patch (gEDF with deferrable server)
  • RT-Xen: Real-Time Virtualization in Xen, Xen Blog, 2013
  • RT-Xen: Real-Time Virtualization in Xen, Xen Developer Summit, 2014


slide-18
SLIDE 18

Backup Slides


slide-19
SLIDE 19

RT-Xen 1.0


slide-20
SLIDE 20

Implementation – PCPU

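Three Queues within One Physical Core

  • VCPU Params: (period, budget, priority)

[Figure: VCPU position among the queues of one physical core (RunQ, RdyQ), determined by whether the VCPU has budget and whether it has a task; the example shows a Periodic server and the IDLE VCPU]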

slide-21
SLIDE 21

Server Design – Deferrable & Polling

  • Servers (Period, Budget, Priority)

  • S1 (5, 3) with two tasks: T1 (10, 3) and T2 (10, 3)

[Figure: budget in S1 and actual execution over time 0 to 15 for the Deferrable Server (annotated "back-to-back") and for the Polling Server]

  • 1. Replenish?
  • 2. Budget but NO task?
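
The second question, what happens to budget when no task is waiting, is what separates the server types. The sketch below is a simplified unit-tick simulation based on the textbook definitions of the deferrable, polling, and periodic servers; the `simulate` function and its `arrivals` map are hypothetical, and the sporadic server is left out because its replenishment rule does not fit this simple loop.

```python
def simulate(policy, period, budget, arrivals, horizon):
    """Trace a server under one of three idle-budget policies.

    policy: 'deferrable' keeps unused budget until the period ends,
            'polling' discards budget as soon as no task is pending,
            'periodic' burns budget while idling.
    Returns 'R' (running), '-' (work pending, no budget), '.' (no work).
    """
    remaining, pending, trace = 0, 0, []
    for t in range(horizon):
        pending += arrivals.get(t, 0)
        if t % period == 0:
            remaining = budget            # all three replenish per period
        if pending > 0 and remaining > 0:
            pending -= 1
            remaining -= 1
            trace.append("R")
        else:
            trace.append("-" if pending > 0 else ".")
            if pending == 0:
                if policy == "polling":
                    remaining = 0         # budget but no task: discard
                elif policy == "periodic" and remaining > 0:
                    remaining -= 1        # budget but no task: idle it away
    return "".join(trace)


arrivals = {1: 1, 4: 2}   # one unit at tick 1, two units at tick 4
for policy in ("deferrable", "polling", "periodic"):
    print(f"{policy:10s} {simulate(policy, 5, 3, arrivals, 10)}")
# deferrable .R..RR....
# polling    .----RRR..
# periodic   .R..-RR...
```

With the same server parameters (5, 3), the deferrable server serves both arrivals immediately, the polling server makes both wait for the next period, and the periodic server serves the first but has idled away the budget needed for the second.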
slide-22
SLIDE 22

Server Design – Periodic & Sporadic

  • Servers (Period, Budget, Priority)

  • S1 (5, 3) with two tasks: T1 (10, 3) and T2 (10, 3)

[Figure: budget in S1 and actual execution over time 0 to 15 for the Sporadic Server (with "+3" replenishment annotations and the note "Overhead ++") and for the Periodic Server (annotated "theory favored")]

  • 1. Replenish?
  • 2. Budget but NO task?


slide-23
SLIDE 23

RT-Xen 2.0


slide-24
SLIDE 24

RT-Xen 2.0: Workload

  • Periodic task sets: [period, execution time, deadline]
  • CPU-intensive, independent tasks
  • Randomly generate tasks until the task set reaches a target total utilization, then distribute the tasks to four VMs and apply compositional scheduling theory to calculate each VM's resource interface

  • 25 task sets per data point; measure the fraction of schedulable task sets
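
The generation procedure can be sketched as follows. The slides do not give the distributions, so the candidate periods, the per-task utilization range, the implicit deadlines (deadline equal to period), and the round-robin split across VMs are all assumptions of this sketch; the compositional analysis that derives each VM's interface is not reproduced here.

```python
import random


def generate_task_set(target_util, rng,
                      periods=(10, 20, 50, 100), max_task_util=0.3):
    """Add random tasks until their total utilization reaches target_util.

    Each task is a tuple (period, execution_time, deadline).
    """
    tasks, total = [], 0.0
    while target_util - total > 1e-9:
        period = rng.choice(periods)
        # Clamp the last task so the set lands on the target utilization.
        util = min(rng.uniform(0.05, max_task_util), target_util - total)
        tasks.append((period, util * period, period))
        total += util
    return tasks


def distribute(tasks, n_vms=4):
    """Hypothetical round-robin assignment of tasks to VMs."""
    vms = [[] for _ in range(n_vms)]
    for i, task in enumerate(tasks):
        vms[i % n_vms].append(task)
    return vms


rng = random.Random(0)   # fixed seed so a run can be repeated
task_sets = [generate_task_set(2.0, rng) for _ in range(25)]
vms = distribute(task_sets[0])
for i, vm_tasks in enumerate(vms):
    util = sum(c / p for p, c, _ in vm_tasks)
    print(f"VM{i}: {len(vm_tasks)} tasks, utilization {util:.2f}")
```

Each of the 25 task sets would then be run on the platform, and the fraction that meets all deadlines gives one data point at that utilization.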


slide-25
SLIDE 25

RT-Xen 2.0: Context Saved

8/18/2014


Less than 1 us overhead for the spinlock

slide-26
SLIDE 26

RT-Xen 2.0: Theory vs. Experiments

gEDF < pEDF theoretically due to pessimistic analysis
gEDF > pEDF empirically, thanks to global scheduling

slide-27
SLIDE 27

RT-Xen 2.0: Theory vs. Experiments

gEDF > pEDF empirically, thanks to global scheduling
gEDF < pEDF theoretically due to pessimistic analysis

slide-28
SLIDE 28

RT-Xen 2.0: How about Cache?

Benefit of global scheduling dominates migration cost on a shared L3-cache platform.
slide-29
SLIDE 29

RT-Xen 2.0: Context Switch


“Fake” context switch with idle VCPUs