Fido: Fast Inter-Virtual-Machine Communication for Enterprise Appliances. Anton Burtsev, Kiran Srinivasan, Prashanth Radhakrishnan, Lakshmi N. Bairavasundaram, Kaladhar Voruganti, Garth R. Goodson. NetApp, Inc. and University of Utah, School of Computing. (PowerPoint presentation transcript)


SLIDE 1

Fido: Fast Inter-Virtual-Machine Communication for Enterprise Appliances

Anton Burtsev†, Kiran Srinivasan, Prashanth Radhakrishnan, Lakshmi N. Bairavasundaram, Kaladhar Voruganti, Garth R. Goodson

†University of Utah, School of Computing
NetApp, Inc.

SLIDE 2

Enterprise appliances


  • High performance
  • Scalable and highly-available access

Examples: network-attached storage, routers, etc.

SLIDE 3

Example Appliance


  • Monolithic kernel
  • Kernel components

Problems:

  • Fault isolation
  • Performance isolation
  • Resource provisioning
SLIDE 4

Split architecture

SLIDE 5

Benefits of virtualization

  • High availability
    • Fault isolation
    • Micro-reboots
    • Partial functionality in case of failure
  • Performance isolation
    • Resource allocation
    • Consolidation and load balancing, VM migration
  • Non-disruptive updates
    • Hardware upgrades via VM migration
    • Software updates as micro-reboots
    • Computation-to-data migration


SLIDE 6

Main Problem: Performance

Is it possible to match the performance of a monolithic environment?


  • Large amount of data movement between components
    • Mostly cross-core
    • Connection-oriented (established once)
    • Throughput-optimized (asynchronous)
    • Coarse-grained (no one-word messages)
    • Multi-stage data processing
  • Main cost contributors
    • Transitions to the hypervisor
    • Memory map/copy operations
    • Not VM context switches (multi-core)
    • Not IPC marshaling
SLIDE 7

Main Insight: Relaxed Trust Model

  • The appliance is built by a single organization
  • Components are:
    • Pre-tested and qualified
    • Collaborative and non-malicious
  • Share memory read-only across VMs!
  • Fast inter-VM communication
    • Exchange only pointers to data
    • No hypervisor calls (only cross-core notification)
    • No memory map/copy operations
    • Zero-copy across the entire appliance


SLIDE 8

Contributions

  • Fast inter-VM communication mechanism
  • Abstraction of a single address space for traditional systems
  • Case study: a realistic microkernelized network-attached storage system


SLIDE 9

Design

SLIDE 10

Design Goals

  • Performance
    • High throughput
  • Practicality
    • Minimal guest system and hypervisor dependencies
    • No intrusive guest kernel changes
  • Generality
    • Support for different communication mechanisms in the guest system


SLIDE 11

Transitive Zero Copy


  • Goal
    • Zero-copy across the entire appliance
    • No changes to the guest kernel
  • Observation
    • Multi-stage data processing
SLIDE 12

Pseudo Global Virtual Address Space

Insight:

  • CPUs support a 64-bit (2^64-byte) virtual address space
  • Individual VMs have no need for all of it
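The insight above can be sketched as a simple address-space partitioning scheme. This is an illustrative sketch only: the region size, and the names `REGION_BITS`, `vm_region_base`, `to_global`, and `owner_of` are hypothetical, not Fido's actual interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: carve the 64-bit virtual address space into
 * fixed, non-overlapping per-VM regions.  A data pointer produced in
 * one VM is then unambiguous, and directly dereferenceable, in any
 * other VM that maps the producer's memory read-only at the same
 * virtual addresses -- the basis of transitive zero-copy. */

#define REGION_BITS 40u                /* 1 TB of virtual space per VM (assumption) */
#define REGION_SIZE (1ULL << REGION_BITS)

/* Base of the region reserved for a given VM. */
static uint64_t vm_region_base(unsigned vm_id)
{
    return (uint64_t)vm_id * REGION_SIZE;
}

/* Translate a VM-local offset into its appliance-wide address. */
static uint64_t to_global(unsigned vm_id, uint64_t local_off)
{
    assert(local_off < REGION_SIZE);
    return vm_region_base(vm_id) | local_off;
}

/* Recover the owning VM from an appliance-wide address. */
static unsigned owner_of(uint64_t global_addr)
{
    return (unsigned)(global_addr >> REGION_BITS);
}
```

Because regions never overlap, the top bits of any appliance-wide address identify the VM that owns the data, so a consumer can route a completion notification back to the producer without any per-message mapping.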

SLIDE 13

Pseudo Global Virtual Address Space

SLIDE 14

Pseudo Global Virtual Address Space

SLIDE 15

Transitive Zero Copy

SLIDE 16

Fido: High-level View

SLIDE 17

Fido: High-level View


  • “c” – connection management
  • “m” – memory mapping
  • “s” – cross-VM signaling
SLIDE 18

IPC Organization


  • Shared memory ring
    • Pointers to data
SLIDE 19

IPC Organization


  • Shared memory ring
    • Pointers to data
  • For complex data structures
    • Scatter-gather array
SLIDE 20

IPC Organization


  • Shared memory ring
    • Pointers to data
  • For complex data structures
    • Scatter-gather array
    • Translate pointers
SLIDE 21

IPC Organization


  • Shared memory ring
    • Pointers to data
  • For complex data structures
    • Scatter-gather array
    • Translate pointers
  • Signaling
    • Cross-core interrupts (event channels)
    • Batching and in-ring polling
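The IPC organization above can be sketched roughly as follows. The struct names, slot count, and the `consumer_polling` flag are illustrative assumptions, not Fido's actual data structures: a single-producer/single-consumer ring placed in shared memory, whose entries carry only scatter-gather arrays of appliance-wide pointers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative sketch of a Fido-style shared ring: entries carry only
 * pointers to data (as appliance-wide addresses) plus lengths --
 * never the payload itself, which stays in the producer's memory. */

#define RING_SLOTS 8u           /* must be a power of two (assumption) */
#define SG_MAX     4u           /* scatter-gather segments per message */

struct sg_seg { uint64_t addr; uint32_t len; };  /* pointer + length, not payload */
struct sg_msg { uint32_t nsegs; struct sg_seg seg[SG_MAX]; };

struct ring {
    volatile uint32_t prod, cons;       /* free-running producer/consumer indices */
    volatile bool consumer_polling;     /* set while the consumer scans the ring */
    struct sg_msg slot[RING_SLOTS];
};

/* Enqueue a message; *need_signal tells the caller whether a cross-core
 * notification (e.g. a Xen event channel) must be raised. */
static bool ring_send(struct ring *r, const struct sg_msg *m, bool *need_signal)
{
    if (r->prod - r->cons == RING_SLOTS)
        return false;                            /* ring full */
    r->slot[r->prod & (RING_SLOTS - 1)] = *m;
    r->prod++;                                   /* real code needs a memory barrier */
    /* Batching: skip the interrupt if the consumer is already polling. */
    *need_signal = !r->consumer_polling;
    return true;
}

/* Dequeue the next message, if any. */
static bool ring_recv(struct ring *r, struct sg_msg *out)
{
    if (r->cons == r->prod)
        return false;                            /* ring empty */
    *out = r->slot[r->cons & (RING_SLOTS - 1)];
    r->cons++;
    return true;
}
```

In Fido the notification itself is a Xen event channel (a cross-core interrupt); the `consumer_polling` flag stands in for the batching/in-ring-polling optimization, which suppresses notifications while the receiver is already draining the ring.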
SLIDE 22

Fast device-level communication

  • MMNet
    • Link-level
    • Standard network device interface
    • Supports full transitive zero-copy
  • MMBlk
    • Block-level
    • Standard block device interface
    • Zero-copy on write
    • Incurs one copy on read


SLIDE 23

Evaluation

SLIDE 24

MMNet Evaluation


  • AMD Opteron with two 2.1 GHz quad-core CPUs (8 cores total)
  • 16 GB RAM
  • NVIDIA 1 Gbps NICs
  • 64-bit Xen (3.2), 64-bit Linux (2.6.18.8)
  • Netperf benchmark (2.4.4)

Compared configurations: Loop, NetFront, MMNet, XenLoop

SLIDE 25

MMNet: TCP Throughput

[Chart: TCP throughput (Mbps) vs. message size (0.5 KB–256 KB) for Monolithic, Netfront, XenLoop, and MMNet]

SLIDE 26

MMBlk Evaluation


  • Same hardware: AMD Opteron with two 2.1 GHz quad-core CPUs (8 cores total), 16 GB RAM, NVIDIA 1 Gbps NICs
  • VMs are configured with 4 GB and 1 GB RAM
  • 3 GB in-memory file system (tmpfs)
  • IOZone benchmark

Compared configurations: MMBlk, XenBlk, Monolithic

SLIDE 27

MMBlk Sequential Writes

[Chart: sequential write throughput (MB/s) vs. record size (4 KB–4 MB) for Monolithic, XenBlk, and MMBlk]

SLIDE 28

Case Study

SLIDE 29

Network-attached Storage

SLIDE 30

Network-attached Storage


  • RAM
    • VMs have 1 GB each, except the FS VM (4 GB)
    • The monolithic system has 7 GB RAM
  • Disks
    • RAID-5 over three 64 MB/s disks
  • Benchmark
    • IOZone reads/writes an 8 GB file over NFS (async)
SLIDE 31

Sequential Writes

[Chart: sequential write throughput (MB/s) vs. record size (4 KB–4 MB) for Monolithic, Native-Xen, and MM-Xen]

SLIDE 32

Sequential Reads

[Chart: sequential read throughput (MB/s) vs. record size (4 KB–4 MB) for Monolithic, Native-Xen, and MM-Xen]

SLIDE 33

TPC-C (On-Line Transaction Processing)

[Chart: TPC-C throughput (transactions/minute, tpmC) for Monolithic, MM-Xen, and Native-Xen]

SLIDE 34

Conclusions

  • We match monolithic performance
    • "Microkernelization" of traditional systems is possible!
  • Fast inter-VM communication
    • The search for VM communication mechanisms is not over
  • Important aspects of the design
    • Trust model
    • VM as a library (for example, FSVA)
    • End-to-end zero copy
    • Pseudo global virtual address space
  • There are still problems to solve
    • Full end-to-end zero copy
    • Cross-VM memory management
    • Full utilization of pipelined parallelism


SLIDE 35

Thank you.

aburtsev@flux.utah.edu


SLIDE 36

Backup Slides

SLIDE 37

Related Work

  • Traditional microkernels [L4, EROS, Coyotos]
    • Synchronous IPC (effectively thread migration)
    • Optimized for single CPUs: fast context switches, small messages (often in registers), efficient marshaling (IDL)
  • Buffer management [Fbufs, IO-Lite, Beltway Buffers]
    • Shared buffer as the unit of protection
  • FastForward: fast cache-to-cache data transfer
  • VMs [Xen split drivers, XWay, XenSocket, XenLoop]
    • Page flipping, later buffer sharing
    • IVC, VMCI
  • Language-based protection [Singularity]
    • Shared heap, zero-copy (only pointer transfer)
  • Hardware acceleration [Solarflare]
  • Multi-core OSes [Barrelfish, Corey, fos]
