Multitenancy for Fast and Programmable Networks in the Cloud
Tao Wang*, Hang Zhu*, Fabian Ruffy, Xin Jin, Anirudh Sivaraman, Dan Ports, and Aurojit Panda (*Equal contribution)
- Generic compute and storage resources
- Specialized accelerators
- Pipeline-based programmable devices
  - In-network switches
  - At-host SmartNICs
- Enable a wide range of innovations for classical networked systems
  - Consensus: NOPaxos, NetPaxos
  - Concurrency control: Eris
  - Caching: NetCache, IncBricks
  - Storage: NetChain, SwitchKV
  - Applications: SwitchML, NetAccel
  - …
Need for multitenancy support
- Provider’s aspect
  - Improve resource utilization
    - One application can hardly consume all the hardware resources
    - Heterogeneous resource requirements
- Tenant’s aspect
  - Enable innovations
    - New programs can be easily tested without impacting basic network functionality
How to enable multitenancy for programmable devices?
Our vision: a hybrid compile-time and run-time solution
Requirements:
- Resource efficiency
- Little overhead
- Isolation
- Performance
- Allocated resources
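[Figure: Architecture of a pipeline-based programmable switch. A parser extracts packet headers (e.g., the Ethernet header); packets then pass through the ingress pipeline of match-action stages, the queues, and the egress pipeline. Each stage contains exact-match and ternary-match crossbars, SRAMs/TCAMs, action units, and stateful memory (e.g., registers). Headers and per-packet metadata (e.g., queue length, hardware enqueue port) occupy PHV containers.]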
Performance vs. programmability
- Various types of hardware resources
- Most of them are decided at compile time
- Limited run-time support
- Hardware wiring is decided at compile time
- Line-rate performance is achieved after successful compilation
- No temporal scheduling (e.g., CPU or NPU scheduling)
- No spatial reconfiguration (e.g., FPGA [AmorphOS, OSDI’18])
Requirements: resource efficiency, little overhead, isolation, performance, allocated resources
- Compile-time program linker
  - Targets generic resources (e.g., SRAMs/TCAMs, action units, etc.), but is static
- Run-time memory allocator
  - Targets stateful memory, but is limited
[Figure: System overview. Tenants T1 … Tn submit requests through a translation layer. Compile-time linker: a resource usage checker and a program linker, guided by a resource sharing policy, combine the system program S and the tenant programs T1 … Tn into one merged jumbo program. Data plane: headers and metadata pass through stages 1 … m, which hold config params, system and tenant tables, counter records, and "one big array" of stateful memory. Control plane: a table entry handler and a run-time memory allocator with a utility calculator and a reallocation problem solver.]
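The figure names a utility calculator and a reallocation problem solver but does not say which algorithm they use. Purely as a hedged illustration, the Python sketch below swaps in a simple greedy heuristic: it repeatedly gives one fixed-size chunk of the big array to the tenant whose estimated utility rises most, then packs the allocations into (offset, amount) pairs. The `Tenant` class, `reallocate`, `to_table_entries`, the chunk size, and the utility curves are all hypothetical, not the authors' solver.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Tenant:
    vid: int
    # Hypothetical utility model: estimated hit ratio given `cells` register cells.
    utility: Callable[[int], float]


def reallocate(tenants: List[Tenant], total_cells: int, chunk: int) -> Dict[int, int]:
    """Greedy stand-in for the reallocation problem solver (not the paper's algorithm)."""
    alloc = {t.vid: 0 for t in tenants}
    remaining = total_cells
    while remaining >= chunk:
        best, best_gain = None, 0.0
        for t in tenants:
            # Marginal utility of giving this tenant one more chunk.
            gain = t.utility(alloc[t.vid] + chunk) - t.utility(alloc[t.vid])
            if gain > best_gain:
                best, best_gain = t, gain
        if best is None:  # no tenant benefits from more memory; stop early
            break
        alloc[best.vid] += chunk
        remaining -= chunk
    return alloc


def to_table_entries(alloc: Dict[int, int]) -> Dict[int, Tuple[int, int]]:
    """Pack allocations back to back into VID -> (offset, amount) entries."""
    entries, offset = {}, 0
    for vid in sorted(alloc):
        if alloc[vid] > 0:  # tenants with no memory get no entry
            entries[vid] = (offset, alloc[vid])
            offset += alloc[vid]
    return entries


if __name__ == "__main__":
    # Hypothetical tenants whose hit ratio grows linearly until their working set fits.
    tenants = [Tenant(1, lambda c: min(1.0, c / 64)), Tenant(2, lambda c: min(1.0, c / 32))]
    alloc = reallocate(tenants, total_cells=128, chunk=16)
    print(alloc)                    # {1: 64, 2: 32}
    print(to_table_entries(alloc))  # {1: (0, 64), 2: (64, 32)}
```

Because tenant programs reach stateful memory only through a per-tenant (offset, amount) lookup, a new allocation of this kind could be applied by rewriting table entries rather than recompiling the merged program.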
Compile-time linker
- Restrict resource usage
- Provide isolation
  - Ensure a tenant program does not interfere with others’
  - Ensure no infinite packet resubmission
  - Ensure no forwarding loops in the configuration
  - …
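One way to picture the "restrict resource usage" check is as a per-resource budget test run before linking. The Python sketch below is hypothetical: the resource names, per-tenant quotas, and capacities are illustrative placeholders rather than Tofino figures, and it covers only resource limits, not the resubmission or forwarding-loop checks. It only sums resources that plausibly add up across programs (memory blocks), since programs can share pipeline stages.

```python
from typing import Dict, List

Usage = Dict[str, int]  # resource name -> units used (illustrative units)


def check_usage(system: Usage, tenants: Dict[str, Usage],
                quotas: Dict[str, Usage], capacity: Usage) -> List[str]:
    """Return a list of violations; an empty list means the merge may proceed."""
    errors = []
    total = dict(system)  # the system program's usage counts against capacity too
    for name, usage in tenants.items():
        for res, amount in usage.items():
            limit = quotas.get(name, {}).get(res, 0)
            if amount > limit:
                errors.append(f"{name}: {res}={amount} exceeds quota {limit}")
            total[res] = total.get(res, 0) + amount
    for res, amount in total.items():
        cap = capacity.get(res, 0)
        if amount > cap:
            errors.append(f"merged program: {res}={amount} exceeds capacity {cap}")
    return errors


if __name__ == "__main__":
    capacity = {"sram_blocks": 100, "tcam_blocks": 20}
    system = {"sram_blocks": 30, "tcam_blocks": 4}
    quotas = {"T1": {"sram_blocks": 40, "tcam_blocks": 8},
              "T2": {"sram_blocks": 40, "tcam_blocks": 8}}
    tenants = {"T1": {"sram_blocks": 40, "tcam_blocks": 6},
               "T2": {"sram_blocks": 20, "tcam_blocks": 2}}
    print(check_usage(system, tenants, quotas, capacity))  # []
```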
Parser
- Fixed packet format
  - Ethernet, VLAN, IP, and TCP or UDP headers, followed by custom headers
- System program
  - Extracts common headers
- Tenant programs
  - Extract tenant-defined headers

[Figure: parser layout.
  Header { Ethernet hdr; IP hdr; VLAN hdr; TCP or UDP hdr; T1 hdr; …; Tn hdr }
  System program: apply S’s parser to extract common headers
  Tenant programs: if (tag == T1’s VID) apply T1’s parser; …]
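To make the split between the system parser and the tenant parsers concrete, here is a byte-level model in Python, not P4 and not the authors' code. It assumes the tenant tag is the 802.1Q VLAN ID (the slide's "VID"), that IP is IPv4, and that each tenant parser simply receives the bytes after the common headers; the `parse` function, the `TenantParser` interface, and the one-field T1 header are hypothetical.

```python
import struct
from typing import Callable, Dict

# A tenant parser receives the bytes after the common headers (hypothetical interface).
TenantParser = Callable[[bytes], dict]

ETH_P_8021Q, ETH_P_IPV4 = 0x8100, 0x0800
IPPROTO_TCP, IPPROTO_UDP = 6, 17


def parse(pkt: bytes, tenant_parsers: Dict[int, TenantParser]) -> dict:
    hdr = {}
    # System program: Ethernet (dst, src, TPID) must carry an 802.1Q tag.
    (tpid,) = struct.unpack_from("!H", pkt, 12)
    if tpid != ETH_P_8021Q:
        raise ValueError("expected a VLAN tag")
    tci, ethertype = struct.unpack_from("!HH", pkt, 14)
    hdr["vid"] = tci & 0x0FFF  # low 12 bits of the TCI are the VLAN ID
    if ethertype != ETH_P_IPV4:
        raise ValueError("expected IPv4")
    off = 18
    ihl = (pkt[off] & 0x0F) * 4  # IPv4 header length in bytes
    proto = pkt[off + 9]
    hdr["src_ip"], hdr["dst_ip"] = struct.unpack_from("!II", pkt, off + 12)
    off += ihl
    if proto == IPPROTO_UDP:
        l4_len = 8
    elif proto == IPPROTO_TCP:
        l4_len = (pkt[off + 12] >> 4) * 4  # TCP data offset, in bytes
    else:
        raise ValueError("expected TCP or UDP")
    hdr["sport"], hdr["dport"] = struct.unpack_from("!HH", pkt, off)
    off += l4_len
    # Tenant programs: `if (tag == T1's VID) apply T1's parser`, as a table lookup.
    tenant_parser = tenant_parsers.get(hdr["vid"])
    if tenant_parser is not None:
        hdr["tenant"] = tenant_parser(pkt[off:])
    return hdr


if __name__ == "__main__":
    eth = bytes(12) + struct.pack("!HHH", ETH_P_8021Q, 5, ETH_P_IPV4)
    ip = (bytes([0x45]) + bytes(8) + bytes([IPPROTO_UDP]) + bytes(2)
          + struct.pack("!II", 0x0A000001, 0x0A000002))
    udp = struct.pack("!HHHH", 1234, 80, 12, 0)
    custom = struct.pack("!I", 42)  # hypothetical T1 header: one 32-bit key
    parsers = {5: lambda b: {"key": struct.unpack_from("!I", b)[0]}}
    print(parse(eth + ip + udp + custom, parsers)["tenant"])  # {'key': 42}
```

Packets whose tag matches no registered tenant still get their common headers parsed, which mirrors the system program always running first.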
Control pipeline
- Feed-forward packet flow
- “Sandwich” architecture
  - Write-then-read half
  - Read-then-write half
- System program
  - Interacts with tenant programs, e.g., passes system states and converts virtual addresses to physical ones

[Figure: control pipeline, in packet-flow order.
  System states { …, link utilization, packet count, … }: pass system states to tenants
  Tenant programs: if (tag == T1’s VID) apply T1’s ctrl; …
  Convert to system states: System states { egress_port, … }]
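As a hedged illustration of the sandwich, the Python sketch below assumes one reading of the two halves: the system program first writes state that the tenant reads, then, after the tenant's control block runs, reads what the tenant wrote and converts it into system state (here, a tenant's virtual egress port becomes a physical `egress_port`, by analogy with the virtual-to-physical address conversion). The function names, the per-tenant port map, and the `DROP` convention are hypothetical; this models the logic for one packet, not P4 or Tofino behavior.

```python
from typing import Callable, Dict

DROP = -1  # hypothetical "drop" egress value
# A tenant control block reads a view of system state and writes its outputs.
TenantCtrl = Callable[[dict, dict], None]


def control_pipeline(vid: int, stats: dict, ctrls: Dict[int, TenantCtrl],
                     port_map: Dict[int, Dict[int, int]]) -> int:
    # System half 1 (system writes, tenant reads): update and expose system state.
    stats["packet_count"] = stats.get("packet_count", 0) + 1
    view = {"packet_count": stats["packet_count"],
            "link_utilization": stats.get("link_utilization", 0.0)}
    # Tenant control, selected by the packet's tag.
    out: dict = {}
    ctrl = ctrls.get(vid)
    if ctrl is not None:
        ctrl(dict(view), out)  # a copy: the tenant cannot overwrite system state
    # System half 2 (tenant writes, system reads): convert the tenant's virtual
    # egress port into a physical one; ports outside its mapping are dropped.
    return port_map.get(vid, {}).get(out.get("egress_port"), DROP)


if __name__ == "__main__":
    def t1(view: dict, out: dict) -> None:
        # Hypothetical tenant logic: alternate between its two virtual ports.
        out["egress_port"] = 0 if view["packet_count"] % 2 else 1

    stats: dict = {}
    ports = {1: {0: 12, 1: 13}}  # tenant 1's virtual ports -> physical ports
    print(control_pipeline(1, stats, {1: t1}, ports))  # 12
    print(control_pipeline(1, stats, {1: t1}, ports))  # 13
```

Each packet passes through the three parts exactly once and in order, which keeps the flow feed-forward.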
[Figure: the control-plane memory allocator manages the config params, counter records, and "one big array" regions of stateful memory.]
- Page-table-like indirection

  Match      | Action
  VID == 1   | metadata.offset = 0;   metadata.amount = 26
  VID == 2   | metadata.offset = 512; metadata.amount = 24
  …          | …

  pkt.physical_address = metadata.offset + (pkt.virtual_address % metadata.amount)
  The physical address indexes a register array shared by Tenant 1, Tenant 2, …
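A small Python model of this indirection, using the offsets and amounts from the table, shows why tenants stay isolated: every virtual address a tenant uses lands inside its own slice of the shared array. The `Entry` and `BigArray` names are hypothetical, and the code models the arithmetic only, not how the register array is laid out in hardware.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Entry:
    offset: int  # first physical cell of the tenant's slice (metadata.offset)
    amount: int  # size of the slice (metadata.amount)


class BigArray:
    """One shared register array, sliced among tenants via a VID-indexed table."""

    def __init__(self, size: int, table: Dict[int, Entry]):
        self.cells: List[int] = [0] * size
        self.table = table  # stands in for the match-action table keyed on VID

    def translate(self, vid: int, virtual_address: int) -> int:
        e = self.table[vid]
        # pkt.physical_address = metadata.offset + (pkt.virtual_address % metadata.amount)
        return e.offset + (virtual_address % e.amount)

    def increment(self, vid: int, virtual_address: int) -> int:
        """Count at a tenant's virtual address and return the new value."""
        pa = self.translate(vid, virtual_address)
        self.cells[pa] += 1
        return self.cells[pa]


if __name__ == "__main__":
    arr = BigArray(1024, {1: Entry(0, 26), 2: Entry(512, 24)})
    print(arr.translate(1, 30), arr.translate(2, 30))  # 4 518
```

The same virtual address 30 maps to different physical cells for the two tenants, while addresses that are equal modulo a tenant's amount alias within that tenant's own slice.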
- Prototype on a Barefoot Tofino switch
- Compile-time linker
  - Extends the open-source P4 compiler [1]
- Run-time memory allocator
  - Built on auto-generated APIs to pull records and modify table entries
[1] https://github.com/p4lang/p4c
- Resource usage on Tofino
- Packet-level validation on PTF
- System program
  - Basic parsing and forwarding logic
- [SOSP’17] NetCache
- [NSDI’18] NetChain
- Overhead
  - Additional gateway tables to check which program should be executed
  - Additional tag-along PHV containers
[Figure: Resource usage on Tofino (% of total) for the merged program, the system program, NetCache, and NetChain, per resource type: exact match xbar, SRAM, hash bits, action units, # stages, gateway tables, and PHV.]
- Experimental setting
  - 64 tenants each submit a 1-minute heavy-hitter detection task over source IP addresses within their own /6 subnet
  - 10-minute CAIDA trace replay
- Evaluation metrics
  - Utility: memory hit ratio
  - Satisfaction: fraction of time with utility > 0.9
  - We show the mean and the 5th percentile
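The two metrics compose as follows; this Python sketch is an assumption-laden illustration, not the authors' evaluation code. It assumes satisfaction is measured over equal-length time intervals, that per-interval utility is already available as a hit ratio, and that the 5th percentile is taken over per-tenant satisfaction with the nearest-rank definition (one of several common choices). All function names are hypothetical.

```python
import math
from typing import Dict, List


def hit_ratio(hits: int, lookups: int) -> float:
    """Utility of one interval: fraction of memory lookups that hit (0 if idle)."""
    return hits / lookups if lookups else 0.0


def satisfaction(utilities: List[float], threshold: float = 0.9) -> float:
    """Fraction of equal-length intervals whose utility exceeds the threshold."""
    return sum(u > threshold for u in utilities) / len(utilities)


def percentile(values: List[float], p: int) -> float:
    """Nearest-rank percentile of `values` for an integer p in [0, 100]."""
    ordered = sorted(values)
    k = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[k - 1]


def summarize(per_tenant: Dict[int, List[float]]) -> Dict[str, float]:
    """Mean and 5th-percentile satisfaction across tenants."""
    sats = [satisfaction(u) for u in per_tenant.values()]
    return {"mean": sum(sats) / len(sats), "p5": percentile(sats, 5)}


if __name__ == "__main__":
    # Hypothetical per-interval utilities for two tenants.
    u1 = [0.95] * 9 + [0.5]   # satisfied in 9 of 10 intervals
    u2 = [0.95, 0.8] * 5      # satisfied in 5 of 10 intervals
    print(summarize({1: u1, 2: u2}))
```

Reporting the 5th percentile alongside the mean exposes whether a few tenants are starved even when the average looks good.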
Takeaways
- A hybrid solution for multitenancy support
  - Compile-time linker: general but static
  - Run-time memory allocator: dynamic but limited
Future work
- Seek a new hardware design that is both general and dynamic
Happy to take questions tw1921@nyu.edu