Programming The Network Data Plane
Changhoon Kim
Beautiful ideas: What if you could …
- Realize a small, but super-fast DNS cache
- Perform TCP SYN authentication for billions of SYNs per sec
- Build a replicated key-value store ensuring RW ops in a few usecs
- Improve your consensus service performance by ~100x
- Boost your Memcached cluster’s throughput by ~10x
- Speed up your DNN training dramatically by realizing parameter servers
… using switches in your network?
You couldn’t do any of those so far because …
- No DIY – must work with vendors at feature level
- Excruciatingly complicated and involved process to build
consensus and pressure for features
- Painfully long and unpredictable lead time
- To use new features, you must get new switches
- What you finally get != what you asked for
This is very unnatural to developers
- Because you all know how to realize your own ideas by “programming” CPUs
– Programs used in every phase (implement, test, and deploy)
– Extremely fast iteration and differentiation
– You own your own ideas
– A sustainable ecosystem where all participants benefit
Can we replicate this healthy, sustainable ecosystem for networking?
Reality: Packet forwarding speeds
[Figure: packet forwarding speed per chip in Gb/s (log scale, 0.1 to 100000) versus year (1990 to 2020), for switch chips and CPUs. Annotations: 6.4Tb/s, 80x.]
What does a typical switch look like?
A switch is just a Linux box with a high-speed switching chip
- Control plane: Switch OS (Linux variant) running protocol daemons (BGP, OSPF, etc.), other mgmt apps, the run-time API, and the chip driver
– Just S/W: you can freely change this
- PCIe links the control plane to the switching chip
- Data plane: packets pass through the L2 forwarding table, L3 routing table, ACL table, …
– Fixed-function H/W: there’s nothing you can change here
Networking systems have been built “bottom-up”
- The fixed-function switch, specified in English, tells the switch OS through the API: “This is roughly how I process packets …”
Turning the tables “top-down”
- The switch OS tells the programmable switch, specified in P4, through the API: “This is precisely how you must process packets”
Conventional wisdom in networking
“Programmable switches are 10-100x slower than fixed-function switches. They cost more and consume more power.”
Evidence: Tofino 6.5Tb/s switch (arrived Dec 2016)
The world’s fastest and most programmable switch.
No performance, cost, or power penalty compared to fixed-function switches.
An incarnation of PISA (Protocol Independent Switch Architecture)
Domain-specific processors
- Computers: Java, compiler, CPU
- Graphics: OpenCL, compiler, GPU
- Signal Processing: Matlab, compiler, DSP
- Machine Learning: TensorFlow, compiler, TPU
- Networking: P4, compiler, PISA
PISA: An architecture for high-speed programmable packet forwarding
PISA: Protocol Independent Switch Architecture
- Programmable Parser
- Match Logic (Mix of SRAM and TCAM for lookup tables, counters, meters, generic hash tables)
- Action Logic (ALUs for standard boolean and arithmetic operations, header modification operations, hashing operations, etc.)
- Ingress match-action stages (pre-switching), Buffer, Egress match-action stages (post-switching)
- Recirculation
- Programmable Packet Generator
- CPU (Control plane)
Generalization of RMT [sigcomm’13]
Why we call it protocol-independent packet processing
Device does not understand any protocols until it gets programmed
[Figure: the logical data-plane view (your P4 program) defines L2, IPv4, IPv6, and ACL tables. The switch pipeline shows a programmable parser, a series of stages that each pair a match table with action ALUs, queues, a fixed action block, and a clock (CLK); packets move through the pipeline stage by stage.]
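The idea that generic stages only become "L2" or "ACL" tables once they are programmed can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how a PISA chip or the P4 compiler works: the names `MatchActionStage`, `set_port`, `drop`, and `run_pipeline` are hypothetical, packets are plain dictionaries of already-parsed fields, and only exact matching is modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Packet = Dict[str, int]            # parsed fields, e.g. {"eth.dst": 0xAA}
Action = Callable[[Packet], None]  # an action mutates the packet in place


@dataclass
class MatchActionStage:
    """One generic stage: exact-match on one field, then run an action."""
    name: str
    key_field: str
    entries: Dict[int, Action] = field(default_factory=dict)
    default_action: Optional[Action] = None

    def apply(self, pkt: Packet) -> None:
        if self.key_field not in pkt:   # header was not parsed: no-op
            return
        action = self.entries.get(pkt[self.key_field], self.default_action)
        if action is not None:
            action(pkt)


def set_port(port: int) -> Action:
    """Build an action that writes the egress port into packet metadata."""
    def _act(pkt: Packet) -> None:
        pkt["meta.egress_port"] = port
    return _act


def drop(pkt: Packet) -> None:
    pkt["meta.drop"] = 1


def run_pipeline(stages: List[MatchActionStage], pkt: Packet) -> Packet:
    for stage in stages:
        stage.apply(pkt)
    return pkt


# "Programming" the device: identical stage objects become L2 and ACL tables
# purely through the key field and entries they are given.
l2 = MatchActionStage("l2", "eth.dst", {0xAA: set_port(1), 0xBB: set_port(2)})
acl = MatchActionStage("acl", "ipv4.src", {0x0A000001: drop})
pipeline = [l2, acl]

forwarded = run_pipeline(pipeline, {"eth.dst": 0xAA, "ipv4.src": 0x0A000002})
blocked = run_pipeline(pipeline, {"eth.dst": 0xBB, "ipv4.src": 0x0A000001})
```

Nothing in `MatchActionStage` mentions Ethernet or IPv4; the protocol knowledge lives entirely in the configuration, which is the point the slide makes.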
Mapping logical data-plane design to physical resources
[Figure: the L2, IPv4, IPv6, and ACL tables of the logical data-plane view (your P4 program) are mapped onto the switch pipeline. The match tables of the physical stages hold the L2, IPv4, IPv6, and ACL tables, and the action ALUs run the L2, v4, v6, and ACL action macros. Also shown: programmable parser, queues, CLK.]
Re-program in the field
[Figure: a MyEncap table is added to the logical data-plane view (your P4 program) alongside L2, IPv4, IPv6, and ACL. The switch pipeline is re-mapped so that its match tables and action ALUs hold the L2, IPv4, IPv6, ACL, and MyEncap tables and their action macros. Also shown: programmable parser, queues, CLK.]
P4: Programming Protocol-Independent Packet Processors
Pat Bosshart†, Dan Daly*, Glen Gibb†, Martin Izzard†, Nick McKeown‡, Jennifer Rexford**, Cole Schlesinger**, Dan Talayco†, Amin Vahdat¶, George Varghese§, David Walker**
†Barefoot Networks *Intel ‡Stanford University **Princeton University ¶Google §Microsoft Research
ABSTRACT
P4 is a high-level language for programming protocol-independent packet processors. P4 works in conjunction with SDN control protocols like OpenFlow. In its current form, OpenFlow explicitly specifies protocol headers on which it operates. This set has grown from 12 to 41 fields in a few years, increasing the complexity of the specification while still not providing the flexibility to add new headers. In this paper we propose P4 as a strawman proposal for how OpenFlow should evolve in the future. We have three goals: (1) Reconfigurability in the field: Programmers should be able to change the way switches process packets once they are deployed. (2) Protocol independence: Switches should not be tied to any specific network protocols. (3) Target independence: Programmers should be able to describe packet-processing functionality independently of the specifics of the underlying hardware. As an example, we describe how to use P4 to configure a switch to add a new hierarchical label.
1. INTRODUCTION
Software-Defined Networking (SDN) gives operators programmatic control over their networks. In SDN, the control plane is physically separate from the forwarding plane, and one control plane controls multiple forwarding devices. While forwarding devices could be programmed in many ways, having a common, open, vendor-agnostic interface (like OpenFlow) enables a control plane to control forwarding devices from different hardware and software vendors.
Version   Date       Header Fields
OF 1.0    Dec 2009   12 fields (Ethernet, TCP/IPv4)
OF 1.1    Feb 2011   15 fields (MPLS, inter-table metadata)
OF 1.2    Dec 2011   36 fields (ARP, ICMP, IPv6, etc.)
OF 1.3    Jun 2012   40 fields
OF 1.4    Oct 2013   41 fields
Table 1: Fields recog…
… multiple stages of rule tables, to allow switches to expose more of their capabilities to the controller. The proliferation of new header fields shows no signs of stopping. For example, data-center network operators increasingly want to apply new forms of packet encapsulation (e.g., NVGRE, VXLAN, and STT), for which they resort to deploying software switches that are easier to extend with new functionality. Rather than repeatedly extending the OpenFlow specification, we argue that future switches should support flexible mechanisms for parsing packets and matching header fields, allowing controller applications to leverage these capabilities through a common, open interface (i.e., a new “OpenFlow 2.0” API). Such a general, extensible approach would be simpler, more elegant, and more future-proof than today’s OpenFlow 1.x standard.
Figure 1: P4 is a language to confi…
Recent chip des… be a ch…
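What does a P4 program look like?
[Figure: logical tables L2, IPv4, IPv6, ACL, and MyEncap; parse graph over Eth, MyEncap, IPv4, IPv6, and TCP.]
    // Ethernet header layout; field widths are in bits.
    header_type ethernet_t {
        fields {
            dstAddr : 48;
            srcAddr : 48;
            etherType : 16;
        }
    }

    // Extract the Ethernet header, then branch on etherType.
    parser parse_ethernet {
        extract(ethernet);
        return select(latest.etherType) {
            0x8100 : parse_vlan;
            0x800 : parse_ipv4;
            0x86DD : parse_ipv6;
        }
    }

    // A custom encapsulation header, defined the same way as a standard one.
    header_type my_encap_t {
        fields {
            foo : 12;
            bar : 8;
            baz : 4;
            qux : 4;
            next_protocol : 4;
        }
    }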
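For readers who do not have a P4 toolchain at hand, the behaviour of the `parse_ethernet` state can be mimicked in Python. This is only an illustrative sketch: the function name mirrors the slide, but the `"stop"` fallback for unknown etherTypes is an assumption (the slide shows no default case), and the sample frame is made up.

```python
import struct

# etherType -> next parser state, mirroring the select() in parse_ethernet.
NEXT_STATE = {0x8100: "parse_vlan", 0x0800: "parse_ipv4", 0x86DD: "parse_ipv6"}


def parse_ethernet(frame: bytes):
    """Extract the 14-byte Ethernet header and pick the next parser state."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    # 48-bit dstAddr, 48-bit srcAddr, 16-bit etherType, network byte order.
    dst, src, ether_type = struct.unpack("!6s6sH", frame[:14])
    header = {"dstAddr": dst.hex(), "srcAddr": src.hex(), "etherType": ether_type}
    # Assumption: unknown etherTypes simply end parsing.
    return header, NEXT_STATE.get(ether_type, "stop")


# A made-up frame: broadcast destination, arbitrary source, etherType 0x0800.
frame = bytes.fromhex("ffffffffffff" "001122334455" "0800") + b"\x45" * 20
header, next_state = parse_ethernet(frame)
```

The field widths in the `struct` format string (6, 6, and 2 bytes) correspond to the 48, 48, and 16 bits declared in `ethernet_t`.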
What does a P4 program look like?
[Figure: logical tables L2, IPv4, IPv6, ACL, and MyEncap.]
    // Longest-prefix match on the IPv4 destination address.
    table ipv4_lpm {
        reads { ipv4.dstAddr : lpm; }
        actions { set_next_hop; drop; }
    }

    // Record the next hop, choose the egress port, and decrement the TTL.
    action set_next_hop(nhop_ipv4_addr, port) {
        modify_field(metadata.nhop_ipv4_addr, nhop_ipv4_addr);
        modify_field(standard_metadata.egress_port, port);
        add_to_field(ipv4.ttl, -1);
    }

    // Order in which the ingress pipeline applies its tables.
    control ingress {
        apply(l2);
        apply(my_encap);
        if (valid(ipv4)) {
            apply(ipv4_lpm);
        } else {
            apply(ipv6_lpm);
        }
        apply(acl);
    }
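What the `ipv4_lpm` table and `set_next_hop` action do to a packet can be approximated with a small Python model. It is a sketch under assumptions: the class `Ipv4Lpm` and the dictionary-based packet are hypothetical, the linear scan stands in for a TCAM lookup, and dropping on a table miss is an assumed default that the slide does not state.

```python
import ipaddress


class Ipv4Lpm:
    """Longest-prefix-match table: prefix -> (next hop, egress port)."""

    def __init__(self):
        self.entries = []  # list of (network, nhop, port)

    def add(self, prefix: str, nhop: str, port: int) -> None:
        self.entries.append((ipaddress.ip_network(prefix), nhop, port))

    def lookup(self, dst: str):
        addr = ipaddress.ip_address(dst)
        hits = [e for e in self.entries if addr in e[0]]
        # The most specific (longest) matching prefix wins.
        return max(hits, key=lambda e: e[0].prefixlen, default=None)


def ingress(pkt: dict, table: Ipv4Lpm) -> dict:
    hit = table.lookup(pkt["ipv4.dstAddr"])
    if hit is None:
        pkt["drop"] = True  # assumption: a table miss drops the packet
        return pkt
    _, nhop, port = hit
    # Same three steps as set_next_hop on the slide.
    pkt["metadata.nhop_ipv4_addr"] = nhop
    pkt["standard_metadata.egress_port"] = port
    pkt["ipv4.ttl"] -= 1
    return pkt


table = Ipv4Lpm()
table.add("10.0.0.0/8", "192.168.0.1", 1)
table.add("10.1.0.0/16", "192.168.0.2", 2)

routed = ingress({"ipv4.dstAddr": "10.1.2.3", "ipv4.ttl": 64}, table)
missed = ingress({"ipv4.dstAddr": "172.16.0.1", "ipv4.ttl": 64}, table)
```

The address 10.1.2.3 matches both prefixes, and the /16 entry is chosen because it is longer, which is the behaviour the `lpm` match kind asks for.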
P4.org (http://p4.org)
- Open-source community to nurture the language
– Open-source software: Apache license
– A common language: P4_16
– Support for various types of devices and targets
- Enable a wealth of innovation
– Diverse “apps” (including proprietary ones) running on commodity targets
- With no barrier to entry
– Free of membership fee, free of commitment, and simple licensing
So, what kinds of exciting new opportunities are arising?
The network should answer these questions
- 1. “Which path did my packet take?”
- 2. “Which rules did my packet follow?”
- 3. “How long did it queue at each switch?”
- 4. “Who did it share the queues with?”
PISA + P4 can answer all four questions for the first time. At full line rate. Without generating any additional packets!
In-band Network Telemetry (INT)
- Each switch adds to the original packet: SwitchID, Arrival Time, Queue Delay, Matched Rules, …
- The collected data can then be logged, analyzed, replayed, and visualized
A read-only version of Tiny Packet Programs [sigcomm’14]
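The mechanism can be illustrated with a toy Python model in which each switch on the path appends its own metadata to the packet. This is a sketch, not the INT specification: the names `IntHop`, `traverse`, and `report`, the microsecond units, and the rule strings are all hypothetical. It covers the first three questions (path, rules, queueing delay); answering the fourth would need extra per-queue state that is not modeled here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IntHop:
    switch_id: int
    arrival_time_us: int
    queue_delay_us: int
    matched_rules: List[str]


@dataclass
class Packet:
    payload: bytes
    int_stack: List[IntHop] = field(default_factory=list)


def traverse(pkt, switch_id, arrival_time_us, queue_delay_us, matched_rules):
    """A switch appends its own metadata; the payload is left untouched."""
    pkt.int_stack.append(
        IntHop(switch_id, arrival_time_us, queue_delay_us, list(matched_rules))
    )
    return pkt


def report(pkt):
    """What a collector could derive from the metadata carried by one packet."""
    return {
        "path": [h.switch_id for h in pkt.int_stack],
        "rules": {h.switch_id: h.matched_rules for h in pkt.int_stack},
        "queue_delay_us": {h.switch_id: h.queue_delay_us for h in pkt.int_stack},
        "slowest_hop": max(pkt.int_stack, key=lambda h: h.queue_delay_us).switch_id,
    }


pkt = Packet(b"hello")
traverse(pkt, 1, 100, 3, ["l2:fwd"])
traverse(pkt, 7, 120, 45, ["ipv4_lpm:set_next_hop", "acl:permit"])
traverse(pkt, 4, 180, 2, ["l2:fwd"])
summary = report(pkt)
```

Because the metadata rides on the original packet, no extra probe packets are generated, which matches the claim on the slide.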
A quick demo of INT!
What does this mean to you?
- Improve your distributed apps’ performance with
telemetry data
- Ask the four key questions regarding your packets to
network admins or cloud providers
- Huge opportunities for Big-data processing and
machine-learning experts
- “Self-driving” network is not hyperbole
PISA: An architecture for high-speed programmable event processing, not only packet forwarding
What we have seen so far: Adding new networking features
- 1. New encapsulations and tunnels
- 2. New ways to tag packets for special treatment
- 3. New approaches to routing: e.g., source routing in data-center networks
- 4. New approaches to congestion control
- 5. New ways to manipulate and forward packets: e.g., splitting ticker symbols for high-frequency trading
What we have seen so far: World’s fastest middle boxes
- 1. Layer-4 connection load balancing at Tb/s
– Replace 100s of servers or 10s of dedicated appliances with one PISA switch
– Track and maintain mappings for 5 ~ 10 million HTTP connections
- 2. Stateless firewall or DDoS detector
– Add/delete and track 100s of thousands of new connections per second
– Include other stateless line-rate functions (e.g., TCP SYN authentication, sketches, or Bloom-filter-based whitelisting)
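Bloom-filter-based whitelisting, mentioned in the list above, fits a switch well because membership tests need only a few hash computations and bit reads. The following Python sketch shows the data structure itself; the class name, the sizes, and the use of SHA-256 are assumptions made for illustration (a switch would use its own hardware hash units and register arrays).

```python
import hashlib


class BloomFilter:
    """Set membership with no false negatives and a tunable false-positive rate."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _indexes(self, item: str):
        # Derive num_hashes bit positions by salting the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for i in self._indexes(item):
            self.bits[i] = 1

    def might_contain(self, item: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[i] for i in self._indexes(item))


whitelist = BloomFilter()
for src in ("10.0.0.1", "10.0.0.2", "192.168.1.77"):
    whitelist.add(src)

allowed = whitelist.might_contain("10.0.0.2")
```

An address that was added is always reported as present; an address that was never added is usually rejected, with a small false-positive probability that shrinks as `num_bits` grows.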
What we have seen so far: Offloading part of computing to network
- 1. DNS cache
- 2. Key-value cache [ACM SOSP’17]
- 3. Chain replication
- 4. Paxos [ACM CCR’16] and RAFT
- 5. Parameter service for DNN training
Example: NetCache
[Figure: clients query a key-value storage rack. The ToR switch data plane combines L2/L3 routing, a key-value cache, and query statistics, with a controller attached; high-performance storage servers sit behind the switch.]
- Non-goal
– Maximize the cache hit rate
- Goal
– Balance the workloads of backend servers by serving only O(NlogN) hot items, where N is the number of backend servers
– Make the “fast, small-cache” theory viable for modern in-memory KV servers [Fan et al., SOCC’11]
- Data plane
– Unmodified routing
– Key-value cache built with on-chip SRAM
– Query statistics to detect hot items
- Control plane
– Update cache with hot items to handle dynamic workloads
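The split between data plane and control plane described above can be sketched in Python. This is a toy model and not the NetCache implementation: the class `NetCacheSwitch`, the exact per-key counter (a real switch would use approximate statistics), and the top-k refill policy are assumptions, and writes and cache coherence are left out entirely.

```python
from collections import Counter


class NetCacheSwitch:
    def __init__(self, backend: dict, cache_slots: int):
        self.backend = backend          # stands in for the storage servers
        self.cache_slots = cache_slots  # stands in for limited on-chip SRAM
        self.cache = {}
        self.counts = Counter()         # query statistics per key
        self.hits = 0
        self.misses = 0

    def get(self, key):
        """Data plane: count the query, answer from cache if possible."""
        self.counts[key] += 1
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1                # forwarded to a storage server
        return self.backend[key]

    def controller_update(self):
        """Control plane: refill the cache with the currently hottest keys."""
        hot = [k for k, _ in self.counts.most_common(self.cache_slots)]
        self.cache = {k: self.backend[k] for k in hot}
        self.counts.clear()


backend = {f"k{i}": i for i in range(100)}
switch = NetCacheSwitch(backend, cache_slots=2)

# A skewed workload: two hot keys and twenty cold ones.
workload = ["k1"] * 50 + ["k2"] * 30 + [f"k{i}" for i in range(3, 23)]
for key in workload:
    switch.get(key)          # first pass: everything misses
switch.controller_update()   # controller installs the hot keys
for key in workload:
    switch.get(key)          # second pass: hot keys are served by the switch
```

With only two cache slots, the switch absorbs 80 of the 100 queries in the second pass, which illustrates why a small cache of hot items is enough to even out the load on the backend servers.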
The “boring life” of a NetCache switch
[Figure: throughput (BQPS) vs. value size (Byte), and (b) throughput (BQPS) vs. cache size.]
One can further increase the value sizes with more stages, recirculation, or mirroring.
Yes, it’s Billion Queries Per Sec, not a typo :)
And its “not so boring” benefits
NetCache provides 3-10x throughput improvements.
[Figure: throughput (BQPS) of a key-value storage rack with one Tofino switch and 128 storage servers, by workload distribution (uniform, zipf-0.9, zipf-0.95, zipf-0.99), comparing NoCache, NetCache(servers), and NetCache(cache).]
NetCache is a key-value store that leverages in-network caching to achieve billions of queries/sec and a few usec latency even under highly-skewed and rapidly-changing workloads.
Summing it up …
Why data-plane programming?
- 1. New features: Realize your beautiful ideas very quickly
- 2. Reduce complexity: Remove unnecessary features and tables
- 3. Efficient use of H/W resources: Achieve biggest bang for buck
- 4. Greater visibility: New diagnostics, telemetry, OAM, etc.
- 5. Modularity: Compose forwarding behavior from libraries
- 6. Portability: Specify forwarding behavior once; compile to many devices
- 7. Own your own ideas: No need to share your ideas with others
“Protocols are being lifted off chips and into software”
– Ben Horowitz
My observations
- PISA and P4: The first attempt to define a machine architecture and programming models for networking in a disciplined way
- Network is becoming yet another programmable platform
- It’s fun to figure out the best workloads for this new machine architecture
Want to find more resources or follow up?
- Visit http://p4.org and http://github.com/p4lang
– P4 language spec
– P4 dev tools and sample programs
– P4 tutorials
- Join P4 workshops and P4 developers’ days
- Participate in P4 working group activities
– Language, target architecture, runtime API, applications
- Need more expertise across various fields in computer science
– To enhance PISA, P4, dev tools (e.g., for formal verification, equivalence check, and many more …)
Thanks. Let’s develop your beautiful ideas in P4!