

SLIDE 1

Programming The Network Data Plane

Changhoon Kim

SLIDE 2

Beautiful ideas: What if you could …

  • Realize a small, but super-fast DNS cache
  • Perform TCP SYN authentication for billions of SYNs per sec
  • Build a replicated key-value store ensuring RW ops in a few usecs
  • Improve your consensus service performance by ~100x
  • Boost your Memcached cluster’s throughput by ~10x
  • Speed up your DNN training dramatically by realizing parameter servers

… using switches in your network?

SLIDE 3

You couldn’t do any of those so far because …

  • No DIY – must work with vendors at feature level
  • Excruciatingly complicated and involved process to build consensus and pressure for features
  • Painfully long and unpredictable lead time
  • To use new features, you must get new switches
  • What you finally get != what you asked for

SLIDE 4

This is very unnatural to developers

  • Because you all know how to realize your own ideas by “programming” CPUs
      – Programs used in every phase (implement, test, and deploy)
      – Extremely fast iteration and differentiation
      – You own your own ideas
      – A sustainable ecosystem where all participants benefit

Can we replicate this healthy, sustainable ecosystem for networking?

SLIDE 5

Reality: Packet forwarding speeds

[Chart: per-chip packet-forwarding capacity in Gb/s (log scale, 0.1 to 100000), 1990-2020, for switch chips vs. CPUs; switch chips have reached 6.4 Tb/s.]

SLIDE 6

Reality: Packet forwarding speeds

[Same chart, annotated: a switch chip forwards packets roughly 80x faster than a CPU.]

SLIDE 7

What does a typical switch look like?

[Diagram: control plane and data plane connected over PCIe. Control plane: Switch OS (a Linux variant) running protocol daemons (BGP, OSPF, etc.), other management apps, and a run-time API over the chip driver. Data plane: a fixed-function switching chip holding the L2 forwarding table, L3 routing table, and ACL table; packets enter and leave here.]

A switch is just a Linux box with a high-speed switching chip. The control plane is just software, so you can freely change it; the data plane is fixed-function hardware, so there’s nothing you can change there.

SLIDE 8

Networking systems have been built “bottoms-up”

[Diagram: the fixed-function switch dictates to the Switch OS, in English: “This is roughly how I process packets …”; the OS exposes an API on top.]

SLIDE 9

Turning the tables “top-down”

[Diagram: the Switch OS dictates to a programmable switch, in P4: “This is precisely how you must process packets”; the same API sits on top.]
SLIDE 10

“Programmable switches are 10-100x slower than fixed-function switches. They cost more and consume more power.”

– Conventional wisdom in networking

SLIDE 11

Evidence: Tofino 6.5Tb/s switch (arrived Dec 2016)

The world’s fastest and most programmable switch.

No power, cost, or performance penalty compared to fixed-function switches. An incarnation of PISA (Protocol Independent Switch Architecture).

SLIDE 12

Domain-specific processors

    Computers          → CPU  (programmed in Java, via a compiler)
    Graphics           → GPU  (programmed in OpenCL, via a compiler)
    Signal processing  → DSP  (programmed in Matlab, via a compiler)
    Machine learning   → TPU  (programmed in TensorFlow, via a compiler)
    Networking         → ?    (language? compiler?)

SLIDE 13

Domain-specific processors

    Computers          → CPU  (programmed in Java, via a compiler)
    Graphics           → GPU  (programmed in OpenCL, via a compiler)
    Signal processing  → DSP  (programmed in Matlab, via a compiler)
    Machine learning   → TPU  (programmed in TensorFlow, via a compiler)
    Networking         → PISA (programmed in P4, via a compiler)

SLIDE 14

PISA: An architecture for high-speed programmable packet forwarding

SLIDE 15

PISA: Protocol Independent Switch Architecture

[Diagram: a programmable parser followed by match-action units: memory for the match side, ALUs for the action side.]

SLIDE 16

PISA: Protocol Independent Switch Architecture

[Diagram: programmable parser → ingress match-action stages → buffer → egress match-action stages.]

SLIDE 17

PISA: Protocol Independent Switch Architecture

[Diagram: programmable parser → ingress match-action stages (pre-switching) → buffer → egress match-action stages (post-switching), plus recirculation, a programmable packet generator, and a CPU for the control plane.]

  • Match logic: a mix of SRAM and TCAM for lookup tables, counters, meters, and generic hash tables
  • Action logic: ALUs for standard boolean and arithmetic operations, header modification operations, hashing operations, etc.

A generalization of RMT [SIGCOMM’13].

SLIDE 18

Why we call it protocol-independent packet processing

SLIDE 19

[Diagram: the logical data-plane view (your P4 program): a programmable parser feeding L2, IPv4, IPv6, and ACL match tables, each with its action ALUs, followed by a fixed action and queues in the switch pipeline.]

The device does not understand any protocols until it gets programmed.

SLIDE 20

Mapping logical data-plane design to physical resources

[Diagram: the logical L2, IPv4, IPv6, and ACL tables and their action macros from the P4 program are compiled onto the physical pipeline: programmable parser, match-action stages, and queues.]
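The table-to-pipeline mapping can be sketched in plain Python. This is a toy model, not the compiler or any real switch API: `Table`, `set_port`, and the exact-match lookup standing in for LPM are all illustrative.

```python
# Toy model of a PISA-style pipeline: logical match-action tables
# (as declared in a P4 program) laid out as a sequence of stages.

class Table:
    def __init__(self, field, entries, default=None):
        self.field = field        # header field this table matches on
        self.entries = entries    # match entries: field value -> action
        self.default = default    # action to run on a miss (may be None)

    def apply(self, pkt):
        action = self.entries.get(pkt.get(self.field), self.default)
        if action:
            action(pkt)

def set_port(port):
    """Action macro: send the packet to the given egress port."""
    def action(pkt):
        pkt["egress_port"] = port
    return action

# Logical L2 and IPv4 tables mapped onto two physical stages.
# (Exact match stands in for LPM to keep the sketch short.)
l2 = Table("dst_mac", {"aa:bb": set_port(1)})
ipv4_lpm = Table("dst_ip", {"10.0.0.1": set_port(2)})
pipeline = [l2, ipv4_lpm]

pkt = {"dst_mac": "aa:bb", "dst_ip": "10.0.0.1", "ttl": 64, "egress_port": None}
for table in pipeline:
    table.apply(pkt)
pkt["ttl"] -= 1  # the set_next_hop-style TTL decrement
```

Here both tables match, so the later stage wins and the packet leaves on the IPv4 next hop. Re-programming in the field then amounts to declaring different tables and recompiling the layout, not replacing hardware.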

SLIDE 21

Re-program in the field

[Diagram: the P4 program is extended with a new “MyEncap” table and action; recompilation remaps the logical pipeline (L2, MyEncap, IPv4, IPv6, ACL, with their action macros) onto the same physical parser, match-action stages, and queues.]

SLIDE 22

P4: Programming Protocol-Independent Packet Processors

Pat Bosshart†, Dan Daly*, Glen Gibb†, Martin Izzard†, Nick McKeown‡, Jennifer Rexford**, Cole Schlesinger**, Dan Talayco†, Amin Vahdat¶, George Varghese§, David Walker**

†Barefoot Networks *Intel ‡Stanford University **Princeton University ¶Google §Microsoft Research

ABSTRACT

P4 is a high-level language for programming protocol-independent packet processors. P4 works in conjunction with SDN control protocols like OpenFlow. In its current form, OpenFlow explicitly specifies protocol headers on which it operates. This set has grown from 12 to 41 fields in a few years, increasing the complexity of the specification while still not providing the flexibility to add new headers. In this paper we propose P4 as a strawman proposal for how OpenFlow should evolve in the future. We have three goals: (1) Reconfigurability in the field: Programmers should be able to change the way switches process packets once they are deployed. (2) Protocol independence: Switches should not be tied to any specific network protocols. (3) Target independence: Programmers should be able to describe packet-processing functionality independently of the specifics of the underlying hardware. As an example, we describe how to use P4 to configure a switch to add a new hierarchical label.

1. INTRODUCTION

Software-Defined Networking (SDN) gives operators programmatic control over their networks. In SDN, the control plane is physically separate from the forwarding plane, and one control plane controls multiple forwarding devices. While forwarding devices could be programmed in many ways, having a common, open, vendor-agnostic interface (like OpenFlow) enables a control plane to control forwarding devices from different hardware and software vendors.

    Version   Date       Header Fields
    OF 1.0    Dec 2009   12 fields (Ethernet, TCP/IPv4)
    OF 1.1    Feb 2011   15 fields (MPLS, inter-table metadata)
    OF 1.2    Dec 2011   36 fields (ARP, ICMP, IPv6, etc.)
    OF 1.3    Jun 2012   40 fields
    OF 1.4    Oct 2013   41 fields

Table 1: Fields recognized by the OpenFlow standard

… multiple stages of rule tables, to allow switches to expose more of their capabilities to the controller.

The proliferation of new header fields shows no signs of stopping. For example, data-center network operators increasingly want to apply new forms of packet encapsulation (e.g., NVGRE, VXLAN, and STT), for which they resort to deploying software switches that are easier to extend with new functionality. Rather than repeatedly extending the OpenFlow specification, we argue that future switches should support flexible mechanisms for parsing packets and matching header fields, allowing controller applications to leverage these capabilities through a common, open interface (i.e., a new “OpenFlow 2.0” API). Such a general, extensible approach would be simpler, more elegant, and more future-proof than today’s OpenFlow 1.x standard.

Figure 1: P4 is a language to configure switches. Recent chip des…

SLIDE 23

What does a P4 program look like?

[Diagram: the logical pipeline (L2, MyEncap, IPv4, IPv6, ACL) and the parse graph: Eth → MyEncap → IPv4/IPv6 → TCP.]

header_type ethernet_t {
    fields {
        dstAddr   : 48;
        srcAddr   : 48;
        etherType : 16;
    }
}

header_type my_encap_t {
    fields {
        foo : 12;
        bar : 8;
        baz : 4;
        qux : 4;
        next_protocol : 4;
    }
}

parser parse_ethernet {
    extract(ethernet);
    return select(latest.etherType) {
        0x8100 : parse_vlan;
        0x800  : parse_ipv4;
        0x86DD : parse_ipv6;
    }
}
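Conceptually, `parse_ethernet` is one state of a state machine that branches on `etherType`. A throwaway Python analogue (the constants come from the slide; the dispatch-table shape is my own):

```python
# Parser state: after extracting the Ethernet header, pick the next
# parse state from the etherType field, like the P4 select() does.
ETHERTYPE_DISPATCH = {
    0x8100: "parse_vlan",
    0x0800: "parse_ipv4",
    0x86DD: "parse_ipv6",
}

def parse_ethernet(ether_type: int) -> str:
    # Unknown etherTypes fall through with no further parsing.
    return ETHERTYPE_DISPATCH.get(ether_type, "ingress")
```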

SLIDE 24

What does a P4 program look like?

[Diagram: the same logical pipeline: L2, MyEncap, IPv4, IPv6, ACL.]

table ipv4_lpm {
    reads   { ipv4.dstAddr : lpm; }
    actions { set_next_hop; drop; }
}

action set_next_hop(nhop_ipv4_addr, port) {
    modify_field(metadata.nhop_ipv4_addr, nhop_ipv4_addr);
    modify_field(standard_metadata.egress_port, port);
    add_to_field(ipv4.ttl, -1);
}

control ingress {
    apply(l2);
    apply(my_encap);
    if (valid(ipv4)) {
        apply(ipv4_lpm);
    } else {
        apply(ipv6_lpm);
    }
    apply(acl);
}

SLIDE 25

P4.org (http://p4.org)

§ Open-source community to nurture the language
    – Open-source software, Apache license
    – A common language: P4₁₆
    – Support for various types of devices and targets
§ Enable a wealth of innovation
    – Diverse “apps” (including proprietary ones) running on commodity targets
§ With no barrier to entry
    – Free of membership fee, free of commitment, and simple licensing

SLIDE 26

So, what kinds of exciting new opportunities are arising?

SLIDE 27

The network should answer these questions

  • 1. “Which path did my packet take?”
  • 2. “Which rules did my packet follow?”
  • 3. “How long did it queue at each switch?”
  • 4. “Who did it share the queues with?”

PISA + P4 can answer all four questions for the first time. At full line rate. Without generating any additional packets!


SLIDE 28

In-band Network Telemetry (INT)

[Diagram: each switch along the path adds metadata to the original packet: SwitchID, arrival time, queue delay, matched rules, …; the collected telemetry is then logged, analyzed, replayed, and visualized.]

A read-only version of Tiny Packet Programs [SIGCOMM’14].
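In spirit, INT works like the sketch below. The field names are invented for illustration; the real INT specification defines its own header formats and modes.

```python
# Each switch on the path pushes its own telemetry onto a stack
# carried inside the packet itself; no extra packets are generated.

def int_transit(pkt, switch_id, queue_delay_us):
    """What an INT transit hop does: append this hop's metadata."""
    pkt["int_stack"].append({"switch": switch_id, "qdelay_us": queue_delay_us})
    return pkt

pkt = {"payload": b"data", "int_stack": []}
for hop, delay in [("s1", 3), ("s2", 11), ("s3", 2)]:
    pkt = int_transit(pkt, hop, delay)

# The INT sink strips the stack and can now answer the questions:
path = [m["switch"] for m in pkt["int_stack"]]                 # which path?
total_qdelay = sum(m["qdelay_us"] for m in pkt["int_stack"])   # how long queued?
```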

SLIDE 29

A quick demo of INT!

SLIDE 30

SLIDE 31

What does this mean to you?

  • Improve your distributed apps’ performance with telemetry data
  • Ask the four key questions regarding your packets to network admins or cloud providers
  • Huge opportunities for Big-data processing and machine-learning experts
  • “Self-driving” network is not hyperbole

SLIDE 32

PISA: An architecture for high-speed programmable packet event processing

SLIDE 33

What we have seen so far: Adding new networking features

  • 1. New encapsulations and tunnels
  • 2. New ways to tag packets for special treatment
  • 3. New approaches to routing: e.g., source routing in data-center networks
  • 4. New approaches to congestion control
  • 5. New ways to manipulate and forward packets: e.g., splitting ticker symbols for high-frequency trading

SLIDE 34

What we have seen so far: World’s fastest middle boxes

  • 1. Layer-4 connection load balancing at Tb/s
      – Replace 100s of servers or 10s of dedicated appliances with one PISA switch
      – Track and maintain mappings for 5-10 million HTTP connections
  • 2. Stateful firewall or DDoS detector
      – Add/delete and track 100s of thousands of new connections per second
      – Include other stateless line-rate functions (e.g., TCP SYN authentication, sketches, or Bloom-filter-based whitelisting)
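As one example of a stateless line-rate function, Bloom-filter whitelisting reduces to hashing a flow key a few times into a bit array small enough for on-chip memory. A hedged Python sketch (filter size, hash count, the SHA-256-based hashing, and the flow-key format are all illustrative; a switch would use hardware hash units):

```python
import hashlib

M = 1 << 16   # bits in the filter (small enough for on-chip SRAM)
K = 3         # hash functions per key

bits = bytearray(M // 8)

def _hashes(flow_key: bytes):
    # Derive K independent bit indices from one key (illustrative scheme).
    for i in range(K):
        digest = hashlib.sha256(bytes([i]) + flow_key).digest()
        yield int.from_bytes(digest[:4], "big") % M

def whitelist_add(flow_key: bytes):
    for h in _hashes(flow_key):
        bits[h // 8] |= 1 << (h % 8)

def whitelist_check(flow_key: bytes) -> bool:
    # May rarely false-positive, but never false-negatives.
    return all(bits[h // 8] & (1 << (h % 8)) for h in _hashes(flow_key))

whitelist_add(b"10.0.0.1:1234->192.0.2.5:80/tcp")
```

The data plane does only hashing and bit tests per packet, which is why this fits a line-rate budget.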

SLIDE 35

What we have seen so far: Offloading part of computing to network

  • 1. DNS cache
  • 2. Key-value cache [ACM SOSP’17]
  • 3. Chain replication
  • 4. Paxos [ACM CCR’16] and RAFT
  • 5. Parameter service for DNN training

SLIDE 36

Example: NetCache

[Diagram: a key-value storage rack of high-performance storage servers behind a ToR switch; the switch data plane holds a key-value cache and query statistics alongside L2/L3 routing, with a controller on top; clients query through the switch.]

  • Non-goal
      – Maximize the cache hit rate
  • Goal
      – Balance the workloads of backend servers by serving only O(N log N) hot items, where N is the number of backend servers
      – Make the “fast, small-cache” theory viable for modern in-memory KV servers [Fan et al., SOCC’11]
  • Data plane
      – Unmodified routing
      – Key-value cache built with on-chip SRAM
      – Query statistics to detect hot items
  • Control plane
      – Update cache with hot items to handle dynamic workloads
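The division of labor can be sketched in a few lines of Python. This is my simplification of the NetCache idea, not the paper’s implementation; every name here is illustrative.

```python
from collections import Counter

cache = {}                # stands in for the on-chip SRAM key-value cache
read_counts = Counter()   # stands in for the query-statistics module
servers = {f"k{i}": f"v{i}" for i in range(100)}  # backend storage rack

def read(key):
    """Data plane: serve hot keys from the switch, forward the rest."""
    read_counts[key] += 1
    if key in cache:
        return cache[key], "hit"
    return servers[key], "miss"

def update_cache(top_n=1):
    """Control plane: promote the hottest keys into the switch cache."""
    for key, _ in read_counts.most_common(top_n):
        cache[key] = servers[key]

for _ in range(5):
    read("k7")            # skewed workload: k7 is hot
read("k3")
update_cache()
value, where = read("k7")
```

Once the control plane promotes `k7`, reads for it never reach the servers, which is how the hottest O(N log N) items stop skewing the backend load.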

SLIDE 37

The “boring life” of a NetCache switch

[Charts: (a) throughput (BQPS) vs. value size (32-128 bytes); (b) throughput (BQPS) vs. cache size (16K-64K entries).]

One can further increase the value sizes with more stages, recirculation, or mirroring.

Yes, it’s Billion Queries Per Sec, not a typo :)

SLIDE 38

And its “not so boring” benefits

NetCache provides 3-10x throughput improvements. Throughput of a key-value storage rack with one Tofino switch and 128 storage servers.

[Chart: throughput (BQPS) under uniform, zipf-0.9, zipf-0.95, and zipf-0.99 workload distributions, broken down into NoCache, NetCache (servers), and NetCache (cache).]

SLIDE 39

NetCache is a key-value store that leverages in-network caching to achieve billions of queries/sec & a few usec latency, even under highly-skewed & rapidly-changing workloads.

SLIDE 40

Summing it up …

SLIDE 41

Why data-plane programming?

  • 1. New features: Realize your beautiful ideas very quickly
  • 2. Reduce complexity: Remove unnecessary features and tables
  • 3. Efficient use of H/W resources: Achieve biggest bang for buck
  • 4. Greater visibility: New diagnostics, telemetry, OAM, etc.
  • 5. Modularity: Compose forwarding behavior from libraries
  • 6. Portability: Specify forwarding behavior once; compile to many devices
  • 7. Own your own ideas: No need to share your ideas with others

“Protocols are being lifted off chips and into software”

– Ben Horowitz


SLIDE 42

My observations

  • PISA and P4: The first attempt to define a machine architecture and programming models for networking in a disciplined way
  • Network is becoming yet another programmable platform
  • It’s fun to figure out the best workloads for this new machine architecture

SLIDE 43

Want to find more resources or follow up?

  • Visit http://p4.org and http://github.com/p4lang
      – P4 language spec
      – P4 dev tools and sample programs
      – P4 tutorials
  • Join P4 workshops and P4 developers’ days
  • Participate in P4 working group activities
      – Language, target architecture, runtime API, applications
  • Need more expertise across various fields in computer science
      – To enhance PISA, P4, dev tools (e.g., for formal verification, equivalence check, and many more …)

SLIDE 44

Thanks. Let’s develop your beautiful ideas in P4!