UDP Encapsulation in Linux, netdev0.1 Conference, February 16, 2015



SLIDE 1

UDP Encapsulation in Linux netdev0.1 Conference February 16, 2015

Tom Herbert <therbert@google.com>

Proceedings of netdev 0.1, Feb 14-17, 2015, Ottawa, On, Canada

SLIDE 2

Topics

  • UDP encapsulation
  • Common offloads
  • Foo over UDP (FOU)
  • Generic UDP Encapsulation (GUE)

SLIDE 3

Basic idea of UDP encap

  • Put network packets into UDP payload
  • Two general methods

○ No encapsulation header: protocol of packet is inferred from port number
○ Encapsulation header: extra header between UDP header and packet. Protocol and other data can be there. For example:

[Figure: packet layouts built from ETH, IP, UDP, GUE, TCP, and Data fields, showing a TCP/IP packet before and after UDP encapsulation with a GUE header]

SLIDE 4

VM encap example

[Figure: encapsulation and decapsulation paths. Each side shows Application, Guest kernel, and Host kernel (Encapsulator or Decapsulator, IP, NIC driver), with numbered steps 1 to 4.]

SLIDE 5

UDP encap popularity

  • UDP works with existing HW infrastructure

○ RSS in NICs, ECMP in switches
○ Checksum offload

  • Used in nearly all encap, NV data protocols

○ VXLAN, LISP, MPLS, GUE, Geneve, NSH, L2TP

  • UDP-based encapsulation is likely to become ubiquitous

○ In time most packets in DC could be UDP!

SLIDE 6
Offloads

  • Load balancing
  • Checksum offload
  • Segmentation offload

SLIDE 7
Load balancing

  • For ECMP, RSS, LAG port selection
  • Probably all switches can hash on the 5-tuple of UDP/IP packets
  • Solution: use source port to represent hash of inner flow

○ ~14 bits of entropy
○ udp_flow_src_port function
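The source-port idea can be sketched as follows. This is an illustrative sketch only, not the kernel code: the function name flow_src_port, the CRC32 stand-in for the flow hash, and the port range 49152-65535 (2**14 ports, one range that would give about 14 bits of entropy) are all assumptions.

```python
import zlib

# Assumed port range for illustration: 49152-65535 holds 2**14 ports.
PORT_MIN = 49152
PORT_MAX = 65536  # exclusive upper bound


def flow_src_port(src_ip, dst_ip, proto, sport, dport,
                  port_min=PORT_MIN, port_max=PORT_MAX):
    """Map the inner flow's 5-tuple to an outer UDP source port."""
    key = f"{src_ip}|{dst_ip}|{proto}|{sport}|{dport}".encode()
    # CRC32 is only a stand-in for whatever flow hash the stack provides.
    flow_hash = zlib.crc32(key) & 0xFFFFFFFF
    # Scale the 32-bit hash into [port_min, port_max).
    return port_min + ((flow_hash * (port_max - port_min)) >> 32)
```

Packets of the same inner flow always get the same outer source port, so switches and NICs that hash the outer 5-tuple keep the flow on one path or queue, while different inner flows spread out.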

SLIDE 8
TX Checksum offload

  • NETIF_F_HW_CSUM

○ Initialize checksum to pseudo header csum
○ Input to device is start and offset
○ HW checksums from start to end of packet and writes result at offset

  • NETIF_F_IP_CSUM

○ HW can only checksum with certain protocol hdrs
○ Typically UDP/IP and TCP/IP
○ HW handles pseudo hdr csum also
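The start/offset model can be illustrated with a small simulation. This is a sketch under stated assumptions: the helper names (ones_complement_sum, device_checksum, build_udp_for_offload) are made up for illustration, the packet is a bare IPv4 UDP datagram, and the "device" is just a Python function.

```python
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit ones' complement sum of data (zero-padded to even length)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def udp_pseudo_header(src: bytes, dst: bytes, udp_len: int) -> bytes:
    """IPv4 pseudo header for UDP (protocol 17)."""
    return src + dst + struct.pack("!BBH", 0, 17, udp_len)


def build_udp_for_offload(src, dst, sport, dport, payload):
    """Host side: seed the checksum field with the pseudo header sum."""
    udp_len = 8 + len(payload)
    seed = ones_complement_sum(udp_pseudo_header(src, dst, udp_len))
    return bytearray(struct.pack("!HHHH", sport, dport, udp_len, seed) + payload)


def device_checksum(packet: bytearray, start: int, offset: int) -> None:
    """Device side: sum from start to end, write the result at start + offset."""
    csum = ~ones_complement_sum(bytes(packet[start:])) & 0xFFFF
    packet[start + offset:start + offset + 2] = struct.pack("!H", csum)
```

For a UDP datagram the host would pass start = 0 (the UDP header) and offset = 6 (the checksum field). The device never needs to know that the protocol is UDP, which is why this model extends naturally to encapsulated packets.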

SLIDE 9
RX Checksum offload

  • CHECKSUM_COMPLETE

○ HW returns checksum calculation across whole packet
○ Host uses returned value to validate checksum(s) in the packet

  • CHECKSUM_UNNECESSARY

○ HW verifies and returns “checksum okay”
○ Protocol specific, HW needs to parse packet
○ csum_level allows HW to checksum within encapsulation, multiple checksums

SLIDE 10
Checksum offload for encapsulation

  • Need to offload inner checksum like TCP
  • UDP also has its own checksum, which makes things interesting!

SLIDE 11
The MIGHTY UDP Checksum for Encaps

  • Want it set to zero for “performance” (particularly switch vendors), but...
  • UDP checksum is required for IPv6, and…
  • UDP checksum covers more of packet than inner checksum, but...
  • RFC 6935, RFC 6936, and a lot more requirements in encapsulation protocol drafts exist to allow a zero checksum, but…
  • UDP checksum is actually a good idea for both v4 and v6 when you’re using Linux hosts to do encapsulation, let me explain...

SLIDE 12
Leveraging UDP checksum offload

  • Probably every deployed NIC supports simple UDP checksum for TX and RX
  • Only new NICs support offload of encapsulated checksums
  • Solution: Enable UDP checksum for encap and use it to offload inner checksums

○ Receive: checksum-unnecessary conversion
○ Transmit: remote checksum offload

SLIDE 13
Checksum unnecessary conversion

  • Device returns “checksum unnecessary” for non-zero outer UDP checksum
  • Complete checksum of packet starting from the UDP header is ~pseudo_hdr_csum
  • So convert checksum unnecessary to checksum complete
  • Inner checksum(s) verified using checksum complete
  • No checksum computation on host!
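The conversion rests on one piece of arithmetic: if the outer UDP checksum is valid, the ones' complement sum of everything from the UDP header onward must equal the complement of the pseudo header sum. A minimal sketch of that reasoning follows; all function names are hypothetical, and the host here sums only the headers in front of the inner transport header, never the payload.

```python
import struct


def csum(data: bytes) -> int:
    """16-bit ones' complement sum (data zero-padded to even length)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def csum_add(a: int, b: int) -> int:
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def csum_sub(a: int, b: int) -> int:
    return csum_add(a, ~b & 0xFFFF)


def pseudo_sum(src: bytes, dst: bytes, proto: int, length: int) -> int:
    """Sum of the IPv4 pseudo header."""
    return csum(src + dst + struct.pack("!BBH", 0, proto, length))


def unnecessary_to_complete(outer_pseudo: int) -> int:
    """Device vouched for the outer UDP checksum, so the sum from the UDP
    header to the end of the packet is ~pseudo_hdr_csum."""
    return ~outer_pseudo & 0xFFFF


def inner_csum_ok(complete: int, hdrs_before_inner: bytes, inner_pseudo: int) -> bool:
    """Validate the inner transport checksum from the complete checksum.
    hdrs_before_inner is everything from the UDP header up to the inner
    transport header (assumed to have even length)."""
    rest = csum_sub(complete, csum(hdrs_before_inner))
    return csum_add(rest, inner_pseudo) % 0xFFFF == 0
```

The inner TCP segment is validated without reading its payload: the complete checksum comes for free from the device's verdict, and only a few dozen header bytes are summed on the host.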

SLIDE 14
Remote checksum offload

  • Defer TX checksum offload to remote
  • Encapsulation header with start and offset data referring to inner checksum
  • Offload outer UDP checksum and send
  • At receive

○ Do what device does: determine checksum from start to end of packet and write to offset
○ Already have complete checksum so we can easily find this
○ Write checksum into packet, validate like normal

  • No checksum calculation in host
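The receive-side step can be sketched as below. This is an illustration under assumptions, not the kernel implementation: remote_csum_fixup is a hypothetical name, and start and offset are taken relative to the UDP header for simplicity, whereas a real encapsulation header would define its own reference point.

```python
import struct


def csum(data: bytes) -> int:
    """16-bit ones' complement sum (data zero-padded to even length)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def csum_add(a: int, b: int) -> int:
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def csum_sub(a: int, b: int) -> int:
    return csum_add(a, ~b & 0xFFFF)


def remote_csum_fixup(udp_packet: bytearray, complete: int,
                      start: int, offset: int) -> None:
    """Finish the inner checksum that the sender deferred.

    complete is the checksum from the UDP header to the end of the packet.
    The sum from start to the end is complete minus the sum of the bytes
    in front of start (assumed even in length), so only headers are read.
    """
    from_start = csum_sub(complete, csum(bytes(udp_packet[:start])))
    field = start + offset
    udp_packet[field:field + 2] = struct.pack("!H", ~from_start & 0xFFFF)
```

The sender is assumed to have left the inner checksum field seeded with the inner pseudo header sum, exactly as it would for a local device, so the value written here is the finished inner checksum.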

SLIDE 15

Segmentation offload

  • Stack operates on bigger than MTU sized packets

  • Offloads in receive and transmit

SLIDE 16

Transmit segmentation offload

  • Split big TCP packet into small ones
  • GSO (stack), TSO (HW)
  • For each created packet

○ Copy headers from big one
○ Adjust lengths, checksums, sequence number that must be set per packet
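The splitting step can be sketched as follows. This is a simplified model, not the stack's GSO code: TcpPacket and segment are hypothetical names, headers are treated as an opaque byte string that is copied unchanged, and only the sequence number is adjusted per packet (length and checksum fixes are omitted).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TcpPacket:
    seq: int             # TCP sequence number of the first payload byte
    payload: bytes
    headers: bytes = b""  # opaque header bytes copied to every segment


def segment(big: TcpPacket, mss: int) -> List[TcpPacket]:
    """Split one large TCP packet into MSS-sized packets."""
    segments = []
    for off in range(0, len(big.payload), mss):
        segments.append(TcpPacket(
            seq=(big.seq + off) & 0xFFFFFFFF,   # per-packet sequence number
            payload=big.payload[off:off + mss],
            headers=big.headers,                # copied from the big packet
        ))
    return segments
```

Whether this loop runs in the stack (GSO) or in the NIC (TSO), the stack above it handles a single large packet.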

SLIDE 17

GSO for UDP encapsulation

  • UDP GSO function calls skb_udp_tunnel_segment
  • Call GSO segment for next layer: gso_inner_segment
  • Adjust UDP length and checksum per packet
  • For encapsulation header, just copy those bytes*

*Assuming encapsulation header does not have fields that must be set per packet

SLIDE 18

Receive segmentation offload

  • Build large TCP packet from small ones
  • GRO operation is to match packets to same flow for coalescing
  • GRO (stack), LRO (HW)
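The flow-matching and coalescing idea can be sketched as below. GroTable is a hypothetical, heavily simplified model: it holds at most one in-progress packet per flow and merges only when the new payload is contiguous in TCP sequence space. Real GRO compares many more header fields.

```python
class GroTable:
    """Holds at most one in-progress merged packet per flow."""

    def __init__(self):
        self.held = {}  # flow key -> (first_seq, bytearray of merged payload)

    def receive(self, flow, seq, payload):
        """Merge payload into its flow if it is contiguous.

        Returns None when the packet was merged or newly held, otherwise
        the previously held (seq, payload) that had to be flushed.
        """
        entry = self.held.get(flow)
        if entry is not None:
            first_seq, data = entry
            if (first_seq + len(data)) & 0xFFFFFFFF == seq:
                data.extend(payload)
                return None
            self.held[flow] = (seq, bytearray(payload))
            return first_seq, bytes(data)
        self.held[flow] = (seq, bytearray(payload))
        return None

    def flush(self):
        """Hand every held packet up the stack."""
        out = [(flow, seq, bytes(data)) for flow, (seq, data) in self.held.items()]
        self.held.clear()
        return out
```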

SLIDE 19

GRO for UDP encapsulation

  • UDP GRO receive path (udp_gro_receive)
  • Encapsulation specific GRO functions

○ Call GRO function per port
○ Facility to register offloads per port
○ Call GRO receive for next protocol
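The per-port dispatch can be pictured as two lookup tables. In this sketch only the name udp_gro_receive comes from the slide; register_udp_offload, make_fou_gro_receive, and both tables are hypothetical stand-ins for the kernel's registration facility.

```python
udp_offloads = {}  # UDP destination port -> encapsulation-specific GRO callback
inner_gro = {}     # IP protocol number -> GRO callback of the next protocol


def register_udp_offload(port, callback):
    """Register an encapsulation GRO callback for a UDP port."""
    udp_offloads[port] = callback


def udp_gro_receive(dport, payload):
    """UDP GRO entry point: dispatch on destination port."""
    callback = udp_offloads.get(dport)
    if callback is None:
        return None  # no encapsulation offload registered for this port
    return callback(payload)


def make_fou_gro_receive(ipproto):
    """Build a FOU-style callback that hands off to the next protocol's GRO."""
    def fou_gro_receive(payload):
        next_gro = inner_gro.get(ipproto)
        return next_gro(payload) if next_gro else None
    return fou_gro_receive
```

Registering make_fou_gro_receive(4) on port 5555 would route packets arriving on that port to whatever GRO handler is installed for IP protocol 4 (IPIP).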

SLIDE 20

FOU and GUE

[Figure: FOU and GUE encapsulating IP]

SLIDE 21

Foo over UDP

  • Packets of IP protocol over UDP
  • Destination port maps to IP protocol

○ e.g. IP (IPIP), IPv6 (sit), GRE, ESP, etc
○ Example: IPIP on port 5555

SLIDE 22

FOU support

  • Logically, a header inserted to facilitate transport
  • fou.c implements RX.

○ encap_rcv in socket
○ Remove UDP and reinject IP packet as protocol associated with port

  • IP tunnel implements FOU for IPIP, SIT, GRE

○ Insert UDP header between IP and payload
○ Source port from flow_hash
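The transmit and receive halves of FOU can be sketched at the byte level as follows. This is an illustration, not fou.c: fou_encap, fou_decap, and the FOU_PORTS table are hypothetical names, the UDP checksum is left at zero, and "reinjecting" is modeled by returning the protocol number with the inner bytes.

```python
import struct

# Receive-side configuration: UDP destination port -> IP protocol number.
# Port 5555 carrying protocol 4 (IPIP) mirrors the example on the slides.
FOU_PORTS = {5555: 4}


def fou_encap(inner: bytes, sport: int, dport: int) -> bytes:
    """Insert a UDP header in front of the tunneled packet (checksum 0)."""
    return struct.pack("!HHHH", sport, dport, 8 + len(inner), 0) + inner


def fou_decap(udp_packet: bytes):
    """Remove the UDP header and return (ip_protocol, inner_packet).

    Returns None if no FOU receive port matches the destination port.
    """
    _sport, dport, length, _csum = struct.unpack("!HHHH", udp_packet[:8])
    proto = FOU_PORTS.get(dport)
    if proto is None:
        return None
    return proto, udp_packet[8:length]
```

Because there is no encapsulation header, the destination port is the only thing that tells the receiver what the payload is.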

SLIDE 23

FOU example

  • Set up receive

ip fou add port 5555 ipproto 4

  • Set up transmit

ip link add name tun1 type ipip \
    remote 192.168.1.1 \
    local 192.168.1.2 \
    ttl 225 \
    encap fou \
    encap-sport auto \
    encap-dport 5555


SLIDE 24

IP in FOU transmit

Start with a plain TCP/IP packet sent on tun1

[Packet: IP | TCP | Data]

SLIDE 25

IP in FOU transmit

Logically prepend IP header

[Packet: IP | IP | TCP | Data]

SLIDE 26

IP in FOU transmit

This is IPIP encapsulation

[Packet: IP | IP | TCP | Data]

IP protocol is 4 for IPIP

SLIDE 27

IP in FOU transmit

Insert UDP header

[Packet: IP | UDP | IP | TCP | Data]

UDP source port set to hash value for inner IP/TCP headers
UDP destination port set to 5555 for IP/UDP

SLIDE 28

IP in FOU transmit

IP packet with encapsulation

[Packet: IP | UDP | IP | TCP | Data]

SLIDE 29

IP in FOU transmit

Add Ethernet header and send

[Packet: ETH | IP | UDP | IP | TCP | Data]

SLIDE 30

IP in FOU receive

Receiver processes UDP packet based on destination port

[Packet: IP | UDP | IP | TCP | Data]

SLIDE 31

IP in FOU receive

Remove UDP header

[Packet: IP | UDP | IP | TCP | Data]

Adjust transport header offset in sk_buff

SLIDE 32

IP in FOU receive

Now have original IPIP packet. Reinject this into kernel, next protocol to process is 4

[Packet: IP | IP | TCP | Data]

SLIDE 33

Generic UDP encapsulation (GUE)

  • Extensible and generic encapsulation proto
  • Encapsulation header for carrying packets of IP protocol
  • Type field, header length, 8 bit IP protocol
  • 16 bit flags and optional fields indicated by them. More can be defined in extension
  • Private/extension flag
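A base header with these fields can be packed as in the sketch below. The slide only states an 8-bit IP protocol and 16-bit flags; the widths used here for the first byte (2-bit version, 1-bit control/data type, 5-bit header length counted in 32-bit words of optional fields) are an assumption modeled on the Linux guehdr layout, and gue_pack and gue_unpack are hypothetical names.

```python
import struct


def gue_pack(proto: int, flags: int = 0, fields: bytes = b"",
             version: int = 0, control: int = 0) -> bytes:
    """Build a GUE-style header: 4 base bytes plus optional fields."""
    if len(fields) % 4:
        raise ValueError("optional fields must be a multiple of 4 bytes")
    hlen = len(fields) // 4
    if hlen > 31:
        raise ValueError("optional fields too long for a 5-bit length")
    first = (version << 6) | (control << 5) | hlen
    return struct.pack("!BBH", first, proto, flags) + fields


def gue_unpack(data: bytes) -> dict:
    """Parse the header and split off the encapsulated payload."""
    first, proto, flags = struct.unpack("!BBH", data[:4])
    hlen = first & 0x1F
    end = 4 + 4 * hlen
    return {
        "version": first >> 6,
        "control": (first >> 5) & 1,
        "hlen": hlen,
        "proto": proto,
        "flags": flags,
        "fields": data[4:end],
        "payload": data[end:],
    }
```

Unlike FOU, the receiver reads the next protocol from the header itself, so a single UDP port can carry any IP protocol.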

SLIDE 34

GUE headers

[Figure: UDP and GUE headers]

SLIDE 35

GRE/GUE example

  • Set up receiver

ip fou add port 7777 gue

  • Set up transmit

ip link add name tun1 type ipip \
    remote 192.168.1.1 \
    local 192.168.1.2 \
    ttl 225 \
    encap gue \
    encap-sport auto \
    encap-dport 7777 \
    encap-udp-csum \
    encap-remcsum

SLIDE 36

GRE in GUE transmit

Application sends packet on tun1

[Packet: IPv4 packet]

SLIDE 37

GRE in GUE transmit

Logically prepend IP header for GRE/IP tunneling

[Packet: IP | GRE | IPv4 packet]

SLIDE 38

GRE in GUE transmit

Insert UDP/GUE headers

[Packet: IP | UDP | GUE | GRE | IPv4 packet]

Next protocol is 47 for GRE
UDP destination port set to 7777 for GUE

SLIDE 39

GRE in GUE transmit

Insert UDP/GUE headers

[Packet: IP | UDP | GUE | GRE | IPv4 packet]

SLIDE 40

GRE in GUE transmit

Add Ethernet and IP headers and send

[Packet: ETH | IP | UDP | GUE | GRE | IPv4 packet]

SLIDE 41

GRE in GUE receive

Process packet based on UDP port (GUE port)

[Packet: IP | UDP | GUE | GRE | IPv4 packet]

SLIDE 42

GRE in GUE receive

Remove UDP/GUE headers

[Packet: IP | UDP | GUE | GRE | IPv4 packet]

Adjust transport header offset in sk_buff

SLIDE 43

GRE in GUE receive

Now have original GRE/IP packet. Reinject this into kernel, next protocol to process is 47 (GRE)

[Packet: IP | GRE | IPv4 packet]

SLIDE 44

Thanks, and looking forward

  • Good support for UDP encapsulation is the result of a broad community effort
  • Still a lot of interesting work to do in security, control, and performance
