FBTFTP: Facebook's Python3 open-source framework to build dynamic TFTP servers



slide-1
SLIDE 1

FBTFTP

Angelo Failla

Production Engineer
 Cluster infrastructure team
 Facebook Ireland

Facebook’s Python3 open-source framework to build dynamic tftp servers

slide-2
SLIDE 2
  • A Production Engineer
  • Similar to SRE / DevOps

  • Based in Facebook Ireland, Dublin
  • Since 2011

  • Cluster Infrastructure team member
  • Owns data center core services
  • Owns E2E automation for bare metal provisioning and cluster management

Who am I?

slide-3
SLIDE 3

“There is no cloud, just other people’s computers…”

  • a (very wise) person on the interwebz

“… and someone’s got to provision them.”

  • Angelo
slide-4
SLIDE 4
slide-5
SLIDE 5
slide-6
SLIDE 6
slide-7
SLIDE 7

POPs: Points of Presence
Data center locations
POP locations are fictional

slide-8
SLIDE 8

HANDS FREE PROVISIONING:

slide-9
SLIDE 9

kernel OS initrd BIOS UEFI firmware bootloader v6/v4 DHCP TFTP bootloader config kickstart anaconda RPM's location buildcontrol cyborg server type vendor model OOB partitioning schemas 3rd party chef tier HTTP repos mysql inventory sys

slide-10
SLIDE 10

kernel OS initrd BIOS UEFI firmware bootloader v6/v4 DHCP TFTP bootloader config kickstart anaconda RPM's location buildcontrol cyborg server type vendor model OOB partitioning schemas 3rd party chef tier HTTP repos mysql inventory sys

slide-11
SLIDE 11

TFTP

slide-12
SLIDE 12

  • It’s common in Data Center/ISP environments
  • Simple protocol specifications
  • Easy to implement
  • UDP based -> produces small code footprint
  • Fits in small boot ROMs
  • Embedded devices and network equipment
  • Traditionally used for netboot (with DHCPv[46])

slide-13
SLIDE 13

DHCPv[46] - KEA TFTP NBP NETBOOT ANACONDA CHEF

REBOOT PROVISIONED POWER ON

  • fetches config via tftp
  • fetches kernel/initrd (via http or tftp)

  • provides NBPs
  • provides config files for NBPs
  • provides kernel/initrd
  • provides network config
  • provides path for NBP binaries

Provisioning phases

slide-14
SLIDE 14

A 30+ year old protocol

me, circa 1982

slide-15
SLIDE 15

Protocol in a nutshell (RRQ)

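CLIENT (port X) and SERVER (port 69, then port Y)

  • RRQ: client X -> server 69
  • DAT 1: server Y -> client X
  • ACK 1: client X -> server Y
  • DAT N: server Y -> client X
  • ACK N: client X -> server Y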
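The exchange above can be sketched in a few lines of Python. This is a minimal in-memory illustration of the RFC 1350 packet layouts, not fbtftp code; the helper names (build_rrq, build_data, build_ack, parse, simulate_transfer) are made up for this example, and block numbers above 65535 are not handled.

```python
import struct

# Opcodes defined by RFC 1350.
OP_RRQ, OP_DATA, OP_ACK = 1, 3, 4


def build_rrq(filename, mode="octet"):
    """RRQ: 2-byte opcode, then NUL-terminated filename and mode."""
    return (struct.pack("!H", OP_RRQ)
            + filename.encode("ascii") + b"\0"
            + mode.encode("ascii") + b"\0")


def build_data(block, payload):
    """DATA: 2-byte opcode, 2-byte block number, then the payload."""
    return struct.pack("!HH", OP_DATA, block) + payload


def build_ack(block):
    """ACK: 2-byte opcode and the block number being acknowledged."""
    return struct.pack("!HH", OP_ACK, block)


def parse(packet):
    """Return (opcode, block or None, remaining bytes)."""
    (opcode,) = struct.unpack("!H", packet[:2])
    if opcode in (OP_DATA, OP_ACK):
        (block,) = struct.unpack("!H", packet[2:4])
        return opcode, block, packet[4:]
    return opcode, None, packet[2:]


def simulate_transfer(content, block_size=512):
    """Replay the lock-step DATA/ACK loop without any sockets.

    Returns (number of DATA packets, bytes reassembled by the client).
    """
    received = bytearray()
    block = 0
    while True:
        block += 1
        chunk = content[(block - 1) * block_size:block * block_size]
        # Server side: send DAT <block>.
        _, number, payload = parse(build_data(block, chunk))
        received += payload
        # Client side: answer with ACK <block> before the next block is sent.
        assert parse(build_ack(number))[:2] == (OP_ACK, block)
        # A payload shorter than the block size ends the transfer.
        if len(payload) < block_size:
            return block, bytes(received)
```

Every block waits for its ACK before the next one is sent, which is why latency matters so much on the next slide.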

slide-16
SLIDE 16

Latency: ~150ms

File size   Block size       Latency   Time to download
80 MB       512 B            150 ms    12.5 hours
80 MB       1400 B           150 ms    4.5 hours
80 MB       512 B / 1400 B   1 ms      <1 minute
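A rough back-of-the-envelope model shows where numbers of this size come from. The sketch below assumes that every block costs one full round trip (twice the one-way latency) and ignores serialisation time, retries and packet loss. That model is an assumption of this example, not something stated on the slide, so it only lands in the same range as the high-latency rows rather than reproducing the table.

```python
def lockstep_download_seconds(file_size_bytes, block_size_bytes, one_way_latency_s):
    """Estimate a lock-step TFTP download: one round trip per DATA/ACK pair."""
    # The transfer ends with a short (possibly empty) final block.
    blocks = file_size_bytes // block_size_bytes + 1
    return blocks * 2 * one_way_latency_s


FILE_SIZE = 80 * 10**6  # 80 MB, taken as decimal megabytes

for block_size in (512, 1400):
    seconds = lockstep_download_seconds(FILE_SIZE, block_size, 0.150)
    print("{:>5} B blocks at 150 ms: {:.1f} hours".format(
        block_size, seconds / 3600))
```

The takeaway is that the time scales with the number of blocks multiplied by the latency, so a bigger block size or a closer server helps far more than extra bandwidth.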

[Diagram: CLIENT in a POP and SERVER in a DC exchanging RRQ, DAT and ACK packets between ports X, 69 and Y]

POPs locations are fictional

slide-17
SLIDE 17

A look in the past ~2014 (and its problems)

HW LB in.tftpd
 (active) in.tftpd
 (passive) Servers Cluster
 VIP Automation

REPO

rsync 7GB Write config

  • Physical load balancers
  • Waste of resources
  • Automation needs to know which server is active
  • No stats
  • TFTP is a bad protocol in high latency environments
  • Too many moving parts
slide-18
SLIDE 18

How did we solve those problems?

slide-19
SLIDE 19
  • Supports only RRQ (fetch operation)
  • Main TFTP spec[1], Option Extension[2], Block Size Option[3], Timeout Interval and Transfer Size Options[4]
  • Extensible:
      • Define your own logic
      • Push your own statistics (per session or global)

We built FBTFTP…

…A python3 framework to build dynamic TFTP servers

[1] RFC1350, [2] RFC2347, [3] RFC2348, [4] RFC2349
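To make the option RFCs concrete: under RFC 2347 a client appends NUL-terminated name/value pairs (for example blksize, timeout, tsize) to the RRQ, and the server confirms the ones it accepts with an OACK packet (opcode 6). The sketch below is a standalone illustration of that wire format. The function names parse_rrq and build_oack are invented here and are not fbtftp's internal API.

```python
import struct

OP_RRQ, OP_OACK = 1, 6  # opcodes from RFC 1350 and RFC 2347


def parse_rrq(packet):
    """Split an RRQ into (filename, mode, options dict)."""
    (opcode,) = struct.unpack("!H", packet[:2])
    if opcode != OP_RRQ:
        raise ValueError("not an RRQ packet")
    fields = packet[2:].split(b"\0")
    # The packet ends with a NUL, so the last split element is empty.
    if fields and fields[-1] == b"":
        fields.pop()
    if len(fields) < 2 or len(fields) % 2:
        raise ValueError("malformed RRQ")
    filename = fields[0].decode("ascii")
    mode = fields[1].decode("ascii").lower()
    # Option names are case-insensitive; values are kept as strings.
    options = {
        fields[i].decode("ascii").lower(): fields[i + 1].decode("ascii")
        for i in range(2, len(fields), 2)
    }
    return filename, mode, options


def build_oack(options):
    """OACK: opcode 6 followed by the accepted name/value pairs."""
    body = b"".join(
        name.encode("ascii") + b"\0" + str(value).encode("ascii") + b"\0"
        for name, value in options.items()
    )
    return struct.pack("!H", OP_OACK) + body
```

A request for grub.cfg with blksize=1400 would parse into ("grub.cfg", "octet", {"blksize": "1400"}), and the server could answer with build_oack({"blksize": 1400}).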

slide-20
SLIDE 20

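Framework overview

[Diagram: a Client sends an RRQ to BaseServer, which calls get_handler() and fork()s a child process running a BaseHandler for the transfer session. The handler calls get_response_data() to obtain a ResponseData. The server callback and session callback feed the Monitoring Infrastructure.]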

slide-21
SLIDE 21

Example:
 
 a simple server serving files from disk

slide-22
SLIDE 22

A file-like class that represents a file served:

import os

from fbtftp.base_handler import ResponseData


class FileResponseData(ResponseData):
    def __init__(self, path):
        # Record the size up front so size() can report it.
        self._size = os.stat(path).st_size
        self._reader = open(path, 'rb')

    def read(self, n):
        # Return up to n bytes of the file.
        return self._reader.read(n)

    def size(self):
        return self._size

    def close(self):
        self._reader.close()


slide-23
SLIDE 23

A class that deals with a transfer session:

import os

from fbtftp.base_handler import BaseHandler


class StaticHandler(BaseHandler):
    def __init__(self, server_addr, peer, path,
                 options, root, stats_callback):
        super().__init__(
            server_addr, peer, path,
            options, stats_callback)
        self._root = root
        self._path = path

    def get_response_data(self):
        # FileResponseData is the class shown on the previous slide.
        return FileResponseData(
            os.path.join(self._root, self._path))


slide-24
SLIDE 24

BaseServer class ties everything together:

from fbtftp.base_server import BaseServer


class StaticServer(BaseServer):
    def __init__(
        self, address, port, retries, timeout, root,
        handler_stats_callback, server_stats_callback
    ):
        self._root = root
        self._handler_stats_callback = handler_stats_callback
        super().__init__(
            address, port, retries, timeout,
            server_stats_callback)

    def get_handler(self, server_addr, peer, path, options):
        # Called for each incoming RRQ; returns the handler for that session.
        return StaticHandler(
            server_addr, peer, path, options, self._root,
            self._handler_stats_callback)


slide-25
SLIDE 25

The “main”

def print_session_stats(stats):
    print(stats)


def print_server_stats(stats):
    counters = stats.get_and_reset_all_counters()
    print('Server stats - every {} seconds'.format(
        stats.interval))
    print(counters)


server = StaticServer(
    address='::',  # listen on all interfaces
    port=69,
    retries=3,
    timeout=5,
    root='/var/tftproot/',
    handler_stats_callback=print_session_stats,
    server_stats_callback=print_server_stats)
try:
    server.run()
except KeyboardInterrupt:
    server.close()


slide-26
SLIDE 26

How do we use it?

Improvements

  • No more physical LBs
  • No waste of resources
  • Stats!
  • TFTP servers are dynamic
  • Config files (e.g. grub/ipxe configs) are generated
  • Static files are streamed
  • You can hit any server
  • No need to rsync data
  • Container-friendly

[Diagram: tftp requests can hit any fbtftp server. Dynamic files come from the provisioning backends; static files come from the HTTP repo through a local disk cache.]
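As an illustration of the "config files are generated" point, here is a minimal sketch of a response object that serves a string built at request time instead of a file on disk. StringResponseData and render_grub_config are hypothetical names for this example, and the template content is made up. In a real server the class would subclass fbtftp's ResponseData, as FileResponseData does on the earlier slide; it is kept standalone here so the snippet runs without fbtftp installed.

```python
import io


class StringResponseData:
    """Same read/size/close interface as FileResponseData, backed by memory."""

    def __init__(self, text):
        self._bytes = text.encode("utf-8")
        self._reader = io.BytesIO(self._bytes)

    def read(self, n):
        # Return up to n bytes of the generated content.
        return self._reader.read(n)

    def size(self):
        return len(self._bytes)

    def close(self):
        self._reader.close()


def render_grub_config(hostname, kernel, initrd):
    """Hypothetical template: build a per-host bootloader config."""
    return (
        "set timeout=0\n"
        f"menuentry 'install {hostname}' {{\n"
        f"    linux {kernel}\n"
        f"    initrd {initrd}\n"
        "}\n"
    )


# A handler's get_response_data() could return something like this:
response = StringResponseData(
    render_grub_config("host1", "/vmlinuz", "/initrd.img"))
```

Because the content is computed per request, nothing has to be written to disk or synced between servers beforehand.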

slide-27
SLIDE 27

Routing TFTP traffic

NetNorad Latency Maps DHCP

LBs are gone: which TFTP server will serve a given client? NetNorad publishes latency maps periodically, and DHCP consumes them. Read about NetNorad on our blog: http://tinyurl.com/hacrw7c

Location of server to provision Closest
 TFTP
 server TFTP Health checks Service
 discovery
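The selection step could look roughly like the sketch below: given a latency map and the set of servers that pass health checks, pick the lowest-latency healthy server for the client's location. The data shapes, location names and hostnames are all assumptions for illustration; the slides do not describe the actual NetNorad format or the DHCP integration.

```python
def closest_healthy_server(client_location, latency_map, healthy_servers):
    """Return the healthy TFTP server with the lowest latency, or None.

    latency_map is assumed to look like:
        {client_location: {server_name: latency_ms, ...}, ...}
    """
    candidates = {
        server: latency_ms
        for server, latency_ms in latency_map.get(client_location, {}).items()
        if server in healthy_servers
    }
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


# Hypothetical inputs: a latency map and the health-check results.
latency_map = {
    "pop1": {"tftp-pop1": 1.0, "tftp-dc1": 150.0},
    "pop2": {"tftp-pop1": 90.0, "tftp-dc1": 140.0},
}
healthy = {"tftp-dc1"}  # tftp-pop1 is failing its health check

chosen = closest_healthy_server("pop1", latency_map, healthy)
```

With tftp-pop1 unhealthy, the client in pop1 falls back to tftp-dc1 even though it is further away.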

slide-28
SLIDE 28

POP1 DC

Fetches static files from closest origin

  • only for cache misses
  • or if files changed

local
 fbtftp local
 fbtftp

POP2

POPs locations are fictional

slide-29
SLIDE 29

Thanks for listening!

Feel free to email me at pallotron@fb.com

Project home:
 https://github.com/facebook/fbtftp/

Install and play with it:
 $ pip3 install fbtftp

Poster session Tuesday at 14.45:
 Python in Production Engineering