CESSNA: Resilient Edge Computing (Yotam Harchol, UC Berkeley)

SLIDE 1

CESSNA: Resilient Edge Computing

Yotam Harchol UC Berkeley

Joint work with: Aisha Mushtaq, Murphy McCauley, Aurojit Panda, Scott Shenker

SIGCOMM MECOMM Workshop, Budapest, Hungary, August 2018

SLIDE 2

Client-Server Computing

  • Session establishment
  • Fate-sharing
  • Server replication

SLIDE 3

Client-Edge-Server Computing

  • Session goes through the edge
  • Edge application can be stateful
  • State depends on packets from both sides and their interleave ordering
  • Edge may not be reliable

Problem: How to maintain correctness of the state at the edge under failover / mobility?

SLIDE 4

Examples for Stateful Edge Applications

  • Compression at the edge
  • Video conferencing*
  • Online gaming
  • Data aggregation (e.g., for IoT)

* Control channel is stateful, video channel may not be

SLIDE 5

Goals

Correct Recovery

  • New edge “sees” the same sequence of messages
  • Transient “stall”

Survivability

  • Arbitrary # of lost edges
  • Edge failure never kills session

Client Mobility

  • Recovery may be needed at a remote edge

High Throughput

  • Edge should provide high throughput

SLIDE 6

Strawman Solution #1: Replication

  • Edge is replicated
      → Must have multiple hot backups, actively running and consistently updated
      → Not applicable for client mobility

✓ Correct recovery
✘ Survivability
✘ Client mobility
✘ High throughput

SLIDE 7

Strawman Solution #2: Message Replay

  • Client keeps a log of its outgoing packets
  • Server keeps a log of its outgoing packets
  • Problem 1: Packet logs may become very long → can use periodic snapshots
  • Problem 2: Need to know the replay order between client and server packets → ??

✘ Correct recovery
✓ Survivability
✓ Client mobility
✓ High throughput

SLIDE 8

The Challenge of Interleave Ordering

(Figure: an edge receiving messages 1 to 4 from each of its two sides.)

  • Messages arrive at the edge at two different sockets, simultaneously
  • Multiple possible ordering sequences of messages
  • The edge is a state machine: each packet changes the state (state transition)
  • Multiple correct states we could be at after receiving more than one message

(Figure: Edge 1 and Edge 2, each handling messages 1 to 4.)

Faithful Replay: we want to replay messages in the exact same order

  • Exactly the same state traversal order
  • Exactly the same correct state
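
A minimal sketch of why the interleave order matters, assuming a toy edge whose whole state is one integer. The transition rule and the names `step` and `run` are illustrative and do not come from the talk. Client messages add to the state and server messages multiply it, so the same two messages lead to different final states depending on which one the edge handles first.

```python
def step(state, side, value):
    """Toy deterministic transition: client messages add, server messages multiply."""
    if side == "C":
        return state + value
    return state * value


def run(order, start=1):
    """Apply a sequence of (side, value) messages to the starting state."""
    state = start
    for side, value in order:
        state = step(state, side, value)
    return state


# The same two messages, handled in the two possible interleavings.
client_first = [("C", 3), ("S", 2)]
server_first = [("S", 2), ("C", 3)]

state_a = run(client_first)  # (1 + 3) * 2
state_b = run(server_first)  # (1 * 2) + 3
print(state_a, state_b)
```

Both outcomes are valid states for the original edge, which is why a replacement edge needs the recorded order and not just the two message logs.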

SLIDE 9

CESSNA: Client-Edge-Server for Stateful Network Applications

A software framework for running resilient edge applications

(Figure: the CESSNA framework spans client, edge, and server.)

  • Client: client application (unmodified), with the CESSNA client agent
  • Edge: edge application (new), with the Edge API and the edge platform
  • Server: server application (unmodified), with the CESSNA server agent

Assumptions:

  1. Edge application instance per client-server session
  2. Deterministic edge application: no real randomness, no multithreading within an instance
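
One possible reading of the determinism assumption, as a sketch only: an edge application that needs random-looking values could draw them from a seeded pseudo-random generator, so that replaying the same messages yields the same values. The class `TokenApp`, the helper `replay`, and the seed value are hypothetical. The slides do not say how CESSNA applications are expected to handle randomness.

```python
import random


class TokenApp:
    """Hypothetical edge app that tags each client message with a token."""

    def __init__(self, seed):
        # Seeded generator: the token sequence depends only on the seed.
        self.rng = random.Random(seed)
        self.tokens = []

    def recv_client_msg(self, data):
        token = self.rng.getrandbits(32)
        self.tokens.append((data, token))


def replay(seed, messages):
    """Run a fresh instance over the same messages and return its tokens."""
    app = TokenApp(seed)
    for msg in messages:
        app.recv_client_msg(msg)
    return app.tokens


first_run = replay(42, [b"a", b"b"])
second_run = replay(42, [b"a", b"b"])
print(first_run == second_run)
```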

SLIDE 10

CESSNA

  • Client keeps a log of its outgoing packets
  • Server keeps a log of its outgoing packets
  • Edge tracks ordering as it handles packets
      • Attaches ordering information to outgoing packets
  • Edge takes periodic snapshots and sends to client, or to another edge
      → Packet logs and ordering info are safely pruned
  • Recovery algorithm: enables faithful replay

One recovery option: remote (cold) recovery
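
The bookkeeping on this slide can be sketched as follows. This is an illustration under stated assumptions, not the CESSNA implementation: the classes `EdgeOrderTracker` and `SenderLog`, the per-side sequence numbers, and the choice to attach the full ordering since the last snapshot to every outgoing packet are all hypothetical, since the slides do not give the wire format.

```python
class EdgeOrderTracker:
    """Hypothetical helper: records the interleave order of inputs at the edge."""

    def __init__(self):
        self.handled = {"C": 0, "S": 0}  # inputs handled so far, per side
        self.order = []                  # sides of inputs handled since last snapshot

    def on_input(self, side):
        self.handled[side] += 1
        self.order.append(side)

    def stamp(self, payload):
        # Attach the ordering recorded so far to an outgoing packet.
        return {"payload": payload, "order": list(self.order)}

    def snapshot(self, app_state):
        # The snapshot reflects everything handled so far, so ordering can be dropped.
        snap = {"state": app_state, "handled": dict(self.handled)}
        self.order = []
        return snap


class SenderLog:
    """Hypothetical log of outgoing packets kept by the client or the server."""

    def __init__(self):
        self.entries = []  # (sequence number, payload)
        self.next_seq = 1

    def send(self, payload):
        self.entries.append((self.next_seq, payload))
        self.next_seq += 1

    def prune(self, covered):
        # Drop packets that the snapshot already reflects.
        self.entries = [(seq, p) for seq, p in self.entries if seq > covered]


client_log, server_log, edge = SenderLog(), SenderLog(), EdgeOrderTracker()
client_log.send(b"c1")
edge.on_input("C")
server_log.send(b"s1")
edge.on_input("S")
client_log.send(b"c2")
edge.on_input("C")

packet = edge.stamp(b"reply")               # carries the order C, S, C
snap = edge.snapshot(app_state={"count": 3})
client_log.prune(snap["handled"]["C"])      # snapshot covers c1 and c2
server_log.prune(snap["handled"]["S"])      # snapshot covers s1
client_log.send(b"c3")                      # sent after the snapshot, stays logged
```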

SLIDE 11

Local Recovery

  • Local recovery storage
  • Designated alternate edge
  • Two operational modes:
      • Cold standby: upon failure, instantiate alternate edge
      • Hot standby: alternate edge always running with latest snapshot

SLIDE 12

Recovery Algorithm

Input:

  • Client messages
  • Server messages
  • C. ordering
  • S. ordering

Notation:

  • LMBS: last message before snapshot
  • LCMBS: last common message before snapshot
  • LMRC: last message received by client
  • LMRS: last message received by server

(Figure: client, edge app, and server with numbered message sequences. In the example shown, LMBS = 1, LCMBS = 1, LMRC = 5, LMRS = 3.)
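
A simplified sketch of faithful replay, under assumptions that go beyond what the slide shows. It assumes the logs hold only messages sent after the last snapshot, that the client and the server each hold an ordering record that is a prefix of the order in which the edge really handled messages, and that the edge is the toy state machine `step` defined here. The function names are hypothetical, the LMBS / LCMBS / LMRC / LMRS bookkeeping is not modelled, and messages not covered by any ordering record are left out.

```python
def step(state, side, value):
    # Toy deterministic edge: client messages add, server messages multiply.
    return state + value if side == "C" else state * value


def is_prefix(short, long):
    return long[:len(short)] == short


def faithful_replay(snapshot_state, client_log, server_log,
                    client_order, server_order):
    """Rebuild edge state by replaying logged messages in the recorded order."""
    # Use the longer ordering record; the shorter one must agree with it.
    shorter, longer = sorted([client_order, server_order], key=len)
    if not is_prefix(shorter, longer):
        raise ValueError("ordering records disagree")
    pending = {"C": iter(client_log), "S": iter(server_log)}
    state = snapshot_state
    for side in longer:
        state = step(state, side, next(pending[side]))
    return state


recovered = faithful_replay(
    snapshot_state=1,
    client_log=[3, 4],
    server_log=[2],
    client_order=["C", "S"],
    server_order=["C", "S", "C"],
)
print(recovered)
```

Here the replay applies the client's 3, the server's 2, then the client's 4 on top of the snapshot state 1, retracing the same state traversal as the failed edge.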

SLIDE 13

Local Cache

(Figure: an edge hosting four Netflix instances and a cache, in front of the Netflix server.)

SLIDE 14

CESSNA Design

(somewhat different from the paper)

Components shown in the figure:

  • Client: native application, socket interposition layer, client agent
  • Edge machine: TCP proxy, edge agent, runtime engine daemon, application cache, and a container holding the edge application and the Edge API
  • Server: native application, TCP proxy, server agent
  • Local recovery server
  • Links: data plane link, control plane link
  • Step shown: On connect()

SLIDE 15

Edge App API

Must implement:

  • recv_client_msg(data)
  • recv_server_msg(data)

Optional:

  • init()
  • accept_client_connection()
  • shutdown()

Provided:

  • send_msg_to_client(data)
  • send_msg_to_server(data)
  • cache_read(obj_name)
  • set_timeout(func, time)

Example: Edge Compression Service

    import zlib

    import cessna_app


    class CompressionApp(cessna_app.Application):
        def __init__(self):
            cessna_app.Application.__init__(self)
            self.compressor = zlib.compressobj()
            self.decompressor = zlib.decompressobj()

        def recv_server_msg(self, data):
            # Server-to-client direction: decompress, then forward to the client.
            decomp = self.decompressor.decompress(data)
            decomp += self.decompressor.flush()
            self.send_msg_to_client(decomp)

        def recv_client_msg(self, data):
            # Client-to-server direction: compress, flushing so each message is sent whole.
            comp = self.compressor.compress(data)
            comp += self.compressor.flush(zlib.Z_FULL_FLUSH)
            self.send_msg_to_server(comp)
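
A second application written against the same API, as a sketch. The `Application` class below is a local stand-in for `cessna_app.Application` so the example runs on its own: it only records what would be sent and is not the real base class. `AggregationApp`, its batch size, and the decimal-text message format are assumptions made for illustration, in the spirit of the IoT data aggregation use case.

```python
class Application:
    """Stand-in for cessna_app.Application (hypothetical stub for local runs)."""

    def __init__(self):
        self.to_client = []
        self.to_server = []

    def send_msg_to_client(self, data):
        self.to_client.append(data)

    def send_msg_to_server(self, data):
        self.to_server.append(data)


class AggregationApp(Application):
    """Forwards one sum to the server for every BATCH client readings."""

    BATCH = 3

    def __init__(self):
        Application.__init__(self)
        self.readings = []  # part of the per-session edge state

    def recv_client_msg(self, data):
        self.readings.append(int(data))
        if len(self.readings) == self.BATCH:
            total = sum(self.readings)
            self.readings = []
            self.send_msg_to_server(str(total).encode())

    def recv_server_msg(self, data):
        # Server replies pass through to the client unchanged.
        self.send_msg_to_client(data)


app = AggregationApp()
for reading in [b"10", b"20", b"30", b"5"]:
    app.recv_client_msg(reading)
app.recv_server_msg(b"ack")
print(app.to_server, app.to_client, app.readings)
```

The buffered readings are exactly the kind of edge state that would be lost on failover without a snapshot plus replay.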

SLIDE 16

Initial Implementation

(Figure: the CESSNA design diagram with client, edge machine, server, and local recovery server, connected by data plane and control plane links.)

Implemented edge applications:

  • Blind forwarder
  • Edge compression
  • Multiplayer Battleship
  • IoT aggregation

SLIDE 17

Initial Evaluation

(Not part of the workshop paper)

  • C, E, S co-located
  • C, E in West US; S varies
  • Overhead < 600 μs

SLIDE 18

Snapshot Latency Overhead

(Figure: snapshot overhead in ms as a function of application memory usage in MB.)

SLIDE 19

Recovery Latency Overhead

For cold recovery:

  • Docker restore: 87% (488 ms)
  • Snapshot loading: 10% (57 ms)
  • Recovery algorithm: 3% (20 ms)

(Figure: latency overhead in ms for local hot, local cold, and remote recovery, with remote edges in N. Virginia, N. California, and Frankfurt. Original edge in N. Virginia.)

SLIDE 20

Future Work

  • Improve snapshot & recovery times
  • Use different edge runtimes
  • Use language-level snapshotting / serialization
  • CESSNA over HTTP – work in progress
  • Multiple clients per session – hard problem!
SLIDE 21

Conclusions

  • Consistency of stateful edge applications is challenging
      • State is dependent on two parties
      • Edge platforms are considered less reliable
  • CESSNA provides strong correctness guarantees
      • Also enables client mobility with edge
  • Two recovery modes for efficient recovery
      • Local recovery: hot / cold standby
      • Remote recovery
  • Per packet latency overhead < 700 μs
SLIDE 22

Questions?

Thank you