
SLIDE 1

Rollback-Recovery for Middleboxes

Justine Sherry, Peter Xiang Gao, Soumya Basu, Aurojit Panda, Arvind Krishnamurthy, Christian Maciocco, Maziar Manesh, João Martins, Sylvia Ratnasamy, Luigi Rizzo, Scott Shenker

SLIDE 2

Middlebox Recovery: fail over to a back-up device after a middlebox goes offline, without interrupting connectivity or causing errors.

SLIDE 3

Key Challenge: Correctness vs Performance

SLIDE 4

Systems Today: Correctness xor Performance

❖ Cold restart:
  ❖ Fast, no overhead
  ❖ Leads to buggy behaviour for stateful MBs, like missed attack detection

❖ Using snapshot/checkpoint:
  ❖ Correctness guaranteed, no modification to the MB
  ❖ But adds latencies of 8-50 ms; increases page loads by 200 ms-1 s

❖ Active-active implementation:
  ❖ Cannot guarantee correctness either, because of non-determinism

SLIDE 5

1980s FT Research: “Output Commit”

Before releasing a packet: has all information reflecting that packet been committed to stable storage?

Necessary condition for correctness. Typically implemented with a check every time data is released.

Middleboxes produce output every microsecond and operate in parallel, so output commit is triggered frequently.
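To make the condition concrete, here is a minimal sketch of an output-commit gate (hypothetical types and names, not FTMB's code): a packet may leave only once everything that influenced it is durably logged.

```cpp
#include <cstdint>

// Hypothetical types for illustration only.
struct Packet {
    uint64_t last_dep_seq;  // newest log record this packet depends on
};

struct StableLog {
    uint64_t durable_seq = 0;  // highest sequence number persisted so far
    bool durable_up_to(uint64_t seq) const { return seq <= durable_seq; }
};

// Output commit: release the packet only if all information reflecting it
// has reached stable storage; otherwise it must wait for the log to catch up.
bool can_release(const Packet& p, const StableLog& log) {
    return log.durable_up_to(p.last_dep_seq);
}
```

Doing this check per packet, every microsecond, across parallel cores is what makes the naive implementation expensive.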

SLIDE 6

FTMB (“Fault-Tolerant Middlebox”): Correct Recovery and Performance

Obeys output commit using ordered logging and parallel release: 30 µs latency overhead, 5-30% throughput reduction.

SLIDE 7

FTMB implements Rollback Recovery.

[Diagram: Input Logger → Master Middlebox → Output Logger, with a Backup middlebox]

Three-Part Algorithm: Snapshot, Log, Check

SLIDE 8

Rollback Recovery

[Diagram: Input Logger → Master Middlebox → Output Logger, with a Backup middlebox]

Three-Part Algorithm: Snapshot, Log, Check

Every k milliseconds, snapshot complete system state.

SLIDE 9

Rollback Recovery

[Diagram: Input Logger → Master Middlebox → Output Logger, with a Backup middlebox]

Three-Part Algorithm: Snapshot, Log, Check

Can now restore system to stale state at recovery time.

SLIDE 10

Rollback Recovery

[Diagram: Input Logger → Master Middlebox → Output Logger, with a Backup middlebox]

Three-Part Algorithm: Snapshot, Log, Check

Will restore the last 100ms of system state using replay, which requires logging.

SLIDE 11

Rollback Recovery

[Diagram: Input Logger → Master Middlebox → Output Logger, with a Backup middlebox]

Three-Part Algorithm: Snapshot, Log, Check

Check to make sure we have all logged data required for replay at the Output Logger.

SLIDE 12

Rollback Recovery

[Diagram: Input Logger → Master Middlebox → Output Logger, with a Backup middlebox]

On recovery, restore and replay.

SLIDE 13

Rollback Recovery

Snap!

Snapshotting algorithms are well-known. We used VM checkpointing.

Three-Part Algorithm: Snapshot, Log, Check

SLIDE 14

Rollback Recovery

Three-Part Algorithm: Snapshot, Log, Check

Open Questions:

(1) What do we need to log for correct replay?
  • A classically hard problem due to nondeterminism.

(2) How do we check that we have everything we need to replay a given packet?
  • Need to monitor system state that is updated frequently and on multiple cores.

SLIDE 15

Quick Intro: Middlebox Architecture

SLIDE 16

Middlebox Architecture

[Diagram: Input NIC → Cores 1-4 → Output NIC]

SLIDE 17

Middlebox Architecture

[Diagram: Input NIC → Cores 1-4 → Output NIC]

Input NIC “hashes” incoming packets to cores. All packets from the same flow are processed by the same core.
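A minimal sketch of that flow-to-core hashing (the tuple layout and hash are illustrative, not the NIC's actual RSS function):

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>

constexpr int kNumCores = 4;

// The classic 5-tuple that identifies a flow.
struct FiveTuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
};

// Hash the 5-tuple so that every packet of a flow lands on the same core.
int core_for(const FiveTuple& t) {
    std::size_t h = std::hash<uint64_t>{}((uint64_t(t.src_ip) << 32) | t.dst_ip);
    h ^= std::hash<uint64_t>{}((uint64_t(t.src_port) << 32) |
                               (uint64_t(t.dst_port) << 16) | t.proto);
    return int(h % kNumCores);
}
```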

SLIDE 18

Middlebox Architecture: State

[Diagram: Input NIC → Cores 1-4 → Output NIC]

Local state: only relevant to one connection (e.g., the number of bytes in flow f). Accessing local state is fast because only one core “owns” the data.

SLIDE 19

Middlebox Architecture: State

[Diagram: Input NIC → Cores 1-4 → Output NIC; shared variables A (total number of HTTP flows in the last hour) and B (list of active connections permitted to pass)]

SLIDE 20

Middlebox Architecture: State

[Diagram: Input NIC → Cores 1-4 → Output NIC; shared variables A and B]

Reading shared state is slower. Writing is most expensive because it can cause contention!
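As a rough illustration of why (a hypothetical sketch, not from the talk; actual costs are machine-dependent): a single shared counter bounces its cache line between cores on every write, while per-core counters pay the cross-core cost only when aggregated.

```cpp
#include <array>
#include <atomic>
#include <cstdint>

constexpr int kNumCores = 4;

// Contended: every writer invalidates the same cache line.
std::atomic<uint64_t> shared_counter{0};
void count_shared() { shared_counter.fetch_add(1); }

// Cheap local writes: each core updates only its own padded slot...
struct alignas(64) Slot { std::atomic<uint64_t> v{0}; };  // avoid false sharing
std::array<Slot, kNumCores> per_core{};
void count_local(int core) {
    per_core[core].v.fetch_add(1, std::memory_order_relaxed);
}

// ...and readers pay the cross-core cost only when they aggregate.
uint64_t total() {
    uint64_t sum = 0;
    for (auto& s : per_core) sum += s.v.load(std::memory_order_relaxed);
    return sum;
}
```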

SLIDE 21

Rollback Recovery

Three-Part Algorithm: Snapshot, Log, Check

Open Questions:

(1) What do we need to log for correct replay?
  • A classically hard problem due to nondeterminism.

(2) How do we check that we have everything we need to replay a given packet?
  • Need to monitor system state that is updated frequently and on multiple cores.

SLIDE 22

Parallelism + Shared State

[Diagram: Input NIC → Cores 1-4 → Output NIC; shared variables A = 4 (total number of HTTP flows in the last hour) and B (list of active connections permitted to pass); threshold 5]

MB Rule: allow new connections, unless A >= 5.

SLIDE 23

Parallelism + Shared State

MB Rule: allow new connections, unless A >= 5.

[Diagram: Input NIC → Cores 1-4 → Output NIC; shared variable A = 4, threshold 5]

SLIDE 24

Parallelism + Shared State

MB Rule: allow new connections, unless A >= 5.

[Diagram: Input NIC → Cores 1-4 → Output NIC; shared variable A = 4, threshold 5]
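The slides step through two packets racing to update A. As a concrete illustration (a hypothetical sketch, not the talk's code): with A = 4 and the threshold at 5, one packet is allowed and the other rejected, and which is which depends entirely on thread scheduling. A replay that does not reproduce this order gets a different answer.

```cpp
#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> A{4};      // shared: total HTTP flows seen so far
constexpr int kLimit = 5;   // MB rule: reject new connections once A >= 5

// Each core runs this when it sees a new connection. fetch_add returns the
// value of A *before* the increment, so the winner of the race is allowed.
void handle_new_connection(const char* who) {
    int seen = A.fetch_add(1);
    std::printf("%s: %s\n", who, seen < kLimit ? "allowed" : "rejected");
}

int main() {
    std::thread red(handle_new_connection, "red packet");
    std::thread black(handle_new_connection, "black packet");
    red.join();
    black.join();  // verdicts vary run to run: a nondeterministic choice
    return 0;
}
```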

SLIDE 25

FTMB logs all accesses to shared state using Packet Access Logs (PAL).

SLIDE 26

Parallelism + Shared State

[Diagram: Input NIC → Cores 1-4 → Output NIC; shared variable A]

Packet Access Logs: “RED accessed A FIRST”, “BLACK accessed A SECOND”.
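A minimal sketch of what such a log entry might look like (structure and names are illustrative; the paper's PALs carry more detail): every access to a shared variable records which packet touched it and in what order.

```cpp
#include <cstdint>
#include <mutex>
#include <vector>

// One PAL entry: "packet P performed the N-th access to shared variable V".
struct PalEntry {
    uint32_t var_id;     // which shared variable (e.g., A)
    uint64_t packet_id;  // which packet performed the access
    uint64_t seq;        // position of this access in the variable's order
};

// A shared variable together with its per-variable access counter.
struct SharedVar {
    std::mutex lock;
    int value = 0;
    uint64_t next_seq = 0;
};

// Wrap every write to shared state so replay can reproduce the order.
// pal_log is assumed to be per-core, so appending needs no extra locking.
int logged_increment(SharedVar& v, uint32_t var_id, uint64_t packet_id,
                     std::vector<PalEntry>& pal_log) {
    std::lock_guard<std::mutex> guard(v.lock);
    pal_log.push_back({var_id, packet_id, v.next_seq++});
    return ++v.value;
}
```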

SLIDE 27

Rollback Recovery

Three-Part Algorithm: Snapshot, Log, Check

Open Questions:

(1) What do we need to log for correct replay?
  • Packet Access Logs record accesses to shared state.

(2) How do we check that we have everything we need to replay a given packet?
  • Need to monitor system state that is updated frequently and on multiple cores.

SLIDE 28

Checking for Safe Release

[Diagram: Input NIC → Cores 1-4 → Output NIC; shared variables X and Y]

SLIDE 29

Checking for Safe Release

[Diagram: Input NIC → Cores 1-4 → Output NIC → Output Logger; shared variables X and Y]

SLIDE 30

Checking for Safe Release

[Diagram: Input NIC → Cores 1-4 → Output NIC → Output Logger; shared variables X and Y]

Do I have all PALs so that I can replay the system up to and including this packet?

SLIDE 31

Checking for Safe Release

[Diagram: Input NIC → Cores 1-4 → Output NIC → Output Logger; shared variables X and Y]

If the black packet were released now, we would only need PAL {X, Black, First}.

SLIDE 32

Checking for Safe Release

[Diagram: Input NIC → Cores 1-4 → Output NIC → Output Logger; shared variables X and Y]

If the blue packet were released now, we would need its own PALs, and {X, Black, First}.

SLIDE 33

Checking for Safe Release

[Diagram: Input NIC → Cores 1-4 → Output NIC → Output Logger; shared variables X and Y]

The red packet needs its own PAL, and {Blue, Y, First} (it accessed Y after X!), and {Blue, X, 2nd}, and {Black, X, First}.

SLIDE 34

Checking for Safe Release

[Diagram: Input NIC → Cores 1-4 → Output NIC → Output Logger; shared variables X and Y]

A packet can depend on PALs from different cores, for variables the packet never accessed!

SLIDE 36

Checking for Safe Release

[Diagram: Input NIC → Cores 1-4 → Output NIC → Output Logger; shared variables X and Y]

FTMB's check is O(#cores) and read-only, making it fast.

SLIDE 37

Ordered Logging and Parallel Release

[Diagram: Input NIC → Cores 1-4 → Output NIC → Output Logger; shared variables X and Y]

Key Insight: a packet cannot depend on a PAL that does not exist yet.

SLIDE 38

Ordered Logging and Parallel Release

[Diagram: Input NIC → Cores 1-4 → Output NIC → Output Logger; shared variables X and Y]

PALs are written to output queues immediately when created. Key Insight: a packet cannot depend on a PAL that does not exist yet.
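A sketch of that ordering invariant (queue layout and names are assumptions, not FTMB's data structures): PALs and packets travel on the same per-core FIFO toward the output, so a PAL created during processing is always ahead of any packet of its own core that depends on it.

```cpp
#include <cstdint>
#include <queue>
#include <variant>

struct PalEntry { uint32_t var_id; uint64_t seq; };
struct Packet   { uint64_t id; };

// Each core owns one FIFO toward the output logger carrying both record
// kinds; FIFO order is what makes the key insight hold on a single core.
using Record   = std::variant<PalEntry, Packet>;
using OutQueue = std::queue<Record>;

// Called at the moment a shared-state access happens:
void log_access(OutQueue& q, PalEntry pal) {
    q.push(pal);  // the PAL leaves the core immediately on creation...
}

// Called when processing finishes:
void emit_packet(OutQueue& q, Packet p) {
    q.push(p);    // ...so every PAL this core created for the packet is
                  // already ahead of it in the queue. PALs on *other* cores
                  // are handled by the counters on the next slides.
}
```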

SLIDE 39

Ordered Logging and Parallel Release

[Diagram: Input NIC → Cores 1-4 → Output NIC → Output Logger; shared variables X and Y]

When a packet arrives at the output queue, all PALs it depends on are already enqueued, or are already at the output logger.

SLIDE 40

Ordered Logging and Parallel Release

[Diagram: Input NIC → Cores 1-4 → Output NIC → Output Logger; shared variables X and Y]

What we want: “flush” all PALs to Output Logger. Then we’re done!

Problem: synchronizing behavior across all cores is expensive!

SLIDE 41

Ordered Logging and Parallel Release

[Diagram: Input NIC → Cores 1-4 → Output NIC → Output Logger; shared variables X and Y]

Each core keeps a counter tracking the “youngest” PAL it has created. On release, the packet reads the counters across all cores (O(#cores) reads).

SLIDE 42

Ordered Logging and Parallel Release

[Diagram: Input NIC → Cores 1-4 → Output NIC → Output Logger; shared variables X and Y]

The output logger keeps a counter representing the maximum PALs received from each core. When it receives a packet, it reads the packet's marker and compares it against those counters.

SLIDE 43

Ordered Logging and Parallel Release

[Diagram: Input NIC → Cores 1-4 → Output NIC → Output Logger; shared variables X and Y]

If marker <= all counters, the packet can be released!
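Putting slides 41-43 together, here is a simplified sketch of the release check (the bookkeeping is paraphrased from the slides; names and structure are assumptions): each core publishes the sequence number of the youngest PAL it has created, a departing packet snapshots those counters into its marker, and the output logger releases the packet once it has received at least that many PALs from every core.

```cpp
#include <array>
#include <atomic>
#include <cstdint>

constexpr int kNumCores = 4;

// Written by each core whenever it creates a PAL (slide 41).
std::array<std::atomic<uint64_t>, kNumCores> pal_created{};

// Maintained by the output logger as PALs arrive from each core (slide 42).
std::array<uint64_t, kNumCores> pal_received{};

// On release from a core: snapshot every core's counter into the packet's
// marker. O(#cores) reads, read-only, so no thread ever blocks another.
std::array<uint64_t, kNumCores> make_marker() {
    std::array<uint64_t, kNumCores> marker;
    for (int i = 0; i < kNumCores; ++i)
        marker[i] = pal_created[i].load(std::memory_order_acquire);
    return marker;
}

// At the output logger: safe to release only if every PAL the packet could
// depend on has already arrived (slide 43: marker <= all counters).
bool can_release(const std::array<uint64_t, kNumCores>& marker) {
    for (int i = 0; i < kNumCores; ++i)
        if (marker[i] > pal_received[i]) return false;  // a PAL is missing
    return true;
}
```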

SLIDE 45

Ordered Logging and Parallel Release

❖ Parallel! Threads are never blocked on each other to make progress.
❖ Cross-core accesses are read-only.
❖ Further amortized by batching.
❖ Linear: order #threads reads to perform.
❖ Fine-grained: can make this decision with every packet release.

SLIDE 46

Rollback Recovery

Three-Part Algorithm: Snapshot, Log, Check

Open Questions:

(1) What do we need to log for correct replay?
  • Packet Access Logs record accesses to shared state.

(2) How do we check that we have everything we need to replay a given packet?
  • Ordered logging and parallel release are read-only, with O(#cores) cross-core reads per release.

SLIDE 47

Recap

[Diagram: Input Logger → Master Middlebox → Output Logger, with a Backup middlebox]

Three-Part Algorithm: Snapshot (VM Snapshots), Log (Packet Access Logs), Check (Ordered Logging and Parallel Release)

SLIDE 48

Replay

❖ The replica starts from the last available snapshot.
❖ The recorded packets are fed in by the Input Logger.
❖ The replica's threads use the PALs to drive nondeterministic choices.
❖ When acquiring the lock that protects a shared variable, the PALs come into play: a thread checks whether it may take the lock, or must block waiting for another thread that came earlier in the original execution (see the sketch after this list).
❖ On output, packets are passed to the Output Logger, which discards any packet whose previous instance has already been released.
❖ A thread exits replay mode when it finds there are no more PALs for its shared variables.
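A minimal sketch of that replay-time gating (hypothetical; real FTMB replay involves more machinery than this): a thread may take a shared variable's lock only when the variable's replayed access count matches the sequence number its PAL recorded in the original run.

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Per shared variable: how many accesses have been replayed so far.
struct ReplayVar {
    std::mutex m;
    std::condition_variable cv;
    uint64_t next_seq = 0;
};

// Block until it is this thread's recorded turn (my_seq, taken from the PAL)
// to access the shared variable, reproducing the original access order.
void access_in_replay_order(ReplayVar& v, uint64_t my_seq) {
    std::unique_lock<std::mutex> lk(v.m);
    v.cv.wait(lk, [&] { return v.next_seq == my_seq; });
    // ... redo the same access as in the original execution ...
    ++v.next_seq;      // hand the variable to the next recorded access
    lk.unlock();
    v.cv.notify_all(); // wake threads waiting for their turn
}
```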

SLIDE 49

Replay

The threads of the replica use the PALs to drive nondeterministic choices.

When acquiring the lock that protects a shared variable, the PALs come into play.

The thread checks whether it can take the lock, or whether it must block waiting for other threads that came earlier in the original execution.

SLIDE 50

Performance Highlights

SLIDE 51

Latency: FTMB adds 30 µs overhead. Pico [SOCC 2013]: 8,000 µs overhead; Remus [NSDI 2008]: 50,000 µs overhead; Colo [SOCC 2013]: 50,000 µs overhead.

Throughput: none of the above systems exceeds 200 kpps. FTMB: 1.4-4 Mpps, a 5-30% reduction over baseline throughput.

Recovery Time: 100s of ms. FTMB increases recovery time by 50-300 ms, still fast enough not to trigger TCP timeouts or errors!

SLIDE 57

Thank you!

FTMB: Correct Recovery and Performance

Obeys output commit using ordered logging and parallel release: 30 µs latency overhead, 5-30% throughput reduction.