

slide-1
SLIDE 1

Tolerating Application Failures with LegoSDN

Balakrishnan Chandrasekaran Theophilus Benson

Duke University

slide-2
SLIDE 2

Quality of Code

“In C, I never learned to use the debugger, so I used to never make mistakes …” “I went millions and millions of hours with no problems—probably tens of millions of hours with no problems.” — Arthur Whitney, creator of A, K and Q.

ACM Queue, Feb 2009.

October 28, 2014 HotNets 2014 | LegoSDN 2

slide-3
SLIDE 3

Bugs are endemic in software!

§ Bugs can be deterministic or non-deterministic

§ [STS] POX Premature PacketIn

– l2_multi routing module failed unexpectedly with a KeyError.
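To illustrate how a failure of this kind can arise, here is a minimal sketch. It is not POX code: the `ToyRouting` class, its methods, and the message fields are hypothetical. It assumes a routing module that keeps a table of registered switches and receives a PacketIn from a switch it has not registered yet.

```python
# Hypothetical sketch, not POX code. Shows how a PacketIn that arrives
# before its switch is registered can surface as a KeyError.

class ToyRouting:
    def __init__(self):
        self.switches = {}  # dpid -> set of ports, filled on "switch up"

    def handle_switch_up(self, dpid, ports):
        self.switches[dpid] = set(ports)

    def handle_packet_in(self, dpid, in_port):
        ports = self.switches[dpid]  # KeyError if dpid is not registered yet
        return sorted(ports - {in_port})  # flood out of every other port


routing = ToyRouting()
try:
    routing.handle_packet_in(dpid=1, in_port=2)  # premature PacketIn
except KeyError as err:
    print("crash:", repr(err))

routing.handle_switch_up(dpid=1, ports=[1, 2, 3])
print(routing.handle_packet_in(dpid=1, in_port=2))
```

Whether the crash occurs depends only on the order in which the two events arrive, which is why the same handler can appear to fail unpredictably.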


slide-4
SLIDE 4

Cascading Crashes

[Figure: Controller with App1 and App2]

slide-5
SLIDE 5

Cascading Crashes

[Figure: Controller with App1 and App2]

slide-6
SLIDE 6

Cascading Crashes

[Figure: Controller with App1 and App2]

slide-7
SLIDE 7

LegoSDN

§ Availability is of utmost importance

– Second only to security


slide-8
SLIDE 8

Fate-sharing

§ Fate-sharing relationships between

– the SDN controller and the SDN application(s) (also between SDN applications)

– the SDN application and the network

§ Failure in any one SDN application brings down the other applications, and the SDN controller.


slide-9
SLIDE 9

Three-pronged approach

1. Contain crash

[Figure: Controller with App1 and App2]

slide-10
SLIDE 10

Three-pronged approach

2. Undo changes

[Figure: Controller with App1 and App2]

slide-11
SLIDE 11

Three-pronged approach

3. Handle message

[Figure: Controller with App1 and App2]

slide-12
SLIDE 12

Controller architecture must support two new abstractions


slide-13
SLIDE 13

Current architecture

[Figure: Controller with App1 and App2]

slide-14
SLIDE 14

Isolate SDN-Apps from the controller

[Figure: App1 in a Sandbox, App2 in a Sandbox, and the Controller]

slide-15
SLIDE 15

Isolate SDN-Apps from the controller

[Figure: App1 in a Sandbox, App2 in a Sandbox, and the Controller]

slide-16
SLIDE 16

Isolate SDN-Apps from the controller

[Figure: App1 in a Sandbox, App2 in a Sandbox, and the Controller]
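The effect of isolating SDN-Apps from the controller can be sketched as follows. This is a simplified illustration, not LegoSDN's implementation: a per-app exception boundary stands in for the sandbox (a real sandbox would more likely be a separate process), and the function names and message fields are hypothetical.

```python
# Minimal sketch: an exception boundary per app stands in for a sandbox.
# A real sandbox would likely be a separate process; that is not shown here.

def dispatch_shared_fate(apps, message):
    """No isolation: one unhandled exception aborts the whole dispatch."""
    return [app(message) for app in apps]


def dispatch_isolated(apps, message):
    """Each app is called behind its own boundary; a crash stays local."""
    results = {}
    for app in apps:
        try:
            results[app.__name__] = ("ok", app(message))
        except Exception as err:  # the crash is contained here
            results[app.__name__] = ("crashed", repr(err))
    return results


def app1(message):
    raise KeyError(message["dpid"])  # buggy app


def app2(message):
    return "forward"


msg = {"type": "PacketIn", "dpid": 7}
print(dispatch_isolated([app1, app2], msg))
```

With `dispatch_shared_fate`, the `KeyError` from `app1` propagates to the caller and `app2` never sees the message; with `dispatch_isolated`, `app2` still processes it.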

slide-17
SLIDE 17

Isolate SDN-Apps from the network

[Figure: App1 in a Sandbox, and the Controller]

slide-18
SLIDE 18

Isolate SDN-Apps from the network

[Figure: App1 in a Sandbox, and the Controller]

slide-19
SLIDE 19

LegoSDN

§ AppVisor Stub

– Lightweight wrapper

§ AppVisor Proxy

– Message dispatcher

SDN-App is treated as a black box.

Stub and proxy allow SDN-Apps to talk to the controller.

§ NetLog

– Transactional support

[Figure: Sandbox with App1, Controller, AppVisor Stub, AppVisor Proxy, NetLog]
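The stub and proxy roles can be pictured with a small sketch. Everything here is an assumption made for illustration: the `Stub` and `Proxy` classes, their methods, the JSON encoding, and the message fields are hypothetical and are not taken from LegoSDN. The sketch only shows the shape of the idea: the proxy dispatches encoded events, the stub wraps a black-box app and encodes its reply.

```python
import json

# Hypothetical sketch of the stub/proxy split; class and method names are
# illustrative and not taken from LegoSDN.

class Stub:
    """Lightweight wrapper around a black-box app (any callable)."""

    def __init__(self, app):
        self.app = app

    def deliver(self, wire_message):
        event = json.loads(wire_message)  # decode what the proxy sent
        try:
            reply = {"status": "ok", "actions": self.app(event)}
        except Exception as err:
            reply = {"status": "crashed", "error": repr(err)}
        return json.dumps(reply)  # encode the reply for the proxy


class Proxy:
    """Controller-side dispatcher: forwards events to the registered stubs."""

    def __init__(self):
        self.stubs = {}

    def register(self, name, stub):
        self.stubs[name] = stub

    def dispatch(self, event):
        wire = json.dumps(event)
        return {name: json.loads(stub.deliver(wire))
                for name, stub in self.stubs.items()}


def learning_switch(event):
    return [{"output": "flood"}] if event["type"] == "PacketIn" else []


proxy = Proxy()
proxy.register("learning_switch", Stub(learning_switch))
print(proxy.dispatch({"type": "PacketIn", "dpid": 1}))
```

Because the stub only needs a callable, the app itself does not have to change, which matches treating the SDN-App as a black box.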

slide-20
SLIDE 20

LegoSDN

Built on top of FloodLight

Ported three applications bundled with FloodLight to LegoSDN

[Figure: Sandbox with App1, Controller, AppVisor Stub, AppVisor Proxy, NetLog]

slide-21
SLIDE 21

Three-pronged approach

3. Handle message

[Figure: Controller with App1 and App2]

slide-22
SLIDE 22

How do you handle the crash-inducing message?


slide-23
SLIDE 23

1. Crash and burn

§ Halt the application

– SDN-App cannot continue processing

– Other SDN-Apps can continue unaffected

§ No Compromise

– Think of security related SDN-Apps

Correctness: SDN-App’s ability to implement its functionality without change, according to the specification.


slide-24
SLIDE 24

2. Induce amnesia

§ Ignore or drop the crash-inducing message

– SDN-App will not see the message again

§ Complete Compromise


slide-25
SLIDE 25

3. Apply transformations

§ Transform the offending message into another one that the application can handle

– application will continue with a modified input

§ Equivalence Compromise

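As a sketch of what a transformation might look like, the example below rewrites a switch-down event into one link-down event per link attached to that switch. The choice of this particular transformation, the event fields, and the `links` table are assumptions made for illustration; whether two such inputs are really equivalent for a given SDN-App is exactly the open question.

```python
# Hypothetical message format and topology; illustrative only.

def transform_switch_down(event, links):
    """Rewrite a switch-down event as link-down events for its links."""
    if event["type"] != "SwitchDown":
        return [event]  # nothing to transform
    dpid = event["dpid"]
    return [{"type": "LinkDown", "link": link}
            for link in links if dpid in link]


links = [(1, 2), (2, 3), (1, 3)]
print(transform_switch_down({"type": "SwitchDown", "dpid": 2}, links))
```

The application then continues with the modified input instead of the message that made it crash.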

slide-26
SLIDE 26

Course of action?

[Figure: Operator, with three courses of action: No Compromise, Apply Transformation(s), Complete Compromise]
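The three courses of action can be put side by side in a short sketch. It assumes the operator selects one policy per SDN-App; the `Policy` enum, the `handle` function, and the fallback to halting when a transformed message also crashes are hypothetical choices made for illustration.

```python
from enum import Enum


class Policy(Enum):
    NO_COMPROMISE = "halt the application"
    EQUIVALENCE_COMPROMISE = "transform the message"
    COMPLETE_COMPROMISE = "drop the message"


def handle(app, message, policy, transform=None):
    """Run app on message; on a crash, recover according to the policy."""
    try:
        return ("ok", app(message))
    except Exception:
        if policy is Policy.NO_COMPROMISE:
            return ("halted", None)
        if policy is Policy.COMPLETE_COMPROMISE:
            return ("dropped", None)
        # Equivalence compromise: retry once with a transformed message.
        # Assumption: if that also fails (or no transform is given), halt.
        try:
            return ("transformed", app(transform(message)))
        except Exception:
            return ("halted", None)


def app(message):
    if message["port"] < 0:
        raise ValueError("bad port")
    return message["port"]


bad = {"port": -1}
for policy in Policy:
    print(policy.name, handle(app, bad, policy, transform=lambda m: {**m, "port": 0}))
```

A security-related SDN-App would plausibly be configured with `NO_COMPROMISE`, while an app whose correctness matters less could tolerate dropped or transformed messages.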

slide-27
SLIDE 27

Related work

§ Fault tolerance

– via reboots

– applying Paxos for leader selection

§ Debugging SDN-Apps or the controller


slide-28
SLIDE 28

Message equivalence

§ How do you determine two messages are equivalent?


slide-29
SLIDE 29

Rollbacks are non-trivial

§ Rolling back one or more installed rules changes the controller’s view of the network state

– Might induce crashes of other SDN applications that rely on a consistent view of network state

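A rollback of installed rules, and the stale view it can leave behind, can be sketched with a simple undo log. This is an illustration under assumptions, not NetLog's actual design: the `RuleTransaction` class, the dictionary used as a flow table, and the rule strings are all hypothetical.

```python
# Hypothetical sketch of transactional rule updates; not NetLog's actual API.

_MISSING = object()  # marks "no rule existed before this install"


class RuleTransaction:
    def __init__(self, flow_table):
        self.flow_table = flow_table  # dict: match -> action
        self.undo_log = []            # (match, previous action or _MISSING)

    def install(self, match, action):
        self.undo_log.append((match, self.flow_table.get(match, _MISSING)))
        self.flow_table[match] = action

    def rollback(self):
        # Undo in reverse order so the oldest value is restored last.
        while self.undo_log:
            match, previous = self.undo_log.pop()
            if previous is _MISSING:
                self.flow_table.pop(match, None)
            else:
                self.flow_table[match] = previous

    def commit(self):
        self.undo_log.clear()


table = {"dst=10.0.0.1": "output:1"}
txn = RuleTransaction(table)
txn.install("dst=10.0.0.1", "output:2")
txn.install("dst=10.0.0.2", "output:3")
cached_view = dict(table)    # another app snapshots the table mid-transaction
txn.rollback()
print(table)                 # back to the original single rule
print(cached_view == table)  # False: the other app's view is now stale
```

The last line is the non-trivial part: the rollback itself is mechanical, but any other SDN-App that acted on the intermediate state now holds an inconsistent view.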

slide-30
SLIDE 30

Error propagation

§ Last message received by the SDN-App prior to the crash need not be the culprit!

– How far along should we go back in history to find the root cause of the crash?

– Recovery from an earlier checkpoint; How many checkpoints should we maintain?

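The trade-off behind these questions can be made concrete with a bounded checkpoint history. The sketch below is an assumption-laden illustration, not LegoSDN's mechanism: the `CheckpointedApp` class, the deep-copied state snapshots, and the `max_checkpoints` parameter are hypothetical. It shows a case where the crash is triggered by the last message but caused by an earlier one.

```python
import copy
from collections import deque

# Hypothetical sketch: bounded checkpoint history with deeper rollback on
# demand. Names and the recovery strategy are assumptions for illustration.


class CheckpointedApp:
    def __init__(self, handler, state, max_checkpoints=3):
        self.handler = handler  # handler(state, message) mutates state
        self.state = state
        # Each entry: (state before the message, the message itself).
        self.history = deque(maxlen=max_checkpoints)

    def process(self, message):
        self.history.append((copy.deepcopy(self.state), message))
        self.handler(self.state, message)

    def recover(self, steps_back):
        """Restore the state from `steps_back` messages ago.

        Returns the messages processed since that checkpoint, oldest
        first, so the caller can decide which ones to replay.
        """
        steps_back = min(steps_back, len(self.history))
        if steps_back == 0:
            return []
        entries = [self.history.pop() for _ in range(steps_back)]
        entries.reverse()
        self.state = entries[0][0]
        return [message for _, message in entries]


def handler(state, message):
    if message == "poison":
        state["poisoned"] = True     # the real culprit, no crash yet
    if message == "trigger" and state.get("poisoned"):
        raise RuntimeError("crash")  # the crash shows up later
    state["count"] = state.get("count", 0) + 1


app = CheckpointedApp(handler, {})
for m in ["a", "poison", "b"]:
    app.process(m)
try:
    app.process("trigger")
except RuntimeError:
    suspects = app.recover(steps_back=3)
    print(suspects, app.state)
```

With only one checkpoint retained, recovery would restore a state that is already poisoned; retaining more checkpoints costs memory and widens the set of suspect messages.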

slide-31
SLIDE 31

Road ahead

§ Rethink controller architecture

– LegoSDN is only the tip of the iceberg.

§ Resilient controllers can catalyze adoption

§ Failures need to be a first-class citizen
