
Tolerating Application Failures with LegoSDN



  1. Tolerating Application Failures with LegoSDN: Balakrishnan Chandrasekaran, Theophilus Benson, Duke University

  2. Quality of Code: “In C, I never learned to use the debugger, so I used to never make mistakes …” “I went millions and millions of hours with no problems—probably tens of millions of hours with no problems.” (Arthur Whitney, creator of A, K and Q; ACM Queue, Feb 2009) October 28, 2014 HotNets 2014 | LegoSDN 2

  3. Bugs are endemic in software! § Bugs can be deterministic or non-deterministic § [STS] POX premature PacketIn – The l2_multi routing module failed unexpectedly with a KeyError.

  4-6. Cascading Crashes [diagram, built up over three slides: App1, App2, … running on the Controller, with messages flowing in and out]

  7. LegoSDN § Availability is of utmost importance – Second only to security October 28, 2014 HotNets 2014 | LegoSDN 7

  8. Fate-sharing § Fate-sharing relationships exist between – the SDN controller and the SDN application(s) (and also among SDN applications) – the SDN application and the network § A failure in any one SDN application brings down the other applications and the SDN controller.
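
The fate-sharing problem can be illustrated with a minimal Python sketch. It assumes a hypothetical monolithic controller that calls every app's handler in one process and one loop; learning_switch, buggy_routing and monolithic_controller are illustrative names, not POX or FloodLight APIs, and buggy_routing only imitates the kind of KeyError described on slide 3.

```python
from typing import Callable, Dict, List

Event = Dict[str, int]
App = Callable[[Event], None]

seen_by_learning: List[Event] = []

def learning_switch(event: Event) -> None:
    """A well-behaved (hypothetical) app: it records every event it receives."""
    seen_by_learning.append(event)

def buggy_routing(event: Event) -> None:
    """Hypothetical buggy app: it looks up a switch it has not learned yet."""
    known_switches: Dict[int, str] = {}
    _ = known_switches[event["dpid"]]  # raises KeyError

def monolithic_controller(apps: List[App], events: List[Event]) -> None:
    """Controller and apps share one loop, so an exception raised by any
    app unwinds the loop and stops the controller itself."""
    for event in events:
        for app in apps:
            app(event)

try:
    monolithic_controller([learning_switch, buggy_routing],
                          [{"dpid": 1}, {"dpid": 2}])
except KeyError as err:
    print("controller crashed:", repr(err))

# The healthy app saw only the first event: it shared the buggy app's fate.
print("events seen by the healthy app:", len(seen_by_learning))
```

The second event is never delivered to learning_switch, even though that app did nothing wrong.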

  9. Three-pronged approach, step 1: Contain crash [diagram: App1, App2, … on the Controller, with messages in and out]

  10. Three-pronged approach, step 2: Undo changes [diagram: App1, App2, … on the Controller, with messages in and out]

  11. Three-pronged approach, step 3: Handle message [diagram: App1, App2, … on the Controller, with messages in and out]
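
The three steps can be sketched for a single message delivery in Python. This is a simplified, single-process illustration under stated assumptions, not LegoSDN's code: the flow table is a hypothetical in-memory dict, the app installs rules through a hypothetical install callback, "undo" restores a snapshot taken before the message, and dropping the message is only one of the handling options discussed later.

```python
from typing import Callable, Dict, Tuple

Rule = Tuple[int, str]                # (switch id, match): hypothetical format
FlowTable = Dict[Rule, str]           # rule -> action
Install = Callable[[Rule, str], None]
App = Callable[[dict, Install], None]

def deliver(app: App, message: dict, table: FlowTable) -> str:
    """Deliver one message to one app, applying the three prongs on a crash."""
    snapshot = dict(table)            # network state before this message

    def install(rule: Rule, action: str) -> None:
        table[rule] = action

    try:
        app(message, install)
        return "ok"
    except Exception:                 # 1. contain: the crash stops here
        table.clear()                 # 2. undo: restore the pre-message table
        table.update(snapshot)
        return "dropped"              # 3. handle: this sketch simply drops it

def flaky_app(message: dict, install: Install) -> None:
    """Hypothetical app that crashes after a partial update."""
    install((message["dpid"], "dst=10.0.0.1"), "output:1")
    if message.get("malformed"):
        raise KeyError("port")

flow_table: FlowTable = {}
print(deliver(flaky_app, {"dpid": 1}, flow_table))                     # ok
print(deliver(flaky_app, {"dpid": 2, "malformed": True}, flow_table))  # dropped
print(flow_table)  # {(1, 'dst=10.0.0.1'): 'output:1'}
```

The rule installed for the crashing message is gone, while the rule from the earlier, successful message survives.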

  12. Controller architecture must support two new abstractions

  13. Current architecture [diagram: App1 and App2 running directly on the Controller]

  14-16. Isolate SDN-Apps from the controller [diagram, built up over three slides: App1 and App2 each in its own Sandbox, on top of the Controller]
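
One way to picture a sandbox is to run each SDN-App in its own operating-system process, so a crash ends only that process. The Python sketch below assumes a hypothetical app written as a source string and a one-message-per-process JSON exchange over stdin/stdout; it is not how LegoSDN's sandboxes are built, and a practical sandbox would keep one long-lived process per app instead of spawning a process per message.

```python
import json
import subprocess
import sys
from typing import Optional

# Hypothetical SDN-App, run in its own interpreter (the "sandbox").
APP_SOURCE = r'''
import json, sys
msg = json.load(sys.stdin)
routes = {1: "port-1"}
print(json.dumps({"action": routes[msg["dpid"]]}))  # KeyError for unknown dpid
'''

def call_sandboxed_app(message: dict, timeout: float = 5.0) -> Optional[dict]:
    """Send one message to the app's process; return its reply, or None if
    the app crashed. A hung app raises subprocess.TimeoutExpired here."""
    proc = subprocess.run(
        [sys.executable, "-c", APP_SOURCE],
        input=json.dumps(message),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode != 0:
        return None          # the child died; the controller process lives on
    return json.loads(proc.stdout)

print(call_sandboxed_app({"dpid": 1}))  # {'action': 'port-1'}
print(call_sandboxed_app({"dpid": 7}))  # None: the crash stayed in the child
print("controller still running")
```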

  17-18. Isolate SDN-Apps from the network [diagram, built up over two slides: App1 in a Sandbox on top of the Controller]
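
Isolating an app from the network means its changes can be taken back if it crashes. The Python sketch below shows one way a transactional log could do this, by recording each change together with the prior state needed to reverse it. The NetLog class here, its methods, and the dict standing in for switch flow tables are all hypothetical; a real implementation would also have to undo the changes on the switches themselves.

```python
from typing import Dict, List, Optional, Tuple

Rule = Tuple[int, str]                 # (switch id, match): hypothetical format

class NetLog:
    """Hypothetical transactional log in front of a simulated network."""

    def __init__(self) -> None:
        self.network: Dict[Rule, str] = {}   # stands in for switch flow tables
        self._undo: List[Tuple[Rule, Optional[str]]] = []

    def begin(self) -> None:
        self._undo.clear()

    def install(self, rule: Rule, action: str) -> None:
        self._undo.append((rule, self.network.get(rule)))  # remember prior state
        self.network[rule] = action

    def delete(self, rule: Rule) -> None:
        self._undo.append((rule, self.network.get(rule)))
        self.network.pop(rule, None)

    def commit(self) -> None:
        self._undo.clear()                   # changes become permanent

    def rollback(self) -> None:
        # Undo newest-first so repeated changes to one rule unwind correctly.
        while self._undo:
            rule, previous = self._undo.pop()
            if previous is None:
                self.network.pop(rule, None)
            else:
                self.network[rule] = previous

log = NetLog()
log.begin()
log.install((1, "dst=10.0.0.1"), "output:1")
log.commit()

log.begin()                                  # the app crashes mid-transaction
log.install((1, "dst=10.0.0.1"), "drop")
log.install((2, "dst=10.0.0.2"), "output:3")
log.rollback()
print(log.network)  # {(1, 'dst=10.0.0.1'): 'output:1'}
```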

  19. LegoSDN [diagram: a Sandbox containing App1 and the AppVisor Stub; the AppVisor Proxy, the Controller, and NetLog] § AppVisor Stub: lightweight wrapper; the SDN-App is treated as a black box § AppVisor Proxy: message dispatcher; the stub and proxy allow SDN-Apps to talk to the controller § NetLog: transactional support
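
The stub/proxy split can be sketched in Python as follows. The class names echo the slide, but every method, message format, and app below is hypothetical, and the sketch stays in one process, with JSON strings standing in for the channel between a sandbox and the controller. The point it shows is that the proxy never calls app code directly: it exchanges serialized messages with a stub, and a crash comes back as a status report rather than an exception.

```python
import json
from typing import Callable, Dict, List

Handler = Callable[[dict], List[dict]]  # black-box app: event in, commands out

class AppVisorStub:
    """Hypothetical stub: wraps one app and speaks only serialized messages."""

    def __init__(self, name: str, handler: Handler) -> None:
        self.name = name
        self._handler = handler

    def handle(self, wire_msg: str) -> str:
        try:
            commands = self._handler(json.loads(wire_msg))
            return json.dumps({"status": "ok", "commands": commands})
        except Exception as exc:        # report the crash instead of raising it
            return json.dumps({"status": "crashed", "error": type(exc).__name__})

class AppVisorProxy:
    """Hypothetical proxy: dispatches each controller event to every live stub."""

    def __init__(self, stubs: List[AppVisorStub]) -> None:
        self.stubs = stubs
        self.crashed: Dict[str, str] = {}  # app name -> exception type

    def dispatch(self, event: dict) -> List[dict]:
        commands: List[dict] = []
        for stub in self.stubs:
            if stub.name in self.crashed:
                continue                # skip apps awaiting recovery
            reply = json.loads(stub.handle(json.dumps(event)))
            if reply["status"] == "ok":
                commands.extend(reply["commands"])
            else:
                self.crashed[stub.name] = reply["error"]
        return commands

def hub(event: dict) -> List[dict]:
    return [{"app": "hub", "out": "flood"}]

def router(event: dict) -> List[dict]:
    routes = {1: "port-1"}
    return [{"app": "router", "out": routes[event["dpid"]]}]  # KeyError if unknown

proxy = AppVisorProxy([AppVisorStub("hub", hub), AppVisorStub("router", router)])
print(proxy.dispatch({"dpid": 1}))  # both apps answer
print(proxy.dispatch({"dpid": 7}))  # only the hub answers; the router crashed
print(proxy.crashed)                # {'router': 'KeyError'}
```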

  20. LegoSDN § Built on top of FloodLight § Ported three applications bundled with FloodLight to LegoSDN [diagram: a Sandbox containing App1 and the AppVisor Stub; the AppVisor Proxy, the Controller, and NetLog]

  21. Three-pronged approach, step 3: Handle message [diagram: App1, App2, … on the Controller, with messages in and out]

  22. How do you handle the crash-inducing message?

  23. Option 1: Crash and burn § Halt the application – The SDN-App cannot continue processing – Other SDN-Apps continue unaffected § No Compromise – Think of security-related SDN-Apps § Correctness: an SDN-App’s ability to implement its functionality, without change, according to its specification.

  24. Option 2: Induce amnesia § Ignore or drop the crash-inducing message – The SDN-App will not see the message again § Complete Compromise

  25. Option 3: Apply transformations § Transform the offending message into one that the application can handle – The application continues with a modified input § Equivalence Compromise
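
The three options can be expressed as a single decision in Python. This is a hypothetical sketch: handle_crash returns None to halt the app, an empty list to drop the message, or the transformed messages to re-deliver. The example transformation, which splits a switch-down event into one port-down event per port, is an illustrative assumption rather than a transformation taken from LegoSDN.

```python
from enum import Enum
from typing import Callable, List, Optional

class Strategy(Enum):
    CRASH_AND_BURN = "no compromise"        # halt the app
    INDUCE_AMNESIA = "complete compromise"  # drop the message
    TRANSFORM = "equivalence compromise"    # rewrite the message

def handle_crash(strategy: Strategy, message: dict,
                 transform: Callable[[dict], List[dict]]) -> Optional[List[dict]]:
    """Return the messages to re-deliver to the recovered app, or None to halt it."""
    if strategy is Strategy.CRASH_AND_BURN:
        return None                # the app stays down; other apps keep running
    if strategy is Strategy.INDUCE_AMNESIA:
        return []                  # the app never sees this message again
    return transform(message)      # the app continues with a modified input

def switch_down_to_port_downs(message: dict) -> List[dict]:
    """Hypothetical transformation: one event per port of the failed switch."""
    return [{"type": "port_down", "dpid": message["dpid"], "port": p}
            for p in message["ports"]]

msg = {"type": "switch_down", "dpid": 3, "ports": [1, 2]}
for s in Strategy:
    print(s.name, handle_crash(s, msg, switch_down_to_port_downs))
```

Whether a transformed message really is "equivalent" for a given app is exactly the open question raised on slide 28.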

  26. Course of action? [diagram: No Compromise, Apply Transformation(s), Complete Compromise, Operator]
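
One possible reading of this slide is that the operator decides, per SDN-App, how much compromise is acceptable, and recovery escalates only up to that limit. The Python sketch below encodes that reading; the level names, the policy dict, the replay callback (which reports whether the app survives a given input), and the example transformation are all hypothetical.

```python
from typing import Callable, Dict, List

Transform = Callable[[dict], List[dict]]
Replay = Callable[[List[dict]], bool]   # True if the app survives these messages

# Hypothetical compromise levels, ordered from least to most permissive.
LEVELS = ["no_compromise", "equivalence", "complete"]

def choose_action(app: str, policy: Dict[str, str], message: dict,
                  transforms: List[Transform], replay: Replay) -> str:
    """Escalate through the levels, up to the operator's limit for this app.
    Apps missing from the policy default to no compromise."""
    limit = LEVELS.index(policy.get(app, "no_compromise"))
    if limit >= LEVELS.index("equivalence"):
        for transform in transforms:
            if replay(transform(message)):
                return "transformed"    # equivalence compromise
    if limit >= LEVELS.index("complete"):
        return "dropped"                # complete compromise (induce amnesia)
    return "halted"                     # no compromise: crash and burn

def strip_buffer_id(message: dict) -> List[dict]:
    """Hypothetical transformation used only for this example."""
    return [{k: v for k, v in message.items() if k != "buffer_id"}]

def survives(msgs: List[dict]) -> bool:
    return True

def still_crashes(msgs: List[dict]) -> bool:
    return False

policy = {"firewall": "no_compromise", "router": "complete"}  # set by the operator
msg = {"type": "packet_in", "dpid": 1, "buffer_id": 42}

print(choose_action("firewall", policy, msg, [strip_buffer_id], survives))     # halted
print(choose_action("router", policy, msg, [strip_buffer_id], survives))       # transformed
print(choose_action("router", policy, msg, [strip_buffer_id], still_crashes))  # dropped
```

Defaulting unknown apps to no compromise keeps the sketch on the safe side, in line with the security example on slide 23.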

  27. Related work § Fault tolerance – via reboots – applying Paxos for leader election § Debugging SDN-Apps or the controller

  28. Message equivalence § How do you determine whether two messages are equivalent?
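
The slide leaves this as an open question. One naive, hypothetical answer is sketched below in Python: two messages count as equivalent for an app if they agree on every field the app is assumed to depend on. The field names are illustrative, and the hard part, knowing which fields a black-box SDN-App actually reads, is exactly what the sketch assumes away.

```python
from typing import FrozenSet

def equivalent(a: dict, b: dict, relevant: FrozenSet[str]) -> bool:
    """Naive test: a and b agree on every field the app is assumed to use."""
    return all(a.get(f) == b.get(f) for f in relevant)

m1 = {"dpid": 1, "in_port": 3, "dst_mac": "aa:bb", "buffer_id": 17}
m2 = {"dpid": 1, "in_port": 4, "dst_mac": "aa:bb", "buffer_id": 99}

# Hypothetical field sets for two kinds of apps.
print(equivalent(m1, m2, frozenset({"dpid", "dst_mac"})))  # True
print(equivalent(m1, m2, frozenset({"dpid", "in_port"})))  # False
```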

  29. Rollbacks are non-trivial § Rolling back one or more installed rules changes the controller’s view of the network state – This might crash other SDN applications that rely on a consistent view of the network state
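
The risk described above can be reproduced in a few lines of Python. In this hypothetical sketch, app A installs a rule, app B caches state derived from the controller's view, and a rollback of A's rule then leaves B relying on a rule that no longer exists. The shared dict and the app functions are illustrative only.

```python
from typing import Dict, List, Tuple

Rule = Tuple[int, str]                       # (switch id, match): hypothetical format

network_view: Dict[Rule, str] = {}           # controller's view, shared by all apps

# App A installs a rule, and app B builds its own state on top of that view.
rule = (1, "dst=10.0.0.2")
network_view[rule] = "output:2"              # installed by app A
app_b_path: List[Rule] = list(network_view)  # app B caches the rules it relies on

# App A then crashes, and the recovery layer rolls its rule back.
del network_view[rule]

def app_b_next_hop(r: Rule) -> str:
    """App B assumes every rule it cached is still in the controller's view."""
    return network_view[r]

try:
    app_b_next_hop(app_b_path[0])
except KeyError:
    print("app B crashed: the rollback invalidated its view of the network")
```

Undoing one app's changes is therefore not purely local: any app that observed those changes may need to be notified or recovered as well.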

  30. Error propagation § The last message received by the SDN-App before the crash need not be the culprit! – How far back in history should we go to find the root cause of the crash? – Recovery from an earlier checkpoint: how many checkpoints should we maintain?
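
The checkpoint questions can be made concrete with a Python sketch. It assumes a hypothetical runner that keeps at most max_checkpoints deep copies of an app's state. After a crash it restores checkpoints newest-first and replays the later messages, including the one that crashed, while skipping one earlier candidate at a time. The bound on checkpoints limits both memory and how far back recovery can reach, and the number of replays grows with the distance to the checkpoint. The class, the handler, and the search order are illustrative choices, not LegoSDN's mechanism.

```python
import copy
from collections import deque
from typing import Callable, Deque, List, Optional, Tuple

State = dict
Handler = Callable[[State, dict], None]    # updates the app's state in place

class CheckpointingRunner:
    """Hypothetical runner with a bounded set of state checkpoints."""

    def __init__(self, handler: Handler, max_checkpoints: int = 3) -> None:
        self.handler = handler
        self.state: State = {}
        self.history: List[dict] = []      # every message delivered so far
        # Each entry: (index of the next message, deep copy of the state).
        self.checkpoints: Deque[Tuple[int, State]] = deque(maxlen=max_checkpoints)

    def checkpoint(self) -> None:
        self.checkpoints.append((len(self.history), copy.deepcopy(self.state)))

    def deliver(self, message: dict) -> None:
        self.history.append(message)
        self.handler(self.state, message)  # may raise: the app crashed

    def recover(self) -> Optional[int]:
        """Return the index of the message judged to be the culprit, or None."""
        last = len(self.history) - 1
        for start, saved in reversed(self.checkpoints):
            for skip in range(last - 1, start - 1, -1):
                state = copy.deepcopy(saved)
                try:
                    for i in range(start, last + 1):
                        if i != skip:
                            self.handler(state, self.history[i])
                except Exception:
                    continue               # still crashes: try another candidate
                self.state = state
                del self.history[skip]     # forget the culprit
                self.checkpoints.clear()   # newer snapshots are now stale
                self.checkpoint()
                return skip
        return None

def handler(state: State, msg: dict) -> None:
    if msg["type"] == "link_up":
        state.setdefault("links", []).append(msg["link"])
    elif msg["type"] == "route":
        state["hops"] = [link.upper() for link in state.get("links", [])]

runner = CheckpointingRunner(handler, max_checkpoints=3)
runner.checkpoint()
runner.deliver({"type": "link_up", "link": "s1-s2"})
runner.deliver({"type": "link_up", "link": None})  # bad input, no crash yet
runner.checkpoint()                                 # this snapshot is poisoned
culprit = None
try:
    runner.deliver({"type": "route"})               # crashes: None has no .upper()
except AttributeError:
    culprit = runner.recover()
print(culprit, runner.state)  # 1 {'links': ['s1-s2'], 'hops': ['S1-S2']}
```

The crash surfaces on the third message, but the culprit is the second one, and only the older checkpoint predates it.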

  31. Road ahead § Rethink controller architecture – LegoSDN is only the tip of the iceberg. § Resilient controllers can catalyze adoption § Failures need to be first-class citizens

