Automatic Test Packet Generation James Hongyi Zeng with Peyman - PowerPoint PPT Presentation

Automatic Test Packet Generation
James Hongyi Zeng with Peyman Kazemian, George Varghese, Nick McKeown
Stanford University, UCSD, Microsoft Research
http://eastzone.github.com/atpg/
CoNEXT 2012, Nice, France

CS@Stanford Network Outage
Tue, Oct 2, 2012 at 7:54 PM: “Between 18:20-19:00 tonight we experienced a complete network outage in the building when a loop was accidentally created by CSD-CF staff. We're investigating the exact circumstances to understand why this caused a problem, since automatic protections are supposed to be in place to prevent loops from disabling the network.”

Outages in the Wild
• On April 26, 2010, NetSuite suffered a service outage that rendered its cloud-based applications inaccessible to customers worldwide for 30 minutes … NetSuite blamed a network issue for the downtime.
• The Planet was rocked by a pair of network outages that knocked it off line for about 90 minutes on May 2, 2010. The outages caused disruptions for another 90 minutes the following morning.... Investigation found that the outage was caused by a fault in a router in one of the company's data centers.
• Hosting.com's New Jersey data center was taken down on June 1, 2010, igniting a cloud outage and connectivity loss for nearly two hours … Hosting.com said the connectivity loss was due to a software bug in a Cisco switch that caused the switch to fail.

Is network troubleshooting a problem?
• Survey of NANOG mailing list (June 2012)
– Data set: 61 respondents, including 23 medium-size networks (<10K hosts) and 12 large networks (<100K hosts)
– Frequency: 35% generate >100 tickets per month
– Downtime: 25% take over an hour to resolve (estimated $60K-110K/hour [1])
– Current tools: ping, traceroute, SNMP
– 70% asked for better tools and automatic tests
[1] http://www.evolven.com/blog/downtime-outages-and-failures-understanding-their-true-costs.html

The Battle
Hardware: buffers, fiber cuts, broken interfaces, mis-labeled cables, flaky links
Software: firmware bugs, crashed module
vs.
ping, traceroute, SNMP, tcpdump + wisdom and intuition

Automatic Test Packet Generation
Goal: automatically generate test packets to test the network state and pinpoint faults before applications notice them.
Augment human wisdom and intuition. Reduce downtime. Save money.
Non-Goal: ATPG cannot explain why the forwarding state is in error.

ATPG Workflow
[Figure: workflow diagram. Labels: Topology, FIBs, ACLs; ATPG; Test Packets; Network; Test Results.]

Systematic Testing
• Comparison: chip design
– Testing is a billion-dollar market
– ATPG = Automatic Test Pattern Generation

Roadmap
• Reachability analysis
• Test packet generation and selection
• Fault localization
• Implementation and evaluation

Reachability Analysis
• Header Space Analysis (NSDI 2012)
– Input: <Port X, Port Y>, FIBs, config files, topology
– Output: all Forwarding Equivalent Classes (FECs) flowing X->Y
• All-pairs reachability: compute all classes of packets that can flow between every pair of ports.
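A minimal sketch of the idea behind computing which headers can flow from X to Y, assuming headers are modeled as ternary strings over 0, 1, and x (wildcard). The function names and the three match patterns are hypothetical and are not Hassel's API; the sketch also ignores header rewriting and rule priorities, which a real header space analysis must handle.

```python
from typing import List, Optional


def intersect(a: str, b: str) -> Optional[str]:
    """Intersect two ternary patterns over {'0', '1', 'x'}; None means empty."""
    if len(a) != len(b):
        raise ValueError("patterns must have equal length")
    out = []
    for ca, cb in zip(a, b):
        if ca == "x":
            out.append(cb)
        elif cb == "x" or ca == cb:
            out.append(ca)
        else:
            return None  # one pattern requires 0 here, the other requires 1
    return "".join(out)


def headers_through(path: List[str], start: str) -> Optional[str]:
    """Narrow the header space hop by hop along a path of match patterns."""
    space: Optional[str] = start
    for match in path:
        space = intersect(space, match)
        if space is None:
            return None  # no packet satisfies every hop
    return space


# Hypothetical match patterns of the rules on one path from port X to port Y.
path_matches = ["10xx", "1x0x", "xx01"]
reachable = headers_through(path_matches, start="xxxx")
print(reachable)  # 1001
```

Repeating this for every path between every pair of ports would give one row per equivalence class, together with the list of rules each class exercises.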

Example
[Figure: three boxes with ports P_A, P_B, P_C. Rules shown: r_A1, r_A2, r_A3 (Box A); r_B1, r_B2, r_B3, r_B4 (Box B); r_C1, r_C2 (Box C).]

All-pairs reachability
[Figure: Box A, Box B, and Box C with ports P_A, P_B, P_C.]

New Viewpoint: Testing and coverage
• HSA represents networks as chips/programs
• Standard testing finds inputs that cover every gate/flip-flop (HW) or branch/function (SW)
Testbench and cover, side by side:
– Chip model: Test Patterns, Boolean Algebra, Device Under Test
– HSA Network Model: Test Packets, Reachability Results, Network Under Test

New Viewpoint: Testing and coverage
• In networks, packets are inputs, with different covers
– Links: packets that traverse every link
– Queues: packets that traverse every queue
– Rules: packets that test each router rule
• Mission impossible?
– Testing all rules 10 times per second needs <1% of link overhead (Stanford/Internet2)

Roadmap
• Reachability analysis
• Test packet generation and selection
• Fault localization
• Implementation and evaluation

All-pairs reachability and covers
[Figure: Box A, Box B, and Box C with ports P_A, P_B, P_C.]

Test Packet Selection
• The all-pairs reachability table contains more packets than necessary
• Goal: select a minimum subset of packets whose histories cover the whole rule set
A Min-Set-Cover problem

Min-Set-Cover
[Figure: packets A-G mapped to the rules R1-R6 they exercise; after Min-Set-Cover, packets B, C, and G remain and still cover R1-R6.]

Test Packet Selection
• Min-Set-Cover
– Optimization is NP-hard
– Polynomial approximation (O(N^2))
• Min-Set-Cover splits the test packets into two groups:
– Regular packets: exercise all rules; sent out periodically
– Reserved packets: “redundant”; will be used in fault localization
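The selection step can be sketched with the standard greedy set-cover heuristic. The slides only say "polynomial approximation", so greedy is an assumption here, and the packet names, rule names, and coverage data below are hypothetical.

```python
from typing import Dict, List, Set


def greedy_cover(packet_rules: Dict[str, Set[str]]) -> List[str]:
    """Greedily pick packets until their rule histories cover every rule."""
    uncovered: Set[str] = set().union(*packet_rules.values())
    chosen: List[str] = []
    while uncovered:
        # Take the packet that exercises the most still-uncovered rules;
        # iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(packet_rules),
                   key=lambda p: len(packet_rules[p] & uncovered))
        chosen.append(best)
        uncovered -= packet_rules[best]
    return chosen


# Hypothetical reachability table: packet -> rules in its history.
packet_rules = {
    "p1": {"R1", "R2"},
    "p2": {"R1", "R3", "R4"},
    "p3": {"R2", "R5"},
    "p4": {"R4", "R6"},
    "p5": {"R5"},
}

regular = greedy_cover(packet_rules)                        # sent periodically
reserved = [p for p in packet_rules if p not in regular]    # kept for localization
print(regular, reserved)  # ['p2', 'p3', 'p4'] ['p1', 'p5']
```

Greedy does not guarantee the true minimum, but it runs in polynomial time, and the packets it leaves out are exactly the "redundant" ones that can be held in reserve.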

Roadmap
• Reachability analysis
• Test packet generation and selection
• Fault localization
• Evaluation: offline (Stanford/Internet2), emulated network, experimental deployment

Fault Localization

Fault Localization
• Network Tomography? → Minimum Hitting Set
• In ATPG: we can choose packets!
• Step 1: Use results from regular test packets
– F (potentially broken rules) = union of the rules exercised by all failing packets
– P (known good rules) = union of the rules exercised by all passing packets
– Suspect Set = F – P
[Figure: Venn diagram of F and P with the suspects marked.]
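Step 1 is plain set arithmetic. A small sketch, assuming each regular packet's rule history and pass/fail result are already known; the packet names, rule names, and results are hypothetical.

```python
from typing import Dict, Set


def suspect_set(results: Dict[str, bool],
                history: Dict[str, Set[str]]) -> Set[str]:
    """Return F - P: rules seen by a failing packet and by no passing packet."""
    failing_rules = set().union(
        *(history[p] for p, passed in results.items() if not passed))
    passing_rules = set().union(
        *(history[p] for p, passed in results.items() if passed))
    return failing_rules - passing_rules


# Hypothetical rule histories and test results of three regular packets.
history = {
    "p1": {"R1", "R2"},
    "p2": {"R2", "R3", "R4"},
    "p3": {"R4", "R5"},
}
results = {"p1": True, "p2": False, "p3": False}

suspects = suspect_set(results, history)
print(sorted(suspects))  # ['R3', 'R4', 'R5']
```

Here R2 appears in a failing packet's history but is cleared because p1 exercised it and passed.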

Fault Localization
• Step 2: Use reserved test packets
– Pick packets that test only one rule in the suspect set, and send them out for testing
– Passed: eliminate the rule from the suspect set
– Failed: label the rule as “broken”
• Step 3: (Brute force…) Continue with test packets that test two or more rules in the suspect set, until the set is small enough
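Step 2 can be sketched as follows, assuming a callable that sends one reserved packet and reports whether it passed. The callable `send_and_check`, the reserved packets, and the simulated fault are all hypothetical stand-ins for a real test terminal.

```python
from typing import Callable, Dict, Set, Tuple


def refine_suspects(suspects: Set[str],
                    reserved_history: Dict[str, Set[str]],
                    send_and_check: Callable[[str], bool]
                    ) -> Tuple[Set[str], Set[str]]:
    """Use reserved packets that hit exactly one suspect rule.

    Returns (broken, remaining): rules confirmed broken, and suspects that
    no single-suspect packet could resolve (left for step 3).
    """
    broken: Set[str] = set()
    cleared: Set[str] = set()
    for packet, rules in reserved_history.items():
        hit = rules & suspects
        if len(hit) != 1:
            continue  # packet does not isolate a single suspect rule
        (rule,) = hit
        if rule in broken or rule in cleared:
            continue  # this rule is already resolved
        if send_and_check(packet):
            cleared.add(rule)   # passed: eliminate from the suspect set
        else:
            broken.add(rule)    # failed: label as broken
    return broken, suspects - broken - cleared


# Hypothetical reserved packets and a simulated network where R4 is faulty.
reserved_history = {
    "q1": {"R1", "R3"},
    "q2": {"R2", "R4"},
    "q3": {"R3", "R5"},
}
truly_broken = {"R4"}


def simulated_send(packet: str) -> bool:
    """Pretend to send a packet: it fails if it crosses a faulty rule."""
    return not (reserved_history[packet] & truly_broken)


broken, remaining = refine_suspects({"R3", "R4", "R5"},
                                    reserved_history, simulated_send)
print(sorted(broken), sorted(remaining))  # ['R4'] ['R5']
```

In this example R5 stays unresolved because the only reserved packet that reaches it also crosses another suspect, which is the situation step 3 addresses.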

Roadmap
• Reachability analysis
• Test packet generation and selection
• Fault localization
• Implementation and evaluation

Putting them all together
[Figure: ATPG pipeline with stages numbered (1)-(5). Components: Parser (input: topology, FIBs, ACLs, etc.), Transfer Function, Header Space Analysis, All-pairs Reachability, Test Packet Generator (sampling + Min-Set-Cover), Test Terminal, Fault Localization.]
All-pairs Reachability Table
Header    In Port    Out Port    Rules
10xx…     1          2           R1, R5, R20
…         …          …           …

Implementation
• Cisco/Juniper parsers
– Translate router configuration files and forwarding tables (FIBs) into a header space representation
• Test packet generation/selection
– Hassel: a Python header space library
– Min-Set-Cover
– Python's multiprocessing module to parallelize
• SDN can simplify the design

Datasets
• Stanford and Internet2
– Public datasets
• Stanford University backbone
– ~10,000 HW forwarding entries (compressed from 757,000 FIB rules), 1,500 ACLs
– 16 Cisco routers
• Internet2
– 100,000 IPv4 forwarding entries
– 9 Juniper routers

Test Packet Generation
                           Stanford    Internet2
Computation Time           ~1 hour     ~40 min
Regular Packets            3,871       35,462
Packets/Port (Avg)         12.99       102.8
Min-Set-Cover Reduction    160x        85x
Ruleset structure
<1% link utilization when testing 10 times per second!
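A rough sanity check of the utilization claim, using the average packets per port from the table. The 100-byte test packet and the 1 Gb/s link rate are assumptions made only for this estimate; the slides give neither, and the busiest link may carry more than the per-port average.

```python
def link_utilization_pct(packets_per_port: float,
                         tests_per_second: float,
                         packet_bytes: int,
                         link_bps: float) -> float:
    """Percentage of link capacity consumed by periodic test packets."""
    bits_per_second = packets_per_port * tests_per_second * packet_bytes * 8
    return 100.0 * bits_per_second / link_bps


# Assumed values (not from the slides): 100-byte packets, 1 Gb/s links.
PACKET_BYTES = 100
LINK_BPS = 1e9

stanford = link_utilization_pct(12.99, 10, PACKET_BYTES, LINK_BPS)
internet2 = link_utilization_pct(102.8, 10, PACKET_BYTES, LINK_BPS)
print(f"Stanford: {stanford:.4f}%  Internet2: {internet2:.4f}%")
```

Under these assumptions both averages land well below 1%, which is consistent with the slide's figure.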

Using ATPG for Performance Testing
• Beyond functional problems, ATPG can also detect and localize performance problems
• Intuition: generalize the result of a test from success/failure to a performance measure (e.g. latency)
• Evaluation: emulated the Stanford network in Mininet-HiFi
– Open vSwitch instances act as routers
– Same topology, translated into OpenFlow rules
• Users can inject performance errors

[Figure: emulated topology with nodes bbra, s1, s2, s3, s4, s5, goza, coza, boza, yoza, poza, pozb, roza.]

Does it work?
• Production deployment
– 3 buildings on Stanford campus
– 30+ Ethernet switches
• Link cover only (instead of rule cover)
– 51 test terminals

CS@Stanford Network Outage
Tue, Oct 2, 2012 at 7:54 PM: “Between 18:20-19:00 tonight we experienced a complete network outage in the building when a loop was accidentally created by CSD-CF staff. We're investigating the exact circumstances to understand why this caused a problem, since automatic protections are supposed to be in place to prevent loops from disabling the network.”

[Figure annotations: “The problem in the email”; “Unreported problem”.]

ATPG Limitations
• Dynamic/non-deterministic boxes
– e.g. NAT
• “Invisible” rules
– e.g. backup rules
• Transient network states
• Ambiguous states (work in progress)
– e.g. ECMP

Related work
• Policy: “Group X can talk to Group Y”
• Control plane: NICE
• Forwarding rules and topology: Anteater, HSA, VeriFlow
• Forwarding state: ATPG
– Forwarding Rule != Forwarding State
– Topology on File != Actual Topology

Takeaways
• ATPG tests the forwarding state by generating minimal link, queue, and rule covers automatically
• Brings the lens of testing and coverage to networks
• For Stanford/Internet2, testing 10 times per second needs <1% of link overhead
• Works in real networks

Merci!
http://eastzone.github.com/atpg/
