Validating Pre-commit Network Configuration Changes at Scale with Batfish and Ansible

  • Samir Parikh, Head of Product, Intentionet
  • Andrius Benokraitis, Principal Product Manager, Ansible Network Automation
  • Ratul Mahajan, CEO, Co-Founder, Intentionet


SLIDE 1

Validating Pre-commit Network Configuration Changes at Scale with Batfish and Ansible

  • Andrius Benokraitis, Principal Product Manager, Ansible Network Automation, andriusb@redhat.com
  • Ratul Mahajan, CEO, Co-Founder, Intentionet, ratul@intentionet.com
  • Samir Parikh, Head of Product, Intentionet, samir@intentionet.com

SLIDE 2

For more information or to register visit: ansible.com/automates

  • Tampa, FL, USA: November 14, 2018
  • Moscow, Russia: November 14, 2018
  • Oslo, Norway: November 13, 2018
  • Johannesburg, S. Africa: November 27, 2018
  • Antwerp, Belgium: December 4, 2018
  • Stockholm, Sweden: November 15, 2018

SLIDE 3

For more information or to register visit: ansible.com/workshops

  • Portland, OR: November 6, 2018
  • Houston, TX: November 7, 2018
  • Rochester, NY: November 7, 2018
  • Seattle, WA: November 7, 2018
  • Sunnyvale, CA: November 15, 2018
  • Charleston, SC: November 27, 2018

SLIDE 4

WHAT WE’RE TALKING ABOUT TODAY

  • Part I: Network validation today
  • Part II: Comprehensive pre-commit validation with Batfish
  • Part III: Demo of Ansible + Batfish
  • Q/A

SLIDE 5

Intentionet: Who are we?

Company

  • Founded: 2015
  • Headquarters: Seattle, WA
  • Funding: NSF, True Ventures

Mission

  • Enable organizations to build networks with security and reliability guarantees

SLIDE 6

What are we building? 


Batfish: comprehensive network validation solution

  • Open source under Apache 2.0 License
  • Growing user community
      • Multiple Fortune 500 companies
  • Growing developer community
      • Intentionet, Princeton, BBN, Microsoft, and others

SLIDE 7

Why are we building it? 


Automation without validation is risky

  • Automation enables scale, consistency and speed

  • But not correctness: a single typo can bring down the entire network


 “To err is human; to propagate errors massively at scale requires automation”


  • Effective change validation is crucial in automated workflows

SLIDE 8

What makes change validation effective?

  • Production scale & covers ALL possible flows, failures and routes
  • Performed pre-deployment & automated

SLIDE 12

Validation methods in use today

Each method is judged on two criteria: Pre-Deployment? Comprehensive?

Text Analysis

‘Presence’ check for configuration attributes

  • NTP server
  • DNS server
  • AAA setting

Limitations:

  • Cannot validate network behavior
  • Brittle and vendor specific

Emulation

Check specific network behaviors

  • Are all BGP sessions up?
  • Can test client reach DNS server?
  • What happens if link X fails?

Limitations:

  • Not production scale
  • Cannot test ALL possible flows, failures, route updates

Operational State Analysis

Check operational state

  • Are all BGP sessions up?
  • Can client reach DNS server?
  • Does traceroute from X to Y succeed?

Limitations:

  • Cannot test ANY possible failures, route updates
  • Cannot test ALL possible flows

Overall:

  • No method can provide comprehensive, pre-deployment validation
  • A new approach is needed
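
To make the text-analysis limitation concrete, here is a minimal sketch of a ‘presence’ check for NTP servers. The IOS-style `ntp server <ip>` syntax, the sample config, and the function names are assumptions for illustration; they are not taken from the talk or from any tool.

```python
import re

# Hypothetical IOS-style config; the syntax is an assumption for illustration.
CONFIG = """\
hostname leaf-01
ntp server 1.2.3.4
ntp server 1.2.3.5
ip name-server 10.0.0.53
"""


def configured_ntp_servers(config_text):
    """Return the NTP server addresses found by a plain line match."""
    return set(re.findall(r"^ntp server (\S+)$", config_text, flags=re.MULTILINE))


def has_required_ntp(config_text, required):
    """Presence check: every required server appears in the config text."""
    return set(required) <= configured_ntp_servers(config_text)


print(has_required_ntp(CONFIG, ["1.2.3.4", "1.2.3.5"]))
```

The check only inspects text: it says nothing about whether the NTP servers are reachable, and the same pattern finds nothing in a config written in another vendor's syntax (for example a line beginning with `set system ntp server`), which is the brittleness noted above.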
SLIDE 13

Introducing Model-Based validation

Model-Based Analysis

Check ALL possible network behaviors

  • Can ANY flow go from Subnet A to B?
  • Can ALL clients reach DNS server?
  • Will ANY link failure disrupt service X?
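
A question such as "Will ANY link failure disrupt service X?" asks for every failure case to be covered rather than a sampled few. The sketch below illustrates the idea on a toy graph by removing each link in turn and re-checking reachability. The link list is hypothetical (the slides name the devices but not their wiring), and this brute-force loop is only an illustration, not how Batfish computes its answers.

```python
from collections import deque

# Hypothetical wiring between devices named in the demo topology.
LINKS = [
    ("border-01", "fw-01"), ("border-02", "fw-02"),
    ("fw-01", "spine-01"), ("fw-02", "spine-02"),
    ("spine-01", "leaf-01"), ("spine-01", "leaf-02"),
    ("spine-02", "leaf-01"), ("spine-02", "leaf-02"),
]


def reachable(links, src, dst):
    """Breadth-first search over undirected links."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def disrupting_links(links, src, dst):
    """Return every single link whose failure disconnects src from dst."""
    return [
        failed for failed in links
        if not reachable([link for link in links if link != failed], src, dst)
    ]


print(disrupting_links(LINKS, "leaf-01", "leaf-02"))
print(disrupting_links(LINKS, "border-01", "leaf-01"))
```

In this toy wiring the leaf-to-leaf path survives any single failure because of the two spines, while the border-to-leaf path depends on two unprotected links.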

Pre-Deployment? Comprehensive?

SLIDE 16

How does Batfish work?

Network configs Dynamic state (physical / cloud)

And many more…

SLIDE 17

How does Batfish work?

Inputs: Network configs, Dynamic state (physical / cloud), Network Policy

Analysis engine builds network models:

  • Vendor Neutral Configuration Model
  • Routing Model
  • Mathematical Model of Network Behavior

Model excerpts:

Interfaces: Ethernet0/0: InterfaceCost: 1, importPolicy: peer_in ……….

192.0.0.0 ≤ out.prefix
out.prefix ≤ 192.1.0.0
best.valid ⇒ out.lp = 120
best.valid ⇒ out.ad = 20
………

Certifications:

  • All devices are password protected

Violations:

  • Subnets of Leaf-1 and Leaf-3 cannot communicate
  • rtr-y failure reduces availability

SLIDE 18

Batfish Network Policies

  • Policies represent specific network behaviors you want to ensure hold true
  • Typical categories of policies would be:
      • Security
      • Reliability
      • Compliance

SLIDE 19

Batfish Network Policies

Security

  • No traffic must pass between Subnets A & B
  • All traffic between branch offices must be encrypted
  • No route announcement can disrupt internal traffic

Reliability

  • No single link failure will cause an outage
  • DC Fabric must always have full Leaf to Leaf reachability
  • DNS servers must always be globally accessible

Compliance

  • Device access restricted to secure communication methods only
  • All device settings must comply with site standards
  • No undefined references are allowed on any device

Policy evaluation provides correctness guarantees for ALL possible packets, link failures, and route announcements

SLIDE 20

How can you use Batfish?

  • Build a CI/CD pipeline
      • Proactive / pre-deployment validation
      • Continuous / post-deployment validation
  • Test specific network scenarios
      • Test DR (Disaster Recovery) plan
      • Test network maintenance MOP

SLIDE 21

Pre-deployment change validation with Ansible and Batfish

Workflow:

  • Author Change → Ansible: Generate Configs → Github: Configs → Initiate test
  • PASS → Ansible: Deploy → Production
  • FAIL → Ansible: Raise Error
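
The PASS / FAIL branch in this workflow can be sketched as a small gate function. This is a minimal illustration under stated assumptions: `gate_change`, `ValidationError`, the toy checks, and the dict-of-configs candidate are all hypothetical stand-ins, not Ansible or Batfish APIs.

```python
class ValidationError(Exception):
    """Raised when a candidate change fails one or more checks."""


def gate_change(candidate, checks, deploy):
    """Run every check; deploy on PASS, raise on FAIL."""
    failed = sorted(name for name, check in checks.items() if not check(candidate))
    if failed:
        raise ValidationError("FAIL: " + ", ".join(failed))
    deploy(candidate)
    return "PASS"


# Hypothetical checks over a toy candidate: device name -> config text.
checks = {
    "ntp_configured": lambda cfgs: all("ntp server" in c for c in cfgs.values()),
    "no_empty_configs": lambda cfgs: all(c.strip() for c in cfgs.values()),
}

deployed = []  # stands in for the production deploy step
print(gate_change({"leaf-01": "ntp server 1.2.3.4\n"}, checks, deployed.append))
```

The point of the design is ordering: the deploy callable is never reached unless every check has passed, so a failing candidate cannot touch production.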

SLIDE 22

DEMO

SLIDE 23

Demo

Scenarios:

1. Expand DC fabric by adding a new leaf
2. Enable new service by updating whitelist on firewalls

Topology (LHR DC):

  • border-01, fw-01, spine-01, leaf-01
  • border-02, fw-02, spine-02, leaf-02
  • Host subnets: 10.1.1.0/24, 10.1.2.0/24, 10.1.3.0/24, 10.1.4.0/24

SLIDE 24

Scenario 1: Expand DC fabric

  • Add new leaf, lhr-leaf-03, to fabric in POD 1
  • Host subnet 10.1.5.0/24

Topology (LHR DC):

  • border-01, fw-01, spine-01, leaf-01
  • border-02, fw-02, spine-02, leaf-02
  • leaf-03 (new)
  • Host subnets: 10.1.1.0/24, 10.1.2.0/24, 10.1.3.0/24, 10.1.4.0/24, 10.1.5.0/24 (new)

SLIDE 25

Scenario 1 pipeline

1. User input:
     • Leaf Name
     • POD ID
     • BGP ASN
2. Generate configuration using Jinja2 templates
3. Commit changes to git branch
4. Initiate change validation with Batfish
5. Log validation results to S3
6. Notify user via Slack
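
Steps 1 and 2 can be sketched as a template filled from the three user inputs. The demo uses Jinja2; to stay dependency-free this sketch swaps in `string.Template` from the Python standard library. The template text and the ASN value 65003 are hypothetical, since the slides show neither the real templates nor the real ASNs.

```python
from string import Template

# Hypothetical template text; the real templates are not shown in the slides.
LEAF_TEMPLATE = Template(
    "hostname $leaf_name\n"
    "! POD $pod_id\n"
    "router bgp $bgp_asn\n"
)


def render_leaf_config(leaf_name, pod_id, bgp_asn):
    """Fill the three user inputs into the template."""
    return LEAF_TEMPLATE.substitute(
        leaf_name=leaf_name, pod_id=pod_id, bgp_asn=bgp_asn
    )


config = render_leaf_config("lhr-leaf-03", 1, 65003)
print(config)
```

The rendered text is what would be committed to the git branch in step 3 and handed to validation in step 4.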

DC Base Policy

  • All routers must use TACACS server 1.2.3.4
  • All routers must use NTP servers 1.2.3.4, 1.2.3.5
  • There must NOT be any undefined references
  • There must NOT be any unused structures
  • There must NOT be any filters with unreachable lines

DC Fabric Policy

  • All BGP sessions must be compatibly configured
  • All BGP sessions must be established
  • All host subnets on Leaf routers must be able to reach all other host subnets on Leaf routers
  • All Leaf routers must use a unique BGP ASN
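
As an illustration of one fabric policy, "All Leaf routers must use a unique BGP ASN", here is a minimal sketch that flags shared ASNs. The ASN values and the `shared_asns` helper are hypothetical, and a duplicated ASN is only one plausible way an ASN input could be wrong; the slides do not say what the actual error was.

```python
from collections import defaultdict


def shared_asns(leaf_asns):
    """Map each ASN used by more than one leaf to the leaves sharing it."""
    by_asn = defaultdict(list)
    for leaf, asn in leaf_asns.items():
        by_asn[asn].append(leaf)
    return {asn: sorted(leaves) for asn, leaves in by_asn.items() if len(leaves) > 1}


# Hypothetical ASN assignments for two candidate changes.
first_candidate = {"lhr-leaf-01": 65001, "lhr-leaf-02": 65002, "lhr-leaf-03": 65002}
second_candidate = {"lhr-leaf-01": 65001, "lhr-leaf-02": 65002, "lhr-leaf-03": 65003}

print(shared_asns(first_candidate))
print(shared_asns(second_candidate))
```

An empty result means the policy holds for that candidate; a non-empty result names the leaves that need a different ASN.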

SLIDE 26

Scenario 1 recap

Batfish automatically determined that the ASN input for the first candidate change was incorrect

  • The error would have been extremely difficult to find otherwise:
      • All BGP sessions come up
      • All Spine routers have the correct routes

Batfish returned no errors for the second candidate change

  • That is proof that the change is correct and safe

SLIDE 27

Scenario 2: Enable web service

  • Enable web-service on hosts connected to lhr-leaf-03
  • Allow ANY IP (0.0.0.0/0) to reach web servers (tcp:80) in subnet 10.1.5.0/27

Topology (LHR DC):

  • border-01, fw-01, spine-01, leaf-01
  • border-02, fw-02, spine-02, leaf-02
  • leaf-03
  • Host subnets: 10.1.1.0/24, 10.1.2.0/24, 10.1.3.0/24, 10.1.4.0/24, 10.1.5.0/24

SLIDE 28

Scenario 2 pipeline

1. User input:
     • Target Firewalls
     • Updated Firewall policy
     • Firewall Change specification
2. Generate configuration using Jinja2 templates
3. Commit changes to git branch
4. Initiate change validation with Batfish
5. Log validation results to S3
6. Notify user via Slack

DC Base Policy

  • All routers must use TACACS server 1.2.3.4
  • All routers must use NTP servers 1.2.3.4, 1.2.3.5
  • There must NOT be any undefined references
  • There must NOT be any unused structures
  • There must NOT be any filters with unreachable lines

DC Fabric Policy

  • All BGP sessions must be compatibly configured
  • All BGP sessions must be established
  • All host subnets on Leaf routers must be able to reach all other host subnets on Leaf routers
  • All Leaf routers must use a unique BGP ASN

Change Specific Test

  • Ensure that change is necessary: change spec is not already met with base config
  • Ensure that the spec is met by the candidate change
  • Ensure that there is no collateral damage: no flow outside of the spec is impacted
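
The three change-specific checks can be sketched over a toy firewall model. Everything here is an assumption for illustration: the rule format, the small enumerated set of flows, the empty base policy, and the too-broad /24 candidate are hypothetical, and the brute-force enumeration only stands in for the kind of exhaustive comparison the slides attribute to Batfish.

```python
from ipaddress import ip_address, ip_network
from itertools import product

ANY = ip_network("0.0.0.0/0")
SPEC_DST, SPEC_PORT = ip_network("10.1.5.0/27"), 80  # spec from the scenario


def permits(rules, flow):
    """First-match evaluation with an implicit deny."""
    src, dst, port = flow
    for action, src_net, dst_net, dst_port in rules:
        if src in src_net and dst in dst_net and port == dst_port:
            return action == "permit"
    return False


def in_spec(flow):
    _, dst, port = flow
    return dst in SPEC_DST and port == SPEC_PORT


# Finite flow space for the sketch: two sources, every address in
# 10.1.5.0/24, three destination ports.
FLOWS = list(product(
    [ip_address("8.8.8.8"), ip_address("10.1.1.10")],
    ip_network("10.1.5.0/24"),
    [22, 80, 443],
))


def check_change(base, candidate):
    spec = [f for f in FLOWS if in_spec(f)]
    other = [f for f in FLOWS if not in_spec(f)]
    return {
        "change_is_necessary": not all(permits(base, f) for f in spec),
        "spec_is_met": all(permits(candidate, f) for f in spec),
        "no_collateral_damage": all(
            permits(base, f) == permits(candidate, f) for f in other
        ),
    }


BASE = []  # hypothetical base policy: nothing permitted to the new subnet
TOO_BROAD = [("permit", ANY, ip_network("10.1.5.0/24"), 80)]
CORRECT = [("permit", ANY, SPEC_DST, 80)]

print(check_change(BASE, TOO_BROAD))
print(check_change(BASE, CORRECT))
```

The too-broad candidate meets the spec, so an affirmative test alone would pass it; only the third check, which compares flows outside the spec, exposes the extra traffic it permits.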

SLIDE 29

Scenario 2 recap

Batfish automatically determined that the firewall change in the first candidate change was incorrect

  • The error would have been extremely difficult to find:
      • Change was partially correct, but permitted more traffic than intended
      • Only luck would find this in lab or production
      • Only visible with the correct packet(s)
      • Requires user to create negative test cases, not just affirmative ones

Batfish returned no errors for the second candidate change

  • That is proof that the change is correct and safe

SLIDE 30

Summary

  • Automation is risky without effective validation
  • Batfish enables comprehensive, pre-deployment validation
  • Ensures that the network is always secure, reliable and compliant
  • Easy to embed within your automation pipeline

SLIDE 31

Getting Started with Batfish

  • Jupyter Notebooks
      • https://www.github.com/batfish/pybatfish/tree/master/jupyter_notebooks
  • Videos
      • https://www.youtube.com/channel/UCA-OUW_3IOt9U_s60KvmJYA
  • Slack
      • Join Batfish Slack!

SLIDE 32

Q & A

Batfish is useful, powerful, and it’s easy to get started. Please join us!

SLIDE 33

Q / A

  • ansible-network@redhat.com
  • github.com/network-automation
  • facebook.com/ansibleautomation
  • twitter.com/ansible