SLIDE 1

SCALING NETWORK MONITORING IN A LARGE ENTERPRISE

BroCon 2016 – Austin, TX

SLIDE 2

Who am I?

I work for Amazon’s Worldwide Consumer Information Security group

SLIDE 3

What are we going to talk about?

How we scaled our network monitoring solution while the network kept growing

SLIDE 4

Why do we even do this?

Understanding what is occurring on our corporate network is important to us

SLIDE 5

In the beginning…

http://spaceflight.nasa.gov/gallery/images/station/crew-7/html/iss007e10807.html

SLIDE 6

How do we approach this?

We originally decided to use vendor network sensors to get visibility into what was occurring on our network

SLIDE 7

How we started off

  • Decided a vendor appliance was an effective way of gathering the data we needed
  • We can buy network sensors, right?
  • So we bought network sensors and plugged them into our network

SLIDE 8

Life was much simpler back then...

Vendor network sensor

  • 1Gb/s capable firewalls
  • SPAN sessions from our routers to vendor network sensors
  • Small number of firewalls to monitor
  • We got layer 3 and layer 4 header information from this network sensor

SLIDE 9

It looked something like this

[Diagram: The Internet, firewall, router and corporate network; a SPAN session from the router to the vendor appliance; Netflow export to a Netflow collector; authorized users]

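SLIDE 10

What is a SPAN port?

A SPAN (Switched Port Analyzer) session copies traffic from source ports or VLANs to a destination port, where a monitoring device can capture it.

http://www.cisco.com/c/en/us/support/docs/switches/catalyst-6500-series-switches/10570-41.html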

SLIDE 11

Where do we go from here?

  • Our network traffic volume kept growing
  • Our sensor vendor stopped selling and supporting the platform we were using
  • The vendor management platform couldn’t scale
    – Driven by API usage by internal customers
    – Started getting close to the limit of network sensors the management platform could handle
  • Increased internal maturity about using this data

SLIDE 12

Future proofing?

  • We have a vendor’s system whose limits we’re starting to push
  • What features do we need?
  • Do we continue to buy or do we look at building instead?

SLIDE 13

Build vs Buy

Columns: Build | Buy
  • Speed of execution
  • Control
  • Vendor support
  • Logistics
  • Performance

SLIDE 14

Pushing for the next level

  • My co-workers evaluated various options
    – nProbe
    – Snort
    – Suricata
    – Bro

SLIDE 15

Bro Generation One

  • Ran on a single host
  • Connected to our router via a 10G fiber link
  • SPAN session from the router to our Bro host

SLIDE 16

Bro Generation One looks like…

[Diagram: The Internet, firewall, router and corporate network; a SPAN session from the router to the Bro host; Netflow export to a Netflow collector; authorized users; log store]

SLIDE 17

The challenges of Generation One

  • The Bro host was a single point of failure
  • Individual host installs have high operational costs
  • High traffic volumes on our SPAN sessions caused our router to reboot
  • Will this continue to scale with the growth of our network?

SLIDE 18

Scorecard

Columns: Vendor solution | Generation One
  • Single point of failure?
  • Data collected via: SPAN | SPAN
  • Control
  • Scalability
  • Logistics / Install effort
  • Cost per Gb/s: $$$ | $

SLIDE 19

And we are done!

Or so we thought…

SLIDE 20

Along came Seth…

  • Seth spotted that everything in the conn.log history field was in upper case
    – Turned out to be a trivial configuration change
  • We started off with 32GB of RAM in our hosts and ended up upgrading to 128GB
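The slide does not say what the configuration change was, so what follows is only a generic sanity check, not the fix that was applied. In Bro's conn.log history field, upper-case letters record events from the connection originator and lower-case letters record events from the responder, so a log in which almost no history string contains a lower-case letter suggests the sensor is not seeing responder-side packets. A minimal Python sketch, assuming the history values have already been pulled out of conn.log into a list; `one_sided_fraction` is a hypothetical helper name:

```python
from typing import Iterable


def one_sided_fraction(histories: Iterable[str]) -> float:
    """Fraction of history strings that contain no lower-case (responder) letters.

    Hypothetical helper: empty values and Bro's "-" placeholder for an unset
    field are skipped. Returns 0.0 when there is nothing to measure.
    """
    total = 0
    one_sided = 0
    for history in histories:
        if not history or history == "-":
            continue
        total += 1
        # Upper case = originator activity, lower case = responder activity.
        if not any(ch.islower() for ch in history):
            one_sided += 1
    return one_sided / total if total else 0.0


if __name__ == "__main__":
    sample = ["ShADadFf", "S", "-", "SADF"]
    print(f"{one_sided_fraction(sample):.2f}")
```

Some one-sided histories are normal (for example an unanswered SYN shows up as "S"), so the signal is a fraction close to 1.0 across the whole log rather than any single record.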

SLIDE 21

Scaling to infinity and beyond!

  • Capture loss levels (as reported by Bro) started rising beyond acceptable levels once we were past 3Gb/s of traffic on our existing hardware platform
  • We knew that traffic levels were going to continue to increase, so our design needed to evolve as well
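Bro's capture-loss policy script writes capture_loss.log, whose percent_lost column is derived from two counters over each measurement interval: gaps (ACKs for data the sensor never saw) and acks (ACKs observed), as 100 × gaps / acks. The talk does not say what "acceptable" meant, so the 1.0% threshold in this Python sketch is a hypothetical placeholder, and `CaptureLossSample` and `workers_over_threshold` are illustrative names, not part of Bro:

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class CaptureLossSample:
    peer: str  # worker name, as in the "peer" column of capture_loss.log
    gaps: int  # ACKs for data the sensor never saw
    acks: int  # ACKs observed during the measurement interval


def percent_lost(sample: CaptureLossSample) -> float:
    # Same shape as Bro's calculation: no ACKs means no measurable loss.
    if sample.acks == 0:
        return 0.0
    return 100.0 * sample.gaps / sample.acks


def workers_over_threshold(
    samples: Iterable[CaptureLossSample], threshold_pct: float = 1.0
) -> List[Tuple[str, float]]:
    """Return (peer, percent_lost) pairs above a hypothetical threshold, worst first."""
    flagged = [(s.peer, percent_lost(s)) for s in samples if percent_lost(s) > threshold_pct]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    samples = [
        CaptureLossSample("worker-1", gaps=5, acks=1000),
        CaptureLossSample("worker-2", gaps=40, acks=1000),
        CaptureLossSample("worker-3", gaps=0, acks=0),
    ]
    print(workers_over_threshold(samples))
```

Tracking this per worker, rather than per host, makes it easier to tell a single overloaded worker from a host that has simply run out of capacity.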

SLIDE 22

Introducing Bro Generation 1.5

  • We migrated from SPAN sessions to optical taps
    – SPAN sessions were good for speed of deployment but not for long-term use
  • Introduced a method to allow us to load balance traffic among physical hosts
    – Similar outcome to the work done by LBNL (https://commons.lbl.gov/download/attachments/120063098/100GIntrusionDetection.pdf)
    – Eliminated the single point of failure (SPOF) of our single Bro host

SLIDE 23

Bro horizontal scaling

  • While we do run Bro as a cluster, each cluster is confined to a single physical host
  • We don’t want to share state across hosts
  • The Bro manager process being a single point of failure isn’t all that appealing to us
  • Keep the hosts simple and consistent
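Load balancing across physical hosts (slide 22) combined with no shared state between hosts (slide 23) implies that every packet of a connection, in both directions, must reach the same Bro host. The talk does not name the load balancer or its hashing scheme, so this Python sketch only illustrates the general technique of a symmetric 5-tuple hash; `flow_key` and `pick_host` are hypothetical names:

```python
import hashlib
import ipaddress


def flow_key(src_ip: str, src_port: int, dst_ip: str, dst_port: int, proto: int) -> bytes:
    """Build a direction-independent key for a flow (hypothetical helper)."""
    a = (ipaddress.ip_address(src_ip).packed, src_port)
    b = (ipaddress.ip_address(dst_ip).packed, dst_port)
    # Sorting the endpoints makes A->B and B->A produce identical keys.
    low, high = sorted([a, b])
    return (
        low[0] + low[1].to_bytes(2, "big")
        + high[0] + high[1].to_bytes(2, "big")
        + bytes([proto])
    )


def pick_host(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
              proto: int, num_hosts: int) -> int:
    """Map a flow to one of num_hosts sensors; both directions get the same index."""
    digest = hashlib.sha256(flow_key(src_ip, src_port, dst_ip, dst_port, proto)).digest()
    return int.from_bytes(digest[:8], "big") % num_hosts


if __name__ == "__main__":
    outbound = pick_host("10.0.0.1", 51515, "192.0.2.10", 443, 6, num_hosts=3)
    inbound = pick_host("192.0.2.10", 443, "10.0.0.1", 51515, 6, num_hosts=3)
    print(outbound, inbound)  # same index for both directions
```

A plain modulo mapping reassigns most flows whenever `num_hosts` changes, so adding a host would split connections already in progress; a real deployment also would not hash each packet with SHA-256 in software. The point here is only the symmetry property.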

SLIDE 24

And here is how it looks

[Diagram: The Internet, firewall, router and corporate network; an optical tap feeding a load balancer, which distributes traffic to Bro hosts #1, #2 and #3; Netflow export to a Netflow collector]

SLIDE 25

Scorecard

Columns: Vendor solution | Generation One | Generation 1.5
  • Single point of failure?
  • Data collected via: SPAN | SPAN | Optical taps
  • Control
  • Scalability
  • Logistics / Install effort
  • Cost per Gb/s: $$$ | $ | $

SLIDE 26

Optical taps overview

[Diagram: an optical tap on the link between the router and the firewall (TX/RX pairs), with two 10Gb/s TX outputs feeding the load balancer]

SLIDE 27

Still some work to do

  • This was a great step forward, but it was only an incremental improvement
  • We can now scale out, but it is still time consuming to get individual hosts deployed
  • Migrating to an integrated solution would help solve these challenges

SLIDE 28

Bro Generation 2.0

  • Combined our hosts, load balancers and optical taps into a “cookie cutter” rack design
  • We now just order a small, medium or large rack depending on expected traffic volumes

SLIDE 29

Bro Generation 2.0 physical layout

[Diagram: a network rack (router, firewall) and an optical tap feeding a Bro rack that contains load balancers and Bro hosts #1 through #n]

SLIDE 30

Scaling Bro Generation 2.0 footprint

[Diagram: an optical tap on the network rack (router, firewall) feeding a load balancer that spreads traffic across Bro rack #1 and Bro rack #2, each with its own load balancer and Bro hosts #1 through #n]

SLIDE 31

Scorecard

Columns: Vendor solution | Generation One | Generation 1.5 | Generation 2
  • Single point of failure?
  • Data collected via: SPAN | SPAN | Optical taps | Optical taps
  • Control
  • Scalability
  • Logistics / Install effort
  • Cost per Gb/s: $$$ | $ | $ | $

SLIDE 32

What do we do with all this data?

We stream the logs to our central log store

SLIDE 33

Central log storage

SLIDE 34

Learn from some of our mistakes…

  • Our original ETL jobs were based on the Bro 2.3 field order (TSV output)
    – Bro 2.4 changed the ordering of some of the fields
    – Use JSON if you’re loading this data elsewhere
  • One line configuration change!
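Bro's TSV logs carry a "#fields" header line naming every column, so an ETL job that looks columns up by name, instead of by position, keeps working when an upgrade reorders fields; JSON logs are keyed by name already. The slide does not show the one-line change; in Bro 2.4, a common way to switch the ASCII writer to JSON is `redef LogAscii::use_json = T;` in a site policy script, though the deck does not confirm that is the line used here. A minimal Python sketch, assuming the default tab separator; `read_tsv_log` and `read_json_log` are hypothetical names:

```python
import json
from typing import Dict, Iterable, Iterator


def read_tsv_log(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield each TSV record as a dict keyed by the names in the #fields header."""
    fields = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#fields"):
            fields = line.split("\t")[1:]  # drop the "#fields" label itself
        elif not line or line.startswith("#"):
            continue  # other metadata lines such as #separator, #types, #close
        elif fields is not None:
            yield dict(zip(fields, line.split("\t")))


def read_json_log(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield each JSON-per-line record; field order no longer matters."""
    for raw in lines:
        if raw.strip():
            yield json.loads(raw)


if __name__ == "__main__":
    tsv = ["#fields\tts\tuid\thistory", "1.0\tCabc\tShADadFf"]
    for record in read_tsv_log(tsv):
        print(record["history"])  # looked up by name, not by column index
```

Keying on field names also means a downstream job only needs updating when a field it actually uses is renamed or removed, not when unrelated columns move.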

SLIDE 35

Wrapping up

http://www.nasa.gov/image-feature/sunset-from-the-international-space-station

SLIDE 36

Lessons learnt

  • Scale horizontally, not vertically
  • Stateless sensors
  • Decouple dependencies
  • Plan up-front
  • Lab testing is never overrated
  • Get experts on-site to validate
  • Document wins
  • Know your customers

SLIDE 37

Thanks to…

Industry peers
    – Thanks to LBNL, Mozilla and the others who responded to our queries, and to everyone who has publicly spoken about or documented their install

SLIDE 38

Thank you!