SLIDE 1

Parallelizing Network Analysis

Robin Sommer

Lawrence Berkeley National Laboratory & International Computer Science Institute

robin@icir.org http://www.icir.org

SLIDE 2

Motivation

  • NIDSs have reached their limits on commodity hardware
  • We keep needing to do more analysis on more data at higher speeds
  • Analysis gets richer over time, as attacks get more sophisticated
  • However, single-CPU performance is no longer growing the way it used to
  • A single NIDS instance (Snort, Bro) cannot cope with links of 1 Gbps and above
  • The key to overcoming current limits is parallel analysis
  • Volume is high but composed of many independent tasks
  • Need to exploit this parallelism to cope with the load

SLIDE 3

Orthogonal Approaches

  • The NIDS Cluster
  • Many PCs instead of one
  • Communication and a central user interface create the impression of one system
  • Vision: Parallel operation within a single NIDS instance
  • In software: multi-threaded analysis on multi-CPU/multi-core systems
  • In hardware: compile analysis into a parallel execution model (e.g., on FPGAs)

SLIDE 4

The NIDS Cluster

SLIDE 5

The NIDS Cluster

  • Load-balancing approach: use many boxes instead of one
  • Most NIDSs provide support for multi-system setups
  • However, they work independently in operational setups
  • A central manager collects the alerts of independent NIDS instances
  • It aggregates results instead of correlating the analysis
  • The NIDS Cluster works transparently, like a single NIDS
  • Gives the same results a single NIDS would if it could analyze all the traffic
  • No loss in detection accuracy
  • Scales to a large number of nodes
  • Single system for the user interface (log aggregation, configuration changes)

SLIDE 6

Architecture

[Figure: cluster architecture. Frontend nodes receive traffic from the Internet and distribute it over an internal network to the backend nodes; a proxy and a manager connect the Bro instances.]

SLIDE 7

Prototype Setups

  • Lawrence Berkeley National Laboratory
  • Monitors 10 Gbps upstream link
  • 1 front-end, 10 backends
  • University of California, Berkeley
  • Monitors 2 x 1 Gbps upstream links
  • 2 front-ends, 6 backends
  • IEEE Supercomputing 2006
  • Conference’s 1 Gbps backbone network
  • 100 Gbps High Speed Bandwidth Challenge network (partially)
  • Goal: Replace current operational security monitoring

SLIDE 8

Front-Ends

  • Distribute traffic to back-ends by rewriting MACs
  • In software via Click
  • In hardware via Force-10’s P10 (prototype in collaboration with F10)
  • Fault-tolerance
  • Easy to retarget traffic if a back-end node fails
  • Per-connection hashing
  • Either the 4-tuple (addresses, ports) or the 2-tuple (addresses)
  • MD5 mod n or ADD mod n (a minimal sketch follows)
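To make the two schemes concrete, here is a minimal Python sketch of per-connection hashing (illustrative only; the function names and the 10-backend example are assumptions, not the cluster's actual code):

    import hashlib

    def add_hash(conn, n):
        # ADD mod n: sum the tuple fields, reduce modulo the node count.
        # conn is the 2-tuple (addrs) or 4-tuple (addrs, ports), as integers.
        return sum(conn) % n

    def md5_hash(conn, n):
        # MD5 mod n: hash the tuple bytes, reduce modulo the node count.
        # Sorting makes both directions of a connection hash identically.
        data = b"".join(f.to_bytes(16, "big") for f in sorted(conn))
        return int.from_bytes(hashlib.md5(data).digest(), "big") % n

    # Route a flow to one of 10 back-ends by its address 2-tuple; hashing
    # only addresses keeps all connections of a host pair on one node.
    src, dst = 0xC0A80001, 0x0A000002   # 192.168.0.1 -> 10.0.0.2
    print(add_hash((src, dst), 10), md5_hash((src, dst), 10))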

SLIDE 9

Simulation of Hashing Schemes

[Figure: mean difference from an even distribution (%, 0 to 20) over time, Mon 10:00 through Tue 6:00; the legend compares md5- and add-based hashes over the 2-tuple and 4-tuple, including an 11-node configuration.]

SLIDE 10

Back-ends

  • Back-ends run Bro as their analysis engine
  • Bro provides extensive communication facilities
  • Independent state framework
  • Sharing of low-level state
  • Script-layer variables can be synchronized
  • Basic approach: pick the state to be synchronized (see the sketch below)
  • A few subtleties had to be resolved
  • Central manager
  • Collects the output of all instances
  • Raises alerts
  • Provides dynamic reconfiguration facilities
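As an illustration of what script-layer synchronization provides, a toy Python model (hypothetical, not Bro's implementation): each node applies an update locally and propagates the operation, rather than the value, to its peers, so a table such as a per-source scan counter converges across instances:

    from collections import defaultdict

    class SyncedTable:
        # Toy model of a synchronized script-layer table: operations are
        # applied locally, then re-played on every peer node.
        def __init__(self):
            self.data = defaultdict(int)
            self.peers = []                      # other SyncedTable instances

        def increment(self, key, from_peer=False):
            self.data[key] += 1                  # apply locally
            if not from_peer:                    # propagate the operation
                for p in self.peers:
                    p.increment(key, from_peer=True)

    # Two back-ends counting scan attempts per source address.
    node1, node2 = SyncedTable(), SyncedTable()
    node1.peers, node2.peers = [node2], [node1]
    node1.increment("10.0.0.1")                  # seen by back-end 1
    node2.increment("10.0.0.1")                  # seen by back-end 2
    assert node1.data["10.0.0.1"] == node2.data["10.0.0.1"] == 2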

SLIDE 11

Evaluation & Outlook

  • The prototypes are running nicely
  • They are able to perform analysis that was not possible before
  • E.g., full HTTP analysis & dynamic protocol detection/analysis
  • Now in the process of making it production quality
  • Evaluation
  • Verified accuracy by comparing against a single Bro instance
  • Evaluated performance with respect to load-balancing quality, scalability, and overhead

SLIDE 12

CPU Load per Node

[Figure: probability density (0 to 15) of CPU utilization (0.0 to 0.5), one curve per back-end node (node0 through node9).]

SLIDE 13

Scaling of CPU

[Figure: probability density (0 to 25) of CPU utilization (0.0 to 0.5) for configurations of 3, 5, and 10 nodes.]

SLIDE 14

Load on Berkeley Campus

[Figure: CPU load (%) from Tue 12:00 through Thu 6:00 for Backends 0 through 5, Proxy 0, Proxy 1, and the Manager.]

SLIDE 15

Parallelizing Analysis

SLIDE 16

Potential


  • Observation
  • Much of the processing of a typical NIDS instance can be done in parallel
  • However, existing systems do not exploit this potential
  • Example: the Bro NIDS
  • Assume a Gbps network with 10,000 concurrent connections

[Pipeline at 1-10 Gbps: packet streams enter the stream demux; TCP stream reassembly (~10^4 instances) yields assembled packet streams; protocol analyzers (~10^5 instances) yield event streams; per-flow analysis (~10^4 instances) yields filtered event streams; aggregate analysis (~10^3 instances) yields aggregated event streams; global analysis (~10-100 instances).]

SLIDE 17

Commodity Hardware

  • Multi-threaded/multi-core CPUs provide the necessary power
  • Inexpensive commodity hardware
  • Aggregate throughput does in fact still follow Moore's law
  • Need to structure applications in a highly parallel fashion
  • We do not get the performance gain out of the box
  • Need to structure processing into separate low-level threads (see the sketch below)
  • Need to address
  • Intrusion prevention functionality
  • Exchange of state between threads for global analysis
  • Yet minimize inter-thread communication
  • Factor in memory locality (within one core / across several cores)
  • Provide performance-debugging tools
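A minimal Python sketch of that structuring (names and numbers are illustrative): each analysis thread owns its queue and the state of the flows pinned to it, so threads share nothing except explicit messages:

    import queue, threading

    NUM_THREADS = 4
    inboxes = [queue.Queue() for _ in range(NUM_THREADS)]

    def worker(inbox):
        flows = {}                               # thread-local flow state
        while True:
            pkt = inbox.get()
            if pkt is None:                      # shutdown signal
                break
            flows.setdefault(pkt["flow"], []).append(pkt["payload"])

    threads = [threading.Thread(target=worker, args=(q,)) for q in inboxes]
    for t in threads:
        t.start()

    # Pin each flow to one thread so its state stays thread-local.
    flow = ("10.0.0.1", "10.0.0.2", 80)
    inboxes[hash(flow) % NUM_THREADS].put({"flow": flow, "payload": b"..."})

    for q in inboxes:
        q.put(None)
    for t in threads:
        t.join()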

SLIDE 18

Proposed Architecture

[Diagram: proposed architecture. An Active Network Interface dispatches packets into per-core packet queues. Each CPU core (Core 1, Core 2, ...) runs several threads working out of its L1 D-cache with cached queues. The L2 cache and main memory hold the per-core packet, event, and MSG-event queues, an external MSG-event queue, the connection and host tables, and pending packets.]

SLIDE 19

Active Network Interface

  • The only non-commodity component currently
  • Prototype to be based on the NetFPGA platform ($2000)
  • Commodity hardware might actually be suitable later

(E.g., Sun's Niagara 2 has 8 CPU cores plus 2 directly attached 10 GE controllers!)

  • Thread-aware routing (see the sketch below)
  • The ANI copies packets directly into the thread's memory (cache)
  • The ANI keeps a per-flow table of routing decisions
  • A dispatcher thread makes the initial routing decision per flow
  • Selective packet forwarding
  • The ANI holds packets until it gets clearance (may cache clearances per flow/IP)
  • Normalization
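In Python pseudocode, the routing logic might look like this (a software sketch; the real ANI would do this in hardware): a dispatcher makes one routing decision for the first packet of each flow, and the per-flow table replays that decision for every later packet:

    class ANI:
        # Toy model of the Active Network Interface's per-flow routing.
        def __init__(self, num_cores):
            self.num_cores = num_cores
            self.flow_table = {}                 # flow -> core
            self.next_core = 0

        def dispatch(self, flow):
            # Dispatcher: initial decision, first packet of a flow only.
            core = self.next_core
            self.next_core = (self.next_core + 1) % self.num_cores
            self.flow_table[flow] = core
            return core

        def route(self, pkt):
            flow = pkt["flow"]
            if flow not in self.flow_table:      # unseen flow
                self.dispatch(flow)
            return self.flow_table[flow]         # replay cached decision

    ani = ANI(num_cores=2)
    print(ani.route({"flow": ("10.0.0.1", "10.0.0.2", 80)}))   # core 0
    print(ani.route({"flow": ("10.0.0.1", "10.0.0.2", 80)}))   # same core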

SLIDE 20

Parallelized Network Analysis

  • Architecturally-aware threading
  • Need to identify the right granularity for threads
  • Protocol analysis consists of fixed blocks of functionality
  • Event processing needs to preserve temporal order (see the sketch below)

Multiple independent event queues (e.g., one per core)

  • Scalable inter-thread communication
  • Can use shared memory
  • Need to consider nonuniformities in the system's cache hierarchy
  • Potentially restructure detection algorithms to minimize communication

(e.g., loosening semantics via probabilistic algorithms)

  • Prevention functionality
  • Only forward a packet once all its events are processed
  • Evaluation, profiling & debugging
  • Race conditions & memory-access patterns
  • Trace-based reproducibility
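One way to reconcile per-core queues with temporal order, sketched in Python (illustrative, not the proposed implementation): each core appends timestamped events to its own queue, and a consumer lazily merges the already-sorted queues by timestamp:

    import heapq

    # Per-core event queues; each core appends in timestamp order.
    core0 = [(1.0, "conn_established"), (3.0, "http_request")]
    core1 = [(2.0, "dns_query"), (4.0, "conn_closed")]

    # heapq.merge combines sorted streams by timestamp, preserving
    # temporal order without a global lock on every event.
    for ts, event in heapq.merge(core0, core1):
        print(ts, event)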

SLIDE 21

Going Further: Custom Hardware

  • Goal: a custom platform for highly parallel, stateful network analysis
  • Custom hardware (e.g., FPGAs) is ideal for parallel tasks
  • Expose the parallelism and map it to hardware
  • We can identify three types of functionality in Bro
  • Fixed function blocks: handcraft (e.g., robust reassembly)
  • Protocol analyzers: use BinPAC with a new backend
  • Policy scripts: compile into a parallel computation model
  • Envision using MIT's transactor model (see the sketch below)
  • Many small self-contained units communicating via message queues
  • Ambitious but highly promising
  • Generic network analysis beyond network intrusion detection
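A rough software illustration of the transactor idea (a hypothetical Python model, not MIT's formal definition): many small self-contained units, each with private state and an inbox, interacting only through message queues, which is what makes them mappable to parallel hardware:

    from collections import deque

    class Transactor:
        # Self-contained unit: private state, an inbox, message-only I/O.
        def __init__(self, handler):
            self.handler = handler
            self.inbox = deque()
            self.state = {}

        def step(self, network):
            # Handle one message; emit follow-ups as messages, never
            # by touching another unit's state.
            if self.inbox:
                for dst, msg in self.handler(self.state, self.inbox.popleft()):
                    network[dst].inbox.append(msg)

    def reassemble(state, msg):                  # pretend reassembly
        return [("analyzer", msg.upper())]

    def analyze(state, msg):
        state["seen"] = state.get("seen", 0) + 1
        return []

    net = {"reassembler": Transactor(reassemble),
           "analyzer": Transactor(analyze)}
    net["reassembler"].inbox.append("syn+payload")
    for _ in range(2):                           # run the message network
        for t in net.values():
            t.step(net)
    print(net["analyzer"].state)                 # {'seen': 1}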

SLIDE 22

Robin Sommer

Lawrence Berkeley National Laboratory & International Computer Science Institute

robin@icir.org http://www.icir.org

This work is supported by the Office of Science and Technology at the Department of Homeland Security. Points of view in this document are those of the author(s) and do not necessarily represent the official position of the U.S. Department of Homeland Security or the Office of Science and Technology.

Thanks for your attention.