SLIDE 1

Parallelizing Network Analysis

Robin Sommer

Lawrence Berkeley National Laboratory & International Computer Science Institute

robin@icir.org http://www.icir.org

SLIDE 2

Motivation

  • NIDSs have reached their limits on commodity hardware
  • We keep needing to do more analysis on more data at higher speeds
  • Analysis gets richer over time, as attacks get more sophisticated
  • However, single-CPU performance is no longer growing the way it used to
  • A single NIDS instance (Snort, Bro) cannot cope with links of 1 Gbps and above
  • The key to overcoming current limits is parallel analysis
  • Volume is high but composed of many independent tasks
  • Need to exploit this parallelism to cope with the load

SLIDE 3

Orthogonal Approaches

  • The NIDS Cluster
  • Many PCs instead of one
  • Communication and a central user interface create the impression of one system
  • Vision: Parallel operation within a single NIDS instance
  • In software: multi-threaded analysis on multi-CPU/multi-core systems
  • In hardware: compile analysis into a parallel execution model (e.g., on FPGAs)

SLIDE 4

The NIDS Cluster

SLIDE 5

The NIDS Cluster

  • Load-balancing approach: use many boxes instead of one
  • Most NIDSs provide support for multi-system setups
  • However, they work independently in operational setups
  • A central manager collects the alerts of independent NIDS instances
  • It aggregates results instead of correlating the analysis
  • The NIDS Cluster works transparently, like a single NIDS
  • Gives the same results a single NIDS would if it could analyze all the traffic
  • No loss in detection accuracy
  • Scales to a large number of nodes
  • Single system for the user interface (log aggregation, configuration changes)

SLIDE 6

Architecture

[Figure: cluster architecture. Frontend nodes receive traffic from the Internet and distribute it over an internal network to the backend nodes; a proxy and a manager connect the Bro instances.]

SLIDE 7

Prototype Setups

  • Lawrence Berkeley National Laboratory
  • Monitors 10 Gbps upstream link
  • 1 front-end, 10 backends
  • University of California, Berkeley
  • Monitors 2 x 1 Gbps upstream links
  • 2 front-ends, 6 backends
  • IEEE Supercomputing 2006
  • Conference’s 1 Gbps backbone network
  • 100 Gbps High Speed Bandwidth Challenge network (partially)
  • Goal: Replace current operational security monitoring

SLIDE 8

Front-Ends

  • Distribute traffic to back-ends by rewriting MACs
  • In software via Click
  • In hardware via Force-10’s P10 (prototype in collaboration with F10)
  • Fault-tolerance
  • Easy to retarget traffic if a back-end node fails
  • Per-connection hashing
  • Either the 4-tuple (addresses, ports) or the 2-tuple (addresses)
  • MD5 mod n or ADD mod n (a minimal sketch follows)
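To make the two schemes concrete, here is a minimal Python sketch of per-connection hashing (illustrative only; the function names and the 10-backend example are assumptions, not the cluster's actual code):

    import hashlib

    def add_hash(conn, n):
        # ADD mod n: sum the tuple fields, reduce modulo the node count.
        # conn is the 2-tuple (addrs) or 4-tuple (addrs, ports), as integers.
        return sum(conn) % n

    def md5_hash(conn, n):
        # MD5 mod n: hash the tuple bytes, reduce modulo the node count.
        # Sorting makes both directions of a connection hash identically.
        data = b"".join(f.to_bytes(16, "big") for f in sorted(conn))
        return int.from_bytes(hashlib.md5(data).digest(), "big") % n

    # Route a flow to one of 10 back-ends by its address 2-tuple; hashing
    # only addresses keeps all connections of a host pair on one node.
    src, dst = 0xC0A80001, 0x0A000002   # 192.168.0.1 -> 10.0.0.2
    print(add_hash((src, dst), 10), md5_hash((src, dst), 10))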

SLIDE 9

Simulation of Hashing Schemes

[Figure: mean difference from an even distribution (%, 0 to 20) over time, Mon 10:00 through Tue 6:00; the legend compares md5- and add-based hashes over the 2-tuple and 4-tuple, including an 11-node configuration.]

SLIDE 10

Back-ends

  • Back-ends run Bro as their analysis engine
  • Bro provides extensive communication facilities
  • Independent state framework
  • Sharing of low-level state
  • Script-layer variables can be synchronized
  • Basic approach: pick the state to be synchronized (see the sketch below)
  • A few subtleties had to be resolved
  • Central manager
  • Collects the output of all instances
  • Raises alerts
  • Provides dynamic reconfiguration facilities
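As an illustration of what script-layer synchronization provides, a toy Python model (hypothetical, not Bro's implementation): each node applies an update locally and propagates the operation, rather than the value, to its peers, so a table such as a per-source scan counter converges across instances:

    from collections import defaultdict

    class SyncedTable:
        # Toy model of a synchronized script-layer table: operations are
        # applied locally, then re-played on every peer node.
        def __init__(self):
            self.data = defaultdict(int)
            self.peers = []                      # other SyncedTable instances

        def increment(self, key, from_peer=False):
            self.data[key] += 1                  # apply locally
            if not from_peer:                    # propagate the operation
                for p in self.peers:
                    p.increment(key, from_peer=True)

    # Two back-ends counting scan attempts per source address.
    node1, node2 = SyncedTable(), SyncedTable()
    node1.peers, node2.peers = [node2], [node1]
    node1.increment("10.0.0.1")                  # seen by back-end 1
    node2.increment("10.0.0.1")                  # seen by back-end 2
    assert node1.data["10.0.0.1"] == node2.data["10.0.0.1"] == 2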

SLIDE 11

Evaluation & Outlook

  • The prototypes are running nicely
  • They are able to perform analysis that was not possible before
  • E.g., full HTTP analysis & dynamic protocol detection/analysis
  • Now in the process of making it production quality
  • Evaluation
  • Verified accuracy by comparing against a single Bro instance
  • Evaluated performance with respect to load-balancing quality, scalability, and overhead

SLIDE 12

CPU Load per Node

[Figure: probability density (0 to 15) of CPU utilization (0.0 to 0.5), one curve per back-end node (node0 through node9).]

SLIDE 13

Scaling of CPU

[Figure: probability density (0 to 25) of CPU utilization (0.0 to 0.5) for configurations of 3, 5, and 10 nodes.]

SLIDE 14

Load on Berkeley Campus

[Figure: CPU load (%) from Tue 12:00 through Thu 6:00 for Backends 0 through 5, Proxy 0, Proxy 1, and the Manager.]

SLIDE 15

Parallelizing Analysis

SLIDE 16

Potential


  • Observation
  • Much of the processing of a typical NIDS instance can be done in parallel
  • However, existing systems do not exploit this potential
  • Example: the Bro NIDS
  • Assume a Gbps network with 10,000 concurrent connections

[Pipeline at 1-10 Gbps: packet streams enter the stream demux; TCP stream reassembly (~10^4 instances) yields assembled packet streams; protocol analyzers (~10^5 instances) yield event streams; per-flow analysis (~10^4 instances) yields filtered event streams; aggregate analysis (~10^3 instances) yields aggregated event streams; global analysis (~10-100 instances).]

SLIDE 17

Commodity Hardware

  • Multi-threaded/multi-core CPUs provide the necessary power
  • Inexpensive commodity hardware
  • Aggregate throughput does in fact still follow Moore's law
  • Need to structure applications in a highly parallel fashion
  • We do not get the performance gain out of the box
  • Need to structure processing into separate low-level threads (see the sketch below)
  • Need to address
  • Intrusion prevention functionality
  • Exchange of state between threads for global analysis
  • Yet minimize inter-thread communication
  • Factor in memory locality (within one core / across several cores)
  • Provide performance-debugging tools
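A minimal Python sketch of that structuring (names and numbers are illustrative): each analysis thread owns its queue and the state of the flows pinned to it, so threads share nothing except explicit messages:

    import queue, threading

    NUM_THREADS = 4
    inboxes = [queue.Queue() for _ in range(NUM_THREADS)]

    def worker(inbox):
        flows = {}                               # thread-local flow state
        while True:
            pkt = inbox.get()
            if pkt is None:                      # shutdown signal
                break
            flows.setdefault(pkt["flow"], []).append(pkt["payload"])

    threads = [threading.Thread(target=worker, args=(q,)) for q in inboxes]
    for t in threads:
        t.start()

    # Pin each flow to one thread so its state stays thread-local.
    flow = ("10.0.0.1", "10.0.0.2", 80)
    inboxes[hash(flow) % NUM_THREADS].put({"flow": flow, "payload": b"..."})

    for q in inboxes:
        q.put(None)
    for t in threads:
        t.join()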

SLIDE 18

Proposed Architecture

[Diagram: proposed architecture. An Active Network Interface dispatches packets into per-core packet queues. Each CPU core (Core 1, Core 2, ...) runs several threads working out of its L1 D-cache with cached queues. The L2 cache and main memory hold the per-core packet, event, and MSG-event queues, an external MSG-event queue, the connection and host tables, and pending packets.]

SLIDE 19

Active Network Interface

  • The only non-commodity component currently
  • Prototype to be based on the NetFPGA platform ($2000)
  • Commodity hardware might actually be suitable later

(E.g., Sun's Niagara 2 has 8 CPU cores plus 2 directly attached 10 GE controllers!)

  • Thread-aware routing (see the sketch below)
  • The ANI copies packets directly into the thread's memory (cache)
  • The ANI keeps a per-flow table of routing decisions
  • A dispatcher thread makes the initial routing decision per flow
  • Selective packet forwarding
  • The ANI holds packets until it gets clearance (may cache clearances per flow/IP)
  • Normalization
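In Python pseudocode, the routing logic might look like this (a software sketch; the real ANI would do this in hardware): a dispatcher makes one routing decision for the first packet of each flow, and the per-flow table replays that decision for every later packet:

    class ANI:
        # Toy model of the Active Network Interface's per-flow routing.
        def __init__(self, num_cores):
            self.num_cores = num_cores
            self.flow_table = {}                 # flow -> core
            self.next_core = 0

        def dispatch(self, flow):
            # Dispatcher: initial decision, first packet of a flow only.
            core = self.next_core
            self.next_core = (self.next_core + 1) % self.num_cores
            self.flow_table[flow] = core
            return core

        def route(self, pkt):
            flow = pkt["flow"]
            if flow not in self.flow_table:      # unseen flow
                self.dispatch(flow)
            return self.flow_table[flow]         # replay cached decision

    ani = ANI(num_cores=2)
    print(ani.route({"flow": ("10.0.0.1", "10.0.0.2", 80)}))   # core 0
    print(ani.route({"flow": ("10.0.0.1", "10.0.0.2", 80)}))   # same core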

SLIDE 20

Parallelized Network Analysis

  • Architecturally-aware threading
  • Need to identify the right granularity for threads
  • Protocol analysis consists of fixed blocks of functionality
  • Event processing needs to preserve temporal order (see the sketch below)

Multiple independent event queues (e.g., one per core)

  • Scalable inter-thread communication
  • Can use shared memory
  • Need to consider nonuniformities in the system's cache hierarchy
  • Potentially restructure detection algorithms to minimize communication

(e.g., loosening semantics via probabilistic algorithms)

  • Prevention functionality
  • Only forward a packet once all its events are processed
  • Evaluation, profiling & debugging
  • Race conditions & memory-access patterns
  • Trace-based reproducibility
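One way to reconcile per-core queues with temporal order, sketched in Python (illustrative, not the proposed implementation): each core appends timestamped events to its own queue, and a consumer lazily merges the already-sorted queues by timestamp:

    import heapq

    # Per-core event queues; each core appends in timestamp order.
    core0 = [(1.0, "conn_established"), (3.0, "http_request")]
    core1 = [(2.0, "dns_query"), (4.0, "conn_closed")]

    # heapq.merge combines sorted streams by timestamp, preserving
    # temporal order without a global lock on every event.
    for ts, event in heapq.merge(core0, core1):
        print(ts, event)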

SLIDE 21

Going Further: Custom Hardware

  • Goal: a custom platform for highly parallel, stateful network analysis
  • Custom hardware (e.g., FPGAs) is ideal for parallel tasks
  • Expose the parallelism and map it to hardware
  • We can identify three types of functionality in Bro
  • Fixed function blocks: handcraft (e.g., robust reassembly)
  • Protocol analyzers: use BinPAC with a new backend
  • Policy scripts: compile into a parallel computation model
  • Envision using MIT's transactor model (see the sketch below)
  • Many small self-contained units communicating via message queues
  • Ambitious but highly promising
  • Generic network analysis beyond network intrusion detection
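A rough software illustration of the transactor idea (a hypothetical Python model, not MIT's formal definition): many small self-contained units, each with private state and an inbox, interacting only through message queues, which is what makes them mappable to parallel hardware:

    from collections import deque

    class Transactor:
        # Self-contained unit: private state, an inbox, message-only I/O.
        def __init__(self, handler):
            self.handler = handler
            self.inbox = deque()
            self.state = {}

        def step(self, network):
            # Handle one message; emit follow-ups as messages, never
            # by touching another unit's state.
            if self.inbox:
                for dst, msg in self.handler(self.state, self.inbox.popleft()):
                    network[dst].inbox.append(msg)

    def reassemble(state, msg):                  # pretend reassembly
        return [("analyzer", msg.upper())]

    def analyze(state, msg):
        state["seen"] = state.get("seen", 0) + 1
        return []

    net = {"reassembler": Transactor(reassemble),
           "analyzer": Transactor(analyze)}
    net["reassembler"].inbox.append("syn+payload")
    for _ in range(2):                           # run the message network
        for t in net.values():
            t.step(net)
    print(net["analyzer"].state)                 # {'seen': 1}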

SLIDE 22

Robin Sommer

Lawrence Berkeley National Laboratory & International Computer Science Institute

robin@icir.org http://www.icir.org

This work is supported by the Office of Science and Technology at the Department of Homeland Security. Points of view in this document are those of the author(s) and do not necessarily represent the official position of the U.S. Department of Homeland Security or the Office of Science and Technology.

Thanks for your attention.