Network Traffic Monitoring & Analysis with GPUs
Wenji Wu, Phil DeMar
wenji@fnal.gov, demar@fnal.gov
GPU Technology Conference 2013
March 18-21, 2013, San Jose, California

Background
• Main uses for network traffic monitoring & analysis tools:
  – Operations & management
  – Capacity planning
  – Performance troubleshooting
• Levels of network traffic monitoring & analysis:
  – Device counter level (SNMP data)
  – Traffic flow level (flow data)
  – Packet inspection level (the focus of this work)
    • Security analysis
    • Application performance analysis
    • Traffic characterization studies

Background (cont.)
Characteristics of packet-based network monitoring & analysis applications:
• Time constraints on packet processing
• Compute- and I/O-throughput-intensive
• High levels of data parallelism
  – Each packet can be processed independently
• Extremely poor temporal locality for data
  – Typically, data is processed once in sequence; rarely reused

The Problem
Packet-based traffic monitoring & analysis tools face performance & scalability challenges within high-performance networks.
– High-performance networks:
  • 40GE/100GE link technologies
  • Servers are 10GE-connected by default
  • 400GE backbone links & 40GE host connections loom on the horizon
– Millions of packets generated & transmitted per second
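To put "millions of packets per second" into numbers, the following rough sketch computes the worst-case packet rate of a saturated Ethernet link. It is not from the talk: it assumes minimum-size 64-byte frames plus 20 bytes of per-frame wire overhead (preamble, start-of-frame delimiter, inter-frame gap), and the function name is illustrative.

```python
def max_packets_per_second(link_bps: float, frame_bytes: int = 64) -> float:
    """Worst-case packet rate on an Ethernet link running at line rate."""
    # Assumed per-frame wire overhead: 7 preamble + 1 SFD + 12 inter-frame gap.
    wire_overhead_bytes = 20
    bits_per_frame = (frame_bytes + wire_overhead_bytes) * 8
    return link_bps / bits_per_frame


for name, bps in [("10GE", 10e9), ("40GE", 40e9), ("100GE", 100e9)]:
    print(f"{name}: {max_packets_per_second(bps) / 1e6:.2f} Mpps")
```

Under these assumptions a saturated 10GE link carries about 14.88 million minimum-size packets per second, and the rate scales linearly with link speed.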

Monitoring & Analysis Tool Platforms (I)
• Requirements on the computing platform for high-performance network monitoring & analysis applications:
  – High compute power
  – Ample memory bandwidth
  – Capability of handling the data parallelism inherent in network data
  – Easy programmability

Monitoring & Analysis Tool Platforms (II)
• Three types of computing platforms:
  – NPU/ASIC
  – CPU
  – GPU

Architecture Comparison

Features                        NPU/ASIC   CPU   GPU
High compute power              Varies     ✖     ✔
High memory bandwidth           Varies     ✖     ✔
Easy programmability            ✖          ✔     ✔
Data-parallel execution model   ✖          ✖     ✔

Our Solution
Use GPU-based traffic monitoring & analysis tools.
Highlights of our work:
• Demonstrated that GPUs can significantly accelerate network traffic monitoring & analysis
  – 11 million+ pkts/s without drops (single Nvidia M2070)
• Designed and implemented a generic I/O architecture to move network traffic from the wire into the GPU domain
• Implemented a GPU-accelerated library for network traffic capturing, monitoring, and analysis
  – Dozens of CUDA kernels, which can be combined in a variety of ways to perform monitoring and analysis tasks

Key Technical Issues
• GPU's relatively small memory size:
  – Nvidia M2070 has 6 GB of memory
  – Workarounds:
    • Mapping host memory into the GPU with a zero-copy technique?
    • Partial packet capture approach ✔
• Need to capture & move packets from the wire into the GPU domain without packet loss
• Need to design data structures that are efficient for both CPU and GPU
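The effect of partial packet capture on GPU memory pressure can be illustrated with a small sketch. The capture length of 64 bytes and the 1500-byte full packet size are assumptions chosen for illustration; the talk does not state either value, and the helper names are hypothetical.

```python
SNAP_LEN = 64  # hypothetical capture length in bytes; not given in the talk


def partial_capture(packet: bytes, snap_len: int = SNAP_LEN) -> bytes:
    """Keep only the first snap_len bytes of a packet (typically the headers)."""
    return packet[:snap_len]


def packets_per_buffer(buffer_bytes: int, bytes_per_packet: int) -> int:
    """How many fixed-size packet records fit into a buffer of the given size."""
    return buffer_bytes // bytes_per_packet


GPU_MEMORY = 6 * 2**30  # the M2070's 6 GB, treated here as fully available
print("full 1500-byte packets:", packets_per_buffer(GPU_MEMORY, 1500))
print("64-byte partial packets:", packets_per_buffer(GPU_MEMORY, SNAP_LEN))
```

With these assumed sizes, the same memory holds roughly 4.3 million full packets versus roughly 100 million truncated records. In practice only part of the GPU memory would be available for packet storage.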

System Architecture
Four types of logical entities:
• Traffic Capture
• Monitoring & Analysis
• Preprocessing
• Output Display

[Figure: pipeline of 1. Traffic Capture, 2. Preprocessing, 3. Monitoring & Analysis, 4. Output Display. Labels: NICs, network packets, capturing, captured data, packet buffers, packet chunks, user space, GPU domain, monitoring & analysis kernels, output.]

Packet I/O Engine
Key techniques:
• Pre-allocated large packet buffers
• Packet-level batch processing
• Memory-mapping-based zero-copy

Key operations:
• Open
• Capture
• Recycle
• Close

[Figure: packet I/O engine. Labels: NIC, incoming packets, recv descriptor ring, descriptor segments, free packet buffer chunks, packet buffer chunk, attach, capture, recycle, processing data, OS kernel, user space.]
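The open, capture, recycle, close cycle over pre-allocated chunks can be modeled with a toy sketch. This is not the authors' engine, which works with NIC descriptor rings inside the OS kernel; the class name, method signatures, and fixed-size cell layout below are all assumptions made for illustration.

```python
from collections import deque


class PacketBufferPool:
    """Toy model of a capture/recycle cycle over pre-allocated buffer chunks."""

    def __init__(self, num_chunks: int, cells_per_chunk: int, cell_size: int):
        # "Open": allocate every chunk once, up front, and mark all as free.
        self.cell_size = cell_size
        self.cells_per_chunk = cells_per_chunk
        self._chunks = [bytearray(cells_per_chunk * cell_size)
                        for _ in range(num_chunks)]
        self._free = deque(range(num_chunks))

    def capture(self, packets):
        """Store a batch of packets in one free chunk.

        Returns (chunk_id, packet_count), or None when no chunk is free.
        """
        if not self._free:
            return None
        chunk_id = self._free.popleft()
        chunk = self._chunks[chunk_id]
        count = 0
        for pkt in packets[: self.cells_per_chunk]:
            # Truncate or zero-pad each packet to exactly one fixed-size cell.
            cell = pkt[: self.cell_size].ljust(self.cell_size, b"\0")
            start = count * self.cell_size
            chunk[start:start + self.cell_size] = cell
            count += 1
        return chunk_id, count

    def view(self, chunk_id: int) -> memoryview:
        """Expose the chunk without copying it (a stand-in for memory mapping)."""
        return memoryview(self._chunks[chunk_id])

    def recycle(self, chunk_id: int) -> None:
        """Return a processed chunk to the free list for reuse."""
        self._free.append(chunk_id)

    def close(self) -> None:
        """Drop all chunks."""
        self._chunks.clear()
        self._free.clear()


pool = PacketBufferPool(num_chunks=2, cells_per_chunk=4, cell_size=8)
chunk_id, n = pool.capture([b"abcdefghij", b"xy"])
print(chunk_id, n, bytes(pool.view(chunk_id)[:16]))
pool.recycle(chunk_id)
```

Returning `None` when the free list is empty makes backpressure explicit: a consumer that does not recycle chunks fast enough is what would lead to packet drops in a real capture path.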

GPU-based Network Traffic Monitoring & Analysis Algorithms
• A GPU-accelerated library for network traffic capturing, monitoring, and analysis apps
  – Dozens of CUDA kernels
  – Can be combined in a variety of ways to perform the intended monitoring & analysis operations

Packet-Filtering Kernel
Advanced packet filtering capabilities at wire speed are necessary so that we only analyze those packets of interest to us.
• We use Berkeley Packet Filter (BPF) as the packet filter
• Built from a few basic GPU operations, such as sort, prefix-sum, and compact

index              0   1   2   3   4   5   6   7
raw_pkts[]         p1  p2  p3  p4  p5  p6  p7  p8
1. Filtering
filtering_buf[]    1   0   1   1   0   0   1   0
2. Scan
scan_buf[]         0   1   1   2   3   3   3   4
3. Compact
index              0   1   2   3
filtered_pkts[]    p1  p3  p4  p7
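The filtering, scan, and compact steps on this slide can be sketched sequentially as follows. This is a CPU illustration of the data flow, not the CUDA kernel itself: the `predicate` argument stands in for the BPF match, and the function names are illustrative. On a GPU, steps 1 and 3 would run one thread per packet and step 2 would use a parallel prefix sum.

```python
from itertools import accumulate


def exclusive_scan(values):
    """Exclusive prefix sum: out[i] is the sum of values[0..i-1]."""
    if not values:
        return []
    return [0] + list(accumulate(values))[:-1]


def filter_packets(raw_pkts, predicate):
    """Filter, scan, and compact; returns all three intermediate buffers."""
    # Step 1 (filtering): flag each packet independently.
    filtering_buf = [1 if predicate(p) else 0 for p in raw_pkts]
    # Step 2 (scan): each kept packet learns its output position.
    scan_buf = exclusive_scan(filtering_buf)
    # Step 3 (compact): kept packets scatter to their positions.
    total = (scan_buf[-1] + filtering_buf[-1]) if raw_pkts else 0
    filtered_pkts = [None] * total
    for i, pkt in enumerate(raw_pkts):
        if filtering_buf[i] == 1:
            filtered_pkts[scan_buf[i]] = pkt
    return filtering_buf, scan_buf, filtered_pkts


raw = ["p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8"]
wanted = {"p1", "p3", "p4", "p7"}
flags, scan, kept = filter_packets(raw, lambda p: p in wanted)
print(flags)  # [1, 0, 1, 1, 0, 0, 1, 0]
print(scan)   # [0, 1, 1, 2, 3, 3, 3, 4]
print(kept)   # ['p1', 'p3', 'p4', 'p7']
```

Because every write in step 3 targets a distinct index taken from `scan_buf`, the scatter needs no locking, which is what makes the pattern a good fit for a data-parallel device.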

Traffic-Aggregation Kernel
Reads an array of n packets at pkts[] and aggregates traffic between the same src & dst IP addresses. Exports a list of entries; each entry records a src & dst IP address pair with associated traffic statistics, such as packets and bytes sent. Used to build IP conversations.

index             0      1      2      3      4      5      6      7
key_value[]
  key1            src1   src1   src2   src1   src1   src1   src2   src2
  key2            dst1   dst2   dst4   dst1   dst1   dst3   dst4   dst4
  value           0      1      2      3      4      5      6      7

Multikey-Value Sort
sorted_pkts[]
  key1            src1   src1   src1   src1   src1   src2   src2   src2
  key2            dst1   dst1   dst1   dst2   dst3   dst4   dst4   dst4
  value           0      3      4      1      5      2      6      7

diff_buf[]        1      0      0      1      1      1      0      0

Inclusive Scan
inc_scan_buf[]    1      1      1      2      3      4      4      4

index             0      1      2      3
IP_Traffic[]
  src             src1   src1   src1   src2
  dst             dst1   dst2   dst3   dst4
  stats           stats  stats  stats  stats
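The sort, diff, and inclusive-scan steps on this slide can be sketched sequentially as follows. This is a CPU illustration rather than the CUDA kernel: the `(src, dst, nbytes)` tuple layout, the dictionary form of each `IP_Traffic` entry, and the function name are assumptions, and the final loop stands in for what would be a parallel reduction per segment on a GPU.

```python
from itertools import accumulate


def aggregate_traffic(pkts):
    """Aggregate (src, dst, nbytes) packets into per-conversation statistics."""
    if not pkts:
        return [], [], [], []
    # Multikey-value sort: order packet indices by (src, dst).
    order = sorted(range(len(pkts)), key=lambda i: (pkts[i][0], pkts[i][1]))
    keys = [(pkts[i][0], pkts[i][1]) for i in order]
    # diff_buf marks the first packet of every distinct (src, dst) pair.
    diff_buf = [1] + [1 if keys[i] != keys[i - 1] else 0
                      for i in range(1, len(keys))]
    # The inclusive scan gives each packet its 1-based conversation number.
    inc_scan_buf = list(accumulate(diff_buf))
    ip_traffic = [None] * inc_scan_buf[-1]
    for pos, i in enumerate(order):
        slot = inc_scan_buf[pos] - 1
        src, dst, nbytes = pkts[i]
        if ip_traffic[slot] is None:
            ip_traffic[slot] = {"src": src, "dst": dst, "packets": 0, "bytes": 0}
        ip_traffic[slot]["packets"] += 1
        ip_traffic[slot]["bytes"] += nbytes
    return order, diff_buf, inc_scan_buf, ip_traffic


srcs = ["src1", "src1", "src2", "src1", "src1", "src1", "src2", "src2"]
dsts = ["dst1", "dst2", "dst4", "dst1", "dst1", "dst3", "dst4", "dst4"]
pkts = [(s, d, 100) for s, d in zip(srcs, dsts)]
order, diff_buf, inc_scan_buf, ip_traffic = aggregate_traffic(pkts)
print(order)         # [0, 3, 4, 1, 5, 2, 6, 7]
print(diff_buf)      # [1, 0, 0, 1, 1, 1, 0, 0]
print(inc_scan_buf)  # [1, 1, 1, 2, 3, 4, 4, 4]
for entry in ip_traffic:
    print(entry)
```

The example input reproduces the slide's buffers; the 100-byte packet size is made up so that the byte counters have something to add.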

Unique-IP-Addresses Kernel
Reads an array of n packets at pkts[] and outputs a list of the unique src or dst IP addresses seen in the packets.

1. for each i ∈ [0, n-1] in parallel do
     IPs[i] ≔ src or dst addr of pkts[i];
   end for
2. perform sort on IPs[] to determine sorted_IPs[];
3. diff_buf[0] = 1;
   for each i ∈ [1, n-1] in parallel do
     if (sorted_IPs[i] ≠ sorted_IPs[i-1]) diff_buf[i] = 1;
     else diff_buf[i] = 0;
   end for
4. perform exclusive prefix sum on diff_buf[] to determine scan_buf[];
5. for each i ∈ [0, n-1] in parallel do
     if (diff_buf[i] == 1) Output[scan_buf[i]] = sorted_IPs[i];
   end for
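The five steps above can be sketched sequentially as follows. This is a CPU illustration, not the CUDA kernel: packets are assumed to be dictionaries with `src` and `dst` string fields, the function names are illustrative, and each loop marked "in parallel" in the pseudocode becomes an ordinary list comprehension or loop here.

```python
from itertools import accumulate


def exclusive_scan(values):
    """Exclusive prefix sum: out[i] is the sum of values[0..i-1]."""
    if not values:
        return []
    return [0] + list(accumulate(values))[:-1]


def unique_ips(pkts, field="src"):
    """Return the unique addresses in pkts[*][field], in sorted order."""
    # Step 1: extract one address per packet.
    ips = [p[field] for p in pkts]
    # Step 2: sort so that equal addresses become neighbours.
    sorted_ips = sorted(ips)
    # Step 3: flag the first element of every run of equal addresses.
    diff_buf = [1 if i == 0 or sorted_ips[i] != sorted_ips[i - 1] else 0
                for i in range(len(sorted_ips))]
    # Step 4: the exclusive scan turns the flags into 0-based output positions.
    scan_buf = exclusive_scan(diff_buf)
    # Step 5: flagged elements scatter to their output positions.
    output = [None] * sum(diff_buf)
    for i, flag in enumerate(diff_buf):
        if flag == 1:
            output[scan_buf[i]] = sorted_ips[i]
    return output


pkts = [
    {"src": "10.0.0.2", "dst": "10.0.0.9"},
    {"src": "10.0.0.1", "dst": "10.0.0.9"},
    {"src": "10.0.0.2", "dst": "10.0.0.7"},
]
print(unique_ips(pkts, "src"))  # ['10.0.0.1', '10.0.0.2']
print(unique_ips(pkts, "dst"))  # ['10.0.0.7', '10.0.0.9']
```

Sorting the addresses as strings is enough to group duplicates; a real kernel would more likely sort the 32-bit integer form of each address.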
