Less is More with Intelligent Packet Capture RANDY CALDEJON FLOCON - - PowerPoint PPT Presentation



SLIDE 1

Less is More with Intelligent Packet Capture

RANDY CALDEJON

FLOCON 2020

SLIDE 2

Objectives

  • Consider the merits of streaming analytics
  • Introduce advanced open source tools
  • Encourage experimentation with OpenArgus

SLIDE 3

Streaming Analytics at the Edge

  • Increase speed
  • Reduce bandwidth
  • Local resources

SLIDE 4

DragonFly Design Goals

  • Incremental Updates: receive updates before the flow is complete
  • Sustained Performance: maintains 20 Gbps+
  • Single Node Architecture: high performance without a cluster
  • Machine Learning: analyzes data as it arrives
  • Bolt-On Mindset: integrates seamlessly with other security tools

SLIDE 5

A Practical Application of DragonFly

PCAP or it didn’t happen.

SLIDE 6

Full Packet Capture is Ground Truth; but…

  • Low signal to noise: forensically relevant network data is a small fraction of total network data
  • High cost: 10 Gbps network link, 30 days of retention, ~$1.2M annually

[Chart: packet capture as a share of traffic (0% to 100%), divided into No Forensic Value, Forensically Relevant Data, and Indicators of Compromise]
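
To give a sense of scale for the 10 Gbps, 30-day scenario, the raw storage volume can be estimated with simple arithmetic. This is a rough sketch and not the speaker's cost model: the function name and the `utilization` parameter are illustrative assumptions, decimal units are used, and pcap per-packet headers are ignored. A fully saturated link works out to about 3.24 PB over 30 days.

```python
SECONDS_PER_DAY = 86_400


def capture_storage_bytes(link_gbps: float, days: float, utilization: float = 1.0) -> float:
    """Estimate bytes needed to retain every packet on a link for `days` days.

    `utilization` is an assumed average link load (1.0 = fully saturated).
    Per-packet pcap record headers are ignored.
    """
    bytes_per_second = link_gbps * 1e9 * utilization / 8
    return bytes_per_second * SECONDS_PER_DAY * days


if __name__ == "__main__":
    for load in (1.0, 0.5, 0.1):
        petabytes = capture_storage_bytes(10, 30, load) / 1e15
        print(f"utilization {load:.0%}: {petabytes:.2f} PB")
```

Even at modest utilization the volume stays in the hundreds of terabytes, which is why retention periods shrink as bandwidth grows.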

SLIDE 7

Typical Packet Capture Workflow: Retrospective

Filter Analyze Capture Record

SLIDE 8

Intelligent Packet Capture

Filter Analyze Capture Record

SLIDE 9

Intelligent Packet Capture: Real-Time

Filter Record Capture Analyze

SLIDE 10

Intelligent PCAP

Using Machine Learning to Capture Packets with Forensic Value

  • Expensive: Despite its value, full packet capture is not used to its fullest extent because lengthy retention periods are cost-prohibitive, and retention only shrinks as bandwidth utilization increases.
  • Ground truth: Full packet capture has long been viewed as the “ground truth” for activity on the network, allowing analysts to identify the source of security incidents.
  • Alternatives lack payloads: Though valuable for portions of the security workflow, alternatives to PCAP such as flow and application metadata cannot provide the “ground truth” payload for irregular traffic.
  • Combine forces: Intelligent packet capture combined with augmented flow supports a data-friendly log format plus the full packets for anomalous traffic.

Intelligent Packet Capture uses threat intelligence, advanced analytics, and machine learning to decide in near real-time what to record.
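
The idea of deciding in near real-time what to record can be illustrated with a small decision function. This is only a sketch under stated assumptions: `FlowEvent`, `should_record`, the `score` field, and the 0.8 threshold are hypothetical names and values chosen for illustration, not part of DragonFly or Argus.

```python
from dataclasses import dataclass


@dataclass
class FlowEvent:
    src: str
    dst: str
    score: float  # hypothetical anomaly score in [0, 1] produced by a model


def should_record(event: FlowEvent, watchlist: set, threshold: float = 0.8) -> bool:
    """Record packets when threat intelligence matches or the model score is high."""
    intel_hit = event.src in watchlist or event.dst in watchlist
    anomalous = event.score >= threshold
    return intel_hit or anomalous


if __name__ == "__main__":
    watchlist = {"203.0.113.7"}
    events = [
        FlowEvent("10.0.0.5", "203.0.113.7", 0.10),   # threat feed match
        FlowEvent("10.0.0.5", "198.51.100.20", 0.93),  # high model score
        FlowEvent("10.0.0.5", "192.0.2.1", 0.05),      # neither
    ]
    for e in events:
        print(e.dst, should_record(e, watchlist))
```

Everything that fails both checks is still described by its flow record, so only the packets for suspicious traffic need to be written to disk.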

SLIDE 11

Intelligent PCAP

Performance Requirements

  • Low-latency feedback loop
  • Events/s
  • Packets/s

SLIDE 12

Intelligent PCAP

Open Source Framework

  • tcpdump (recording)
  • mlpack (training)
  • Argus (extraction)
  • eBPF (filtering)

SLIDE 13

tcpdump -i eth0 -w /cache/pcap-%m-%d-%H-%M-%S \
    -W 100 -G 300 -C 1000
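
SLIDE 14

eBPF for Filtering

[Diagram: in user space, LLVM/Clang compiles the eBPF program to eBPF bytecode. In the kernel, the eBPF Verifier either rejects or loads the bytecode, and the JIT compiler produces eBPF native code. Labels also shown: register, maps, event, config, packet data.]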

SLIDE 15

eBPF Map

struct bpf_map_def SEC("maps") watchlist = {
    .type        = BPF_MAP_TYPE_PERCPU_HASH,
    .key_size    = sizeof(u32), /* ipv4 address */
    .value_size  = sizeof(u64), /* counter/timeout */
    .max_entries = 100000,
    .map_flags   = BPF_F_NO_PREALLOC,
};
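
The watchlist map can be pictured as a bounded table keyed by a 32-bit IPv4 address with a 64-bit value. The user-space sketch below mimics that shape in Python. It assumes the value is used as an expiry timestamp (the slide only says "counter/timeout"), and the `Watchlist` class and its methods are hypothetical, not an eBPF or DragonFly API. Per-CPU storage and kernel byte order are not modelled.

```python
import ipaddress
import time

MAX_ENTRIES = 100_000  # mirrors .max_entries on the slide


def ipv4_key(addr: str) -> int:
    """Convert dotted-quad IPv4 text to the 32-bit integer used as the key."""
    return int(ipaddress.IPv4Address(addr))


class Watchlist:
    """Bounded map: u32 IPv4 key -> u64 expiry time (an assumed interpretation)."""

    def __init__(self, max_entries: int = MAX_ENTRIES):
        self._entries = {}
        self._max = max_entries

    def add(self, addr: str, ttl_seconds: int, now: float = None) -> bool:
        now = time.time() if now is None else now
        key = ipv4_key(addr)
        if key not in self._entries and len(self._entries) >= self._max:
            return False  # map is full, like a failed insert into a bounded map
        self._entries[key] = int(now + ttl_seconds)
        return True

    def contains(self, addr: str, now: float = None) -> bool:
        now = time.time() if now is None else now
        key = ipv4_key(addr)
        expiry = self._entries.get(key)
        if expiry is None:
            return False
        if expiry <= now:
            del self._entries[key]  # lazily drop expired entries
            return False
        return True


if __name__ == "__main__":
    wl = Watchlist()
    wl.add("10.0.0.1", ttl_seconds=60)
    print(hex(ipv4_key("10.0.0.1")), wl.contains("10.0.0.1"))
```

A timeout-style value lets entries age out, so an address flagged by an analyzer is only recorded for a limited window.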

SLIDE 16

mlpack for training

[Diagram components: mlpack lib, Training, Model, Scoring]

SLIDE 17

mlpack splitting data (step 1)

/usr/local/bin/mlpack_preprocess_split \
    --input_file data/$filename.data.csv \
    --input_labels_file data/$filename.labels.csv \
    --training_file data/$filename.train.csv \
    --training_labels_file data/$filename.train.labels.csv \
    --test_file data/$filename.test.csv \
    --test_labels_file data/$filename.test.labels.csv \
    --test_ratio 0.3 \
    --verbose
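
What the split step accomplishes can be shown in a few lines: rows and labels are shuffled together and 30% are held out for testing. This is a conceptual sketch only. The function name, the seed, and the rounding rule are assumptions, and it does not reproduce mlpack's exact shuffling or file handling.

```python
import random


def split_rows(rows, labels, test_ratio=0.3, seed=42):
    """Shuffle rows and labels together, then hold out `test_ratio` for testing."""
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    n_test = int(round(len(rows) * test_ratio))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train_rows = [rows[i] for i in train_idx]
    train_labels = [labels[i] for i in train_idx]
    test_rows = [rows[i] for i in test_idx]
    test_labels = [labels[i] for i in test_idx]
    return train_rows, train_labels, test_rows, test_labels


if __name__ == "__main__":
    rows = [[i, i * 2] for i in range(10)]
    labels = [i % 2 for i in range(10)]
    tr, trl, te, tel = split_rows(rows, labels)
    print(len(tr), "training rows,", len(te), "test rows")
```

Keeping each label paired with its row through the shuffle is the important part; a mismatch there silently corrupts both training and evaluation.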

SLIDE 18

mlpack generating model (step 2)

/usr/local/bin/mlpack_random_forest \
    --training_file data/$filename.data.csv \
    --labels_file data/$filename.labels.csv \
    --num_trees 10 \
    --minimum_leaf_size 3 \
    --print_training_accuracy \
    --output_model_file model/$filename.eval-model.bin \
    --verbose

SLIDE 19

mlpack testing model (step 3)

/usr/local/bin/mlpack_random_forest \
    --input_model_file model/$filename.eval-model.bin \
    --test_file data/$filename.test.csv \
    --test_labels_file data/$filename.test.labels.csv \
    --probabilities_file probs.csv \
    --verbose
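
Once class probabilities have been written out, they can be turned into predicted labels and compared with the held-out labels. The sketch below assumes one row per test point and one column per class. That layout is an assumption about probs.csv and should be checked against the actual output. The function names are illustrative.

```python
def predict_labels(probabilities):
    """Pick the most probable class index for each row of class probabilities."""
    return [max(range(len(row)), key=row.__getitem__) for row in probabilities]


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        return 0.0
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


if __name__ == "__main__":
    probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
    truth = [0, 1, 1]
    predicted = predict_labels(probs)
    print(predicted, accuracy(predicted, truth))
```

Keeping the probabilities, and not just the hard labels, also makes it possible to tune a recording threshold later.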

SLIDE 20

Version 2.0

  • Scalable
  • Lightweight
  • Flexible
  • Extensible

SLIDE 21

DragonFly MLE

  • Analyzers
  • Plugins
  • Engine (embedded LuaJIT)

SLIDE 22

DragonFly Engine

  • Scriptable: embedded LuaJIT
  • Fast: C/C++
  • Lightweight: small library
  • Easy: Arduino programming model

SLIDE 23

DragonFly Scriptable Analyzers

function M:setup()
  model = config['module.model']
  rf = RandomForest.load(model)
end

function M:loop(event)
  -- ….
  rf:classify(event)
end
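
The setup/loop shape of an analyzer follows the Arduino programming model: `setup` runs once, then `loop` runs for every event. The Python sketch below shows that control flow only. The `Analyzer` base class, `ThresholdAnalyzer`, the `run` driver, and the `module.threshold` key are hypothetical stand-ins, with a simple threshold in place of the random forest.

```python
class Analyzer:
    """Arduino-style analyzer: setup() once, loop() per event."""

    def setup(self, config):
        raise NotImplementedError

    def loop(self, event):
        raise NotImplementedError


class ThresholdAnalyzer(Analyzer):
    """Stand-in for a model-backed analyzer; flags events above a byte threshold."""

    def setup(self, config):
        self.threshold = config["module.threshold"]

    def loop(self, event):
        return event["bytes"] > self.threshold


def run(analyzer, config, events):
    """Minimal engine: initialize the analyzer, then feed it each event in order."""
    analyzer.setup(config)
    return [analyzer.loop(event) for event in events]


if __name__ == "__main__":
    results = run(
        ThresholdAnalyzer(),
        {"module.threshold": 1000},
        [{"bytes": 500}, {"bytes": 5000}],
    )
    print(results)
```

Loading the model in `setup` keeps the per-event path short, which matters when events arrive at high rates.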

SLIDE 24

DragonFly Scriptable Analyzers

function M:dns(event)
  -- ….
  rf:classify(event)
end

function M:tls(event)
  -- ….
  rf:classify(event)
end

SLIDE 25

DragonFly Plug-ins

  • mlpack
  • eBPF
  • iptree
  • Redis
  • cuckoo filter

SLIDE 26

Argus

Real-time Per Flow Updates

  • Argus (flow meter)
  • Radium (multiplexer)
  • ra (client)
  • ratop (client)
  • raml (client)

SLIDE 27

Real-Time Flow Meter

Field Overview

Argus: 100+ features, spanning flow features and packet dynamic features

Flow

  • IP Addresses
  • Ports
  • Protocol
  • Total Bytes
  • Total Packets
  • Start time
  • Duration

Extended Flow

  • Flow details by direction
  • Payload
  • MAC, VLAN, MPLS, ICMP, TCP flags and options

Packet Dynamics

  • Connection Setup Times
  • Load and Rates (bytes and packets per second)
  • Interpacket Arrival time and Jitter
  • Dropped/retransmitted packet statistics
  • Connection statistics (FIN, RST, SYN, Window advertisements, Zero windows)

Computed Statistics

  • Producer/Consumer Ratio
  • App/Byte Ratio
  • Key Stroke Identification
  • Flow Active Runtime Statistics

Derived Fields

  • Country Code
  • MAC Manufacturer (OUI)

Record Management

  • Record Cause (Start, Status, Stop, Close, Error)
  • Unique Identifier (seq)
  • Sensor ID
  • Record Type (“flow” or “management”)
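
Two of the computed fields are easy to illustrate from basic flow counters. The sketch below uses the commonly cited definition of the Producer/Consumer Ratio, (source application bytes - destination application bytes) / (their sum), and a plain bits-per-second load. The function and parameter names are illustrative and are not Argus field names.

```python
def producer_consumer_ratio(src_app_bytes: int, dst_app_bytes: int) -> float:
    """PCR in [-1, 1]: +1 means the source only produces, -1 means it only consumes."""
    total = src_app_bytes + dst_app_bytes
    if total == 0:
        return 0.0  # no application data exchanged in either direction
    return (src_app_bytes - dst_app_bytes) / total


def load_bits_per_second(total_bytes: int, duration_seconds: float) -> float:
    """Average load over the flow's duration; zero-length flows report 0."""
    if duration_seconds <= 0:
        return 0.0
    return total_bytes * 8 / duration_seconds


if __name__ == "__main__":
    # An upload-heavy flow: the source sent far more application data than it received.
    print(producer_consumer_ratio(900, 100))
    print(load_bits_per_second(1_000_000, 2.0))
```

A client that normally downloads (PCR near -1) and suddenly shows a PCR near +1 is a typical example of a feature that helps a model flag possible exfiltration.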

SLIDE 28

Intelligent PCAP with raml

  • Based on Argus client (library)
  • Integrated with DragonFly (library)
  • Able to run an instance per core

SLIDE 29

Intelligent PCAP with raml

[Diagram components: Argus, raml, mlpack]

SLIDE 30

raml: DGA Analyzer

function M:loop(event)
  local v = features(event.domain, event.ttl)
  score = rf:classify(v)
  return score
end
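
The slide does not show what `features` computes, so the following is a hypothetical illustration of the kind of vector a DGA classifier might consume: label length, character entropy, digit ratio, and TTL. None of these choices come from the talk. They are common lexical features used here only to make the idea concrete.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the characters in `text`, in bits per character."""
    if not text:
        return 0.0
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())


def features(domain: str, ttl: int) -> list:
    """Hypothetical feature vector: [label length, entropy, digit ratio, ttl]."""
    label = domain.lower().rstrip(".").split(".")[0]  # leftmost label only
    digit_ratio = sum(ch.isdigit() for ch in label) / len(label) if label else 0.0
    return [float(len(label)), shannon_entropy(label), digit_ratio, float(ttl)]


if __name__ == "__main__":
    print(features("example.com", 300))
    print(features("x7f3k9q2zl1p.net", 60))
```

Algorithmically generated names tend to be longer and closer to uniformly random than human-chosen ones, which is what entropy-style features try to capture.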

SLIDE 31

raml: Threat Feed Analyzer

function M:setup()
  file = config['ioc.filename']
  iplist = iptree(file)
end

function M:loop(event)
  local daddr = event['daddr']
  match = iplist.lookup(daddr)
  return match
end
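
The threat feed lookup amounts to testing a destination address against a list of indicator addresses or networks. The sketch below shows this with Python's standard `ipaddress` module. The `IpList` class is a hypothetical stand-in for `iptree`, it assumes one address or CIDR block per feed line with `#` comments, and it uses a linear scan where a real implementation would likely use a tree.

```python
import ipaddress


class IpList:
    """Hypothetical stand-in for an IP tree built from a threat feed."""

    def __init__(self, lines):
        self.networks = []
        for line in lines:
            entry = line.split("#", 1)[0].strip()  # drop comments and whitespace
            if entry:
                # strict=False accepts host addresses and non-aligned CIDR blocks
                self.networks.append(ipaddress.ip_network(entry, strict=False))

    def lookup(self, addr: str) -> bool:
        ip = ipaddress.ip_address(addr)
        return any(ip in network for network in self.networks)


if __name__ == "__main__":
    feed = ["# example indicators", "203.0.113.0/24", "198.51.100.7"]
    iplist = IpList(feed)
    for daddr in ("203.0.113.9", "198.51.100.7", "192.0.2.1"):
        print(daddr, iplist.lookup(daddr))
```

As in the Lua analyzer, the feed is parsed once up front so that each per-event lookup stays cheap.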

SLIDE 32

Intelligent PCAP Solutions

[Diagram components: eth0, br0, pcap0, Argus, raml, mlpack, tcpdump]

SLIDE 33

LESSONS LEARNED

Performance

  • Feedback loop latency: <50 msec
  • Event rate: >750K eps
  • Packet rate: >14 Mpps
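
For context on the packet rate, the theoretical maximum on 10 Gigabit Ethernet with minimum-size frames is about 14.88 Mpps. The talk does not state this comparison, so treating it as the relevant reference point is an assumption. The sketch below shows the arithmetic and the resulting per-packet time budget. The function names are illustrative.

```python
PREAMBLE_BYTES = 8          # preamble plus start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12   # minimum idle time between Ethernet frames


def max_packets_per_second(link_bps: float, frame_bytes: int = 64) -> float:
    """Line-rate packet rate for a given Ethernet frame size, including wire overhead."""
    wire_bits = (frame_bytes + PREAMBLE_BYTES + INTERFRAME_GAP_BYTES) * 8
    return link_bps / wire_bits


def per_packet_budget_ns(packets_per_second: float) -> float:
    """Average time available to handle one packet at the given rate."""
    return 1e9 / packets_per_second


if __name__ == "__main__":
    pps = max_packets_per_second(10e9)
    print(f"{pps / 1e6:.2f} Mpps, {per_packet_budget_ns(pps):.1f} ns per packet")
```

At these rates each packet gets well under 100 ns on average, which is why filtering is done in the kernel and the analysis works on flow events instead of individual packets.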

SLIDE 34

Next Steps…

  • Complete POCs
  • Publish to GitHub: https://github.com/counterflow-ai/dragonfly2
  • Merge raml with Argus: https://openargus.org/
  • Explore additional use cases…

SLIDE 35

Streaming Analytics Use Cases

  • Threat Intelligence Triage
  • Encrypted Traffic Analysis
  • Predictive Fault Detection

SLIDE 36

Questions?

RANDY CALDEJON

rc@counterflowai.com
https://github.com/counterflow-ai/dragonfly2