VAST: Interactive Network Forensics, Matthias Vallentin


slide-1
SLIDE 1

VAST: Interactive Network Forensics

Matthias Vallentin

matthias@bro.org

BroCon August 5, 2015

slide-2
SLIDE 2

Demo I

2 / 26

slide-3
SLIDE 3

Data Pyramid

◮ Filtered Data (top of pyramid, low fidelity)
◮ Aggregated Data
◮ Structured Data
◮ Raw Data (base of pyramid, high fidelity)

Axis: Data Volume

3 / 26

slide-4
SLIDE 4

Data Pyramid

◮ Alarms (top of pyramid, low fidelity)
◮ Bro Logs
◮ Bro Events
◮ Packets (base of pyramid, high fidelity)

Axis: Data Volume

4 / 26

slide-5
SLIDE 5

Data Pyramid

◮ Exit Status (top of pyramid, low fidelity)
◮ Process Events
◮ System Calls
◮ Instruction Stream (base of pyramid, high fidelity)

Axis: Data Volume

5 / 26

slide-6
SLIDE 6

VAST: Visibility Across Space and Time

[Diagram: Import, Archive, Index, Export]

Key Features

◮ Interactive response times
◮ Horizontal scaling over a cluster
◮ Iterative query refinement
◮ Type-rich data model
◮ Strongly typed query language
◮ Historical & continuous queries

6 / 26

slide-10
SLIDE 10

High-Level Architecture of VAST

Import

◮ Sources produce events
◮ PCAP, Bro logs, BGPdump, ...

Archive

◮ Key-value store (IDs → events)
◮ Stores raw data as events

Index

◮ Bitmap indexes over event data
◮ Hits are event IDs in archive

Export

◮ Sinks consume events
◮ PCAP, Bro logs, ASCII, JSON

[Diagram: Import, Archive, Index, and Export components; sample events 10.0.0.1 10.0.0.254 53/udp and 10.0.0.2 10.0.0.254 80/tcp]

7 / 26
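
The division of labor between the four components can be pictured with a small in-memory sketch. It is not VAST's implementation: the `MiniStore` class, its method names, the `resp_h` and `port` field names, and the dict-based event layout are all assumptions made for illustration, and plain Python sets stand in for bitmap indexes.

```python
from collections import defaultdict


class MiniStore:
    """Toy import/archive/index/export pipeline (hypothetical names)."""

    def __init__(self):
        self.archive = {}               # key-value store: event ID -> event
        self.index = defaultdict(set)   # (field, value) -> set of event IDs
        self.next_id = 0

    def import_event(self, event):
        """Assign the next ID, archive the event, and index every field."""
        event_id = self.next_id
        self.next_id += 1
        self.archive[event_id] = event
        for field, value in event.items():
            self.index[(field, value)].add(event_id)
        return event_id

    def lookup(self, field, value):
        """Index lookup: hits are event IDs, not the events themselves."""
        return set(self.index.get((field, value), ()))

    def export(self, hits):
        """Resolve hits against the archive, in ID order."""
        return [self.archive[i] for i in sorted(hits)]


store = MiniStore()
store.import_event({"orig_h": "10.0.0.1", "resp_h": "10.0.0.254", "port": "53/udp"})
store.import_event({"orig_h": "10.0.0.2", "resp_h": "10.0.0.254", "port": "80/tcp"})

hits = store.lookup("port", "80/tcp")
print(store.export(hits))
```

The point of the split is that a query touches only the index until the final step; the archive is consulted once, for the IDs that survived.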

slide-13
SLIDE 13

VAST & Big Data

MapReduce (Hadoop)

Batch-oriented processing: full scan of data
+ Expressive: no restriction on algorithms
− Speed & Interactivity: full scan for each query

In-memory Cluster Computing (Spark)

Load full data set into memory and then run query
+ Speed & Interactivity: fast on arbitrary queries over working set
− Thrashing when working set too large

Distributed Indexing (VAST)

Distributed building and querying of bitmap indexes
+ Fast: only access space-efficient indexes
+ Caching of index hits enables iterative analyses
− Lookup only, not arbitrary computation

8 / 26

slide-16
SLIDE 16

VAST & SIEM

Splunk

Data Model: Unstructured text
Index: B-tree
Computation: MapReduce
Code: Closed-source
License: Data-volume based

ElasticSearch

Data Model: Rich (Lucene)
Index: Inverted (Lucene)
Computation: Index Lookup
Code: Open-source
License: Apache 2.0

VAST

Data Model: Rich (Bro)
Index: Bitmap Indexes
Computation: Index Lookup
Code: Open-source
License: BSD (3-clause)

9 / 26

slide-17
SLIDE 17

Types: Interpretation of Data

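[Diagram: type hierarchy]

◮ Basic types: none, bool, int, count, real, duration, time, string, pattern, address, subnet, port
◮ Container types: vector (of TYPE), set (of TYPE), table (KEY → VALUE)
◮ Compound types: record (field 1: TYPE, ..., field n: TYPE)
◮ Recursive types: types whose elements are again a TYPE

10 / 26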

slide-18
SLIDE 18

Query Language

Boolean Expressions

◮ Conjunctions &&
◮ Disjunctions ||
◮ Negations !
◮ Predicates
  ◮ LHS op RHS
  ◮ (expr)

Examples

◮ A && B || !(C && D)
◮ orig_h == 10.0.0.1 && &time < now - 2h
◮ &type == "conn" || "foo" in :string
◮ duration > 60s && service == "tcp"

Extractors

◮ &type
◮ &time
◮ x.y.z.arg
◮ :type

Relational Operators

◮ <, <=, ==, >=, >
◮ in, ni, [+, +]
◮ !in, !ni, [-, -]
◮ ~, !~

Values

◮ T, F
◮ +42, 1337, 3.14
◮ "foo"
◮ 10.0.0.0/8
◮ 80/tcp, 53/?
◮ {1, 2, 3}

11 / 26
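
To make the Boolean structure of an example such as `duration > 60s && service == "tcp"` concrete, the sketch below evaluates the same shape of expression over plain Python dictionaries. It does not parse VAST's query syntax and says nothing about how VAST evaluates queries internally; the `Pred`, `And`, `Or`, and `Not` helpers and the event layout (durations as seconds in a float) are hypothetical and exist only for this illustration.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Pred:
    """A predicate of the form LHS op RHS, where LHS names an event field."""
    field: str
    op: Callable[[Any, Any], bool]
    rhs: Any

    def __call__(self, event):
        # An event without the field cannot satisfy the predicate.
        return self.field in event and self.op(event[self.field], self.rhs)


@dataclass
class And:
    left: Any
    right: Any

    def __call__(self, event):
        return self.left(event) and self.right(event)


@dataclass
class Or:
    left: Any
    right: Any

    def __call__(self, event):
        return self.left(event) or self.right(event)


@dataclass
class Not:
    inner: Any

    def __call__(self, event):
        return not self.inner(event)


# duration > 60s && service == "tcp"
query = And(Pred("duration", operator.gt, 60.0),
            Pred("service", operator.eq, "tcp"))

events = [
    {"duration": 75.0, "service": "tcp"},
    {"duration": 5.0, "service": "tcp"},
    {"duration": 90.0, "service": "udp"},
]

matches = [e for e in events if query(e)]
print(matches)
```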

slide-19
SLIDE 19

Index Hits: Sets of Event IDs

Bitvector: ordered set of IDs

◮ Query result ≡ set of event IDs from [0, 2^64 − 1)
  → Model as bit vector: [4, 7, 8] = 0000100110 · · ·
◮ Run-length encoded
◮ Append-only
◮ Bitwise operations do not require decoding

Bitmap: maps values to bit vectors

◮ push_back(T x): append value x of type T
◮ lookup(T x, Op ◦): get bit vector for x under ◦

[Figure: a bit vector over the ID space up to 2^64 − 1, and a bitmap that maps data values to bit vectors B0, B1, B2, B3]

12 / 26
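
The two structures on this slide can be sketched in a few lines. This is a simplified model, not VAST's data structure: the function names `to_bits` and `run_length_encode` are made up for the example, the `Bitmap` class supports only the equality operator, and it keeps uncompressed Python sets rather than run-length encoded bit vectors.

```python
from collections import defaultdict


def to_bits(ids, size):
    """Render a set of event IDs as a bit string; position i stands for ID i."""
    return "".join("1" if i in ids else "0" for i in range(size))


def run_length_encode(bits):
    """Collapse a bit string into (bit, run length) pairs."""
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1
        else:
            runs.append([b, 1])
    return [(b, n) for b, n in runs]


class Bitmap:
    """Equality-only bitmap: one set of event IDs per distinct value."""

    def __init__(self):
        self.size = 0                     # number of values appended so far
        self.vectors = defaultdict(set)   # value -> IDs at which it occurred

    def push_back(self, x):
        # Append-only: the new value is stored under the next event ID.
        self.vectors[x].add(self.size)
        self.size += 1

    def lookup(self, x):
        # Only == is sketched here; other relational operators are omitted.
        return set(self.vectors.get(x, set()))


bits = to_bits({4, 7, 8}, 10)
print(bits)
print(run_length_encode(bits))

bm = Bitmap()
for value in [2, 1, 2, 1, 3]:
    bm.push_back(value)
print(to_bits(bm.lookup(2), bm.size))
```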

slide-20
SLIDE 20

Composing Results via Bitwise Operations

Combining Predicates

◮ Query Q = X ∧ Y ∧ Z
  ◮ x = 1.2.3.4 ∧ y < 42 ∧ z ∈ "foo"
◮ Bitmap index lookup yields X → B1, Y → B2, and Z → B3
◮ Result R = B1 & B2 & B3

[Figure: B1 & B2 & B3 = R]

13 / 26
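
A worked instance of R = B1 & B2 & B3, using Python integers as uncompressed bit vectors. The helper names `bitvector` and `event_ids` and the three hit sets are invented for the example; a real index would return compressed bit vectors rather than integers.

```python
def bitvector(ids):
    """Pack event IDs into an int; bit i set means event ID i is a hit."""
    v = 0
    for i in ids:
        v |= 1 << i
    return v


def event_ids(v):
    """Unpack an int bit vector back into a sorted list of event IDs."""
    out = []
    i = 0
    while v:
        if v & 1:
            out.append(i)
        v >>= 1
        i += 1
    return out


b1 = bitvector([1, 4, 7, 8])   # hits for predicate X
b2 = bitvector([4, 5, 8, 9])   # hits for predicate Y
b3 = bitvector([0, 4, 8])      # hits for predicate Z

r = b1 & b2 & b3               # conjunction of predicates = bitwise AND
print(event_ids(r))
```

Only events 4 and 8 appear in all three inputs, so only those two IDs need to be fetched from the archive.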

slide-25
SLIDE 25

What happened since BroCon’14?

New Features

◮ Continuous queries
  ◮ Apply queries to arriving data
◮ Time Machine
  ◮ Full indexes on time stamp and connection tuple
  ◮ Bidirectional flow cut-off
◮ New event sources
  ◮ BGPdump
  ◮ JSON/Kafka (not yet merged)
◮ Distributed Architecture
  ◮ Commutativity: support message reordering
  ◮ Associativity: parallel query engine

14 / 26
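
One way to read the last two bullets: if partial results are merged with an operation that is commutative and associative, they can arrive in any order and be combined in any grouping. The sketch below assumes, purely for illustration, that each index partition returns its hits as a bit vector and that partial hits are merged with bitwise OR; the slides do not say which merge operation VAST uses.

```python
from functools import reduce
from itertools import permutations
from operator import or_

# Hypothetical partial hits from three index partitions (ints as bit vectors).
partials = [0b00010010, 0b10000000, 0b00000101]

expected = reduce(or_, partials, 0)

# Commutativity: every arrival order yields the same merged result.
order_independent = all(
    reduce(or_, order, 0) == expected for order in permutations(partials)
)

# Associativity: partial merges can be grouped freely, e.g. by parallel workers.
a, b, c = partials
grouping_independent = ((a | b) | c) == (a | (b | c)) == expected

print(bin(expected), order_independent, grouping_independent)
```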

slide-26
SLIDE 26

Distributed VAST

[Diagram: a node containing I (importer), A (archive), X (index), and E (exporter)]

node: the logical unit of deployment

◮ A container for actors/components
◮ Message serialization only at node boundaries

→ Maps to single OS process, typically one per machine

15 / 26
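
The "serialization only at node boundaries" rule can be illustrated with a toy model. Everything here is an assumption for the sake of the example: the `Node` class, its list-based mailboxes, and the use of `pickle` as the wire format are not how VAST or its actor framework work, they only show why co-locating components avoids copies.

```python
import pickle


class Node:
    """Toy container for named actors; each actor is just a mailbox list."""

    def __init__(self, name):
        self.name = name
        self.actors = {}        # actor name -> mailbox
        self.serialized = 0     # messages this node had to serialize

    def spawn(self, actor):
        self.actors[actor] = []

    def send(self, target_node, actor, message):
        if target_node is self:
            # Same node: hand over the object itself, no serialization.
            target_node.actors[actor].append(message)
        else:
            # Crossing a node boundary: serialize, "transmit", deserialize.
            wire = pickle.dumps(message)
            self.serialized += 1
            target_node.actors[actor].append(pickle.loads(wire))


foo, bar = Node("foo"), Node("bar")
foo.spawn("archive")
bar.spawn("index")

event = {"orig_h": "10.0.0.1", "port": "53/udp"}
foo.send(foo, "archive", event)   # stays inside node foo
foo.send(bar, "index", event)     # crosses from foo to bar

print(foo.serialized)
```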

slide-27
SLIDE 27

Distributed VAST: Replicated Cores

[Diagram: three nodes, each running I, A, X, and E]

16 / 26

slide-28
SLIDE 28

Distributed VAST: Replicated Cores

[Diagram: three nodes, each running I, A, X, and E, with a source and a sink attached]

17 / 26

slide-29
SLIDE 29

Distributed VAST: Custom Deployment

[Diagram: nodes running different subsets of I, A, X, and E on HDD- and SSD-backed machines, with sources and a sink attached]

18 / 26

slide-30
SLIDE 30

Demo II

19 / 26

slide-31
SLIDE 31

Demo Topology: Import

[Diagram: two nodes, foo and bar, each running I, A, and X; label: ID]

20 / 26

slide-32
SLIDE 32

Demo Topology: Import

[Diagram: two nodes, foo and bar, each running I, A, and X; labels: ID, source]

21 / 26

slide-33
SLIDE 33

Demo Topology: Export (naive)

[Diagram: two nodes, foo and bar; one runs A, X, and E, the other A and X; a sink is attached]

22 / 26

slide-34
SLIDE 34

Demo Topology: Export (better)

[Diagram: two nodes, foo and bar, each running A, X, and E]

23 / 26

slide-35
SLIDE 35

Demo Topology: Export (good)

[Diagram: two nodes, foo and bar, each running A, X, and E; a sink is attached]

24 / 26

slide-40
SLIDE 40

Future Work: Moving Forward

Next Milestone: Release

◮ Architecture converging: feature freeze for 0.1 soon
◮ Thorough testing of distributed architecture
◮ Improve index size of strings and containers

Down The Line

◮ Improved Bro integration
  ◮ Unify data model with Broker
  ◮ VAST writer for Bro
◮ Fault tolerance
  ◮ Data replication (replicate archive & index)
  ◮ Query snapshotting (resume failed execution)
  ◮ Use Raft to manage global state (large-scale clusters)
◮ Interface with Spark to enable arbitrary computation
◮ Interface with Spicy for powerful event import/export

25 / 26

slide-41
SLIDE 41

Questions?

More at:

http://vast.tools

26 / 26