Spicy: A Unified Deep Packet Inspection Framework Dissecting All Your Data

slide-1
SLIDE 1

Robin Sommer

International Computer Science Institute, & Corelight, Inc.

robin@icsi.berkeley.edu robin@corelight.io http://www.icir.org/robin

Spicy: A Unified Deep Packet Inspection Framework Dissecting All Your Data

slide-3
SLIDE 3

Deep Packet Inspection

[Diagram labels: Internet, Tap, IDS, Local Network]

Example: Finding downloads of known malware.

  • 1. Find and parse all Web traffic.
  • 2. Find and extract binaries.
  • 3. Compute hash and compare with database.
  • 4. Report, and potentially kill, if found.
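
Steps 3 and 4 of the malware example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not say which hash function or database the IDS uses, so SHA-256, the `KNOWN_BAD` set, and both function names are hypothetical.

```python
import hashlib

# Hypothetical database of known-bad SHA-256 digests (illustrative content only).
KNOWN_BAD = {hashlib.sha256(b"pretend this is malware").hexdigest()}


def is_known_malware(payload: bytes, database=KNOWN_BAD) -> bool:
    """Step 3: hash the extracted binary and look the digest up in the database."""
    return hashlib.sha256(payload).hexdigest() in database


def inspect_download(payload: bytes) -> str:
    """Step 4: decide what to report for one extracted binary."""
    return "alert" if is_known_malware(payload) else "pass"
```

The hard part is not this lookup but steps 1 and 2, which require parsing the traffic well enough to extract the binary in the first place.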

slide-9
SLIDE 9

Protocol Parsing

Web Client (1.2.3.4/4321) to Web Server (5.6.7.8/80): Request for /x/y/foo.zip
Web Server to Web Client: Status OK plus data

[Diagram: packet exchange between client and server: SYN, SYN ACK, ACK, ..., ACK, FIN, FIN, ACK]

  • TCP connection established
  • TCP stream reassembly for originator
      GET /x/y/foo.zip HTTP/1.1 …
      Request for /x/y/foo.zip, protocol version 1.1, HTTP headers
  • TCP stream reassembly for responder
      200 OK …
      Reply with page content for further analysis (e.g., hash; unpack & parse files)
  • TCP connection tear down
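
The "TCP stream reassembly" step for one direction of a connection can be sketched as follows. This is a simplified illustration, not how Bro or Spicy implement it: the class and method names are hypothetical, and sequence-number wraparound, window limits, and conflicting overlaps are ignored.

```python
class StreamReassembler:
    """Reorders TCP payloads for one direction of a connection."""

    def __init__(self, initial_seq: int):
        self.next_seq = initial_seq  # sequence number of the next byte we expect
        self.pending = {}            # seq -> payload held back until the gap fills

    def add_segment(self, seq: int, payload: bytes) -> bytes:
        """Accept one segment and return the bytes that are now contiguous."""
        if seq + len(payload) <= self.next_seq:
            return b""               # pure retransmission, nothing new
        self.pending[seq] = payload
        out = bytearray()
        while True:
            # Pick a buffered segment that starts at or before the expected byte.
            ready = [s for s in self.pending if s <= self.next_seq]
            if not ready:
                break
            start = min(ready)
            data = self.pending.pop(start)
            fresh = data[self.next_seq - start:]  # drop the already-delivered prefix
            out += fresh
            self.next_seq += len(fresh)
        return bytes(out)
```

Feeding the segments out of order, for example `add_segment(106, b"/y/foo.zip")` before `add_segment(100, b"GET /x")`, delivers nothing for the first call and the full in-order byte stream for the second.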

slide-13
SLIDE 13

Parsing Is Hard

Must be robust
  • Lots of “crud” in real-world networks
  • Cannot trust input

Must be efficient
  • 100,000s of concurrent connections
  • Incremental processing for low latency & memory usage

Must be complete
  • Leaving out parts of the protocol opens evasion opportunities
  • Protocols can be really complex (SMB …)

slide-17
SLIDE 17

There are a lot of protocols out there …

Even a simple case involves 5 protocols: PCAP, Ethernet, IP, TCP, HTTP.

A few popular protocols account for the bulk of traffic in most environments (e.g., TCP/IP, HTTP, TLS, DNS, SMTP, IMAP).

Long tail of further protocols, often environment-specific (e.g., SMB, Modbus, BACnet, more L2).

File formats amplify the challenge.

slide-18
SLIDE 18

Example: Bro 2.5

AYIYA BitTorrent DCE_RPC DHCP DNP3 DNS DTLS FTP Finger GTPv1 Gnutella HTTP ICMP IPv4/6 IRC Ident Kerberos Login Modbus MySQL NCP NFS NTP NetBIOS PE POP3 Portmapper Radius RDP Rlogin Rsh SMB SIP SMTP SNMP SOCKS SSH SSL Syslog TCP Telnet Teredo UDP X509 ZIP

slide-20
SLIDE 20

A Tale of Three Open-Source IDS

[Figure: three open-source IDS; only the label “Suricata” survives in this transcript.]

Shared parsers? None.

Every DPI application rewrites its parsers, usually in C/C++!

slide-25
SLIDE 25

Opportunity: Provide Platform for Parsers

Protocols leverage a rather small set of patterns
  • Readable line-based formats for text protocols
  • Static “protocol data units” (PDU) for binary protocols
  • Request/response structure
  • Common sub-formats (HTTP/MIME/ASN.1)
  • Fragmentation (even at app layer!)

But: The potpourri of protocols still remains diverse
  • Every protocol does something different

Can we leverage similarities, while remaining flexible?
Can we reuse code across applications?

slide-29
SLIDE 29

Meanwhile, in another domain …

There are powerful tools for implementing parsers for programming languages.

[Diagram: Yacc turns the grammar into yyparse(), which the host application calls.]

    exp: NUM          { $$ = $1; }
       | exp '+' exp  { $$ = $1 + $3; }
       | exp '-' exp  { $$ = $1 - $3; }
       | exp '*' exp  { $$ = $1 * $3; }
       | exp '/' exp  { $$ = $1 / $3; }
       ;

These parsers aren’t suitable for DPI, unfortunately.
  • No support for concurrent, incremental processing
  • No support for domain-specific idioms

slide-33
SLIDE 33

Domain-specific Parser Generation

BinPAC (IMC 2006)

[Diagram: BinPAC generates class binpac::ConnectionAnalyzer, which the host application uses.]

    type ClientHello(rec: HandshakeRecord) = record {
        client_version : uint16;
        gmt_unix_time  : uint32;
        random_bytes   : bytestring &length = 28;
        session_len    : uint8;
        session_id     : uint8[session_len];
        dtls_cookie    : case client_version of {
            DTLSv10, DTLSv12 -> cookie  : ClientHelloCookie(rec);
            default          -> nothing : bytestring &length=0;
        };
        […]
    }

TLS v3 Client Hello (Source: Bro’s TLS analyzer)

BinPAC works, but solves the problem only partially.
  • Remains limited to syntax, cannot express logic.
  • Still needs custom C++ for logic & integration.
  • Remains limited to app protocols & connection structure.
  • Lacks support for higher-level idioms.
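
For comparison, here is roughly what the leading ClientHello fields look like when decoded by hand. This is a minimal Python sketch, assuming network byte order and covering only the fields up to `session_id`; the function name and the returned dictionary layout are hypothetical, and the DTLS cookie case is left out.

```python
import struct

# uint16 version, uint32 time, 28 random bytes, uint8 session length ("!" = network order)
_FIXED = struct.Struct("!HI28sB")


def parse_client_hello_prefix(buf: bytes) -> dict:
    """Decode the fixed-size fields plus the variable-length session_id."""
    if len(buf) < _FIXED.size:
        raise ValueError("truncated ClientHello")
    version, gmt_unix_time, random_bytes, session_len = _FIXED.unpack_from(buf)
    end = _FIXED.size + session_len
    if len(buf) < end:
        raise ValueError("truncated session_id")
    return {
        "client_version": version,
        "gmt_unix_time": gmt_unix_time,
        "random_bytes": random_bytes,
        "session_id": buf[_FIXED.size:end],
        "rest": buf[end:],  # remaining fields are not decoded in this sketch
    }
```

Every bounds check here is written manually, which is the kind of detail a declarative record description is meant to take off the parser author's hands.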

slide-35
SLIDE 35

New Framework: Spicy

Integrates experience from many years of writing parsers manually and with BinPAC.

  • Expresses both syntax and logic
  • Supports protocols and file formats
  • Facilitates composition and reuse
  • Supports error handling and recovery
  • Just-in-time compilation via LLVM

slide-38
SLIDE 38

Spicy Example: Parsing SMTP Banners

Input:

    220 mx.foo.com ESMTP Postfix

smtp.spicy:

    module SMTP;

    export type Greeting = unit {
        : /220 +/;
        domain  : /[^ ]+/;
        : / */;
        protocol: /(E?SMTP)?/;
        : / */;
        software: /[^ ]*/;

        on %done { print self; }
    }

Running it:

    # echo "220 mx.foo.com ESMTP Postfix" | spicy-driver smtp.spicy
    <domain=mx.foo.com, protocol=ESMTP, software=Postfix>
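
To make the field-by-field structure of the Greeting unit concrete, the same banner can be matched with one Python regular expression in which each unit field becomes a group. This is an analogy only, not generated Spicy output; the function name is hypothetical, and the handling of the trailing line ending is an assumption of this sketch.

```python
import re

GREETING = re.compile(
    rb"220 +"                      # unnamed field: status code and spaces
    rb"(?P<domain>[^ ]+)"          # domain
    rb" *"
    rb"(?P<protocol>(?:E?SMTP)?)"  # protocol (may be empty)
    rb" *"
    rb"(?P<software>[^ \r\n]*)"    # software
    rb"\r?\n?"                     # assumed: optional line ending after the banner
)


def parse_greeting(line: bytes) -> dict:
    """Return the named fields of an SMTP greeting, or raise ValueError."""
    m = GREETING.fullmatch(line)
    if m is None:
        raise ValueError("not an SMTP greeting")
    return {name: value.decode("ascii") for name, value in m.groupdict().items()}
```

Unlike the Spicy unit, this version needs the complete line in memory before it can match.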

slide-40
SLIDE 40

Host Application API

# Compile Spicy code just-in-time (C++)

    auto ctx = new spicy::CompilerContext();
    auto llvm_module = ctx->compile("smtp.spicy");
    auto linked_module = ctx->linkModules("SMTP", llvm_module);
    auto jit = ctx->jit(linked_module);
    auto parse_func = jit->nativeFunction("smtp_greeting_parse");
    auto resume_func = jit->nativeFunction("smtp_greeting_resume");

# Feed data into parser (C).

    hlt_bytes* data = hlt_bytes_new_from_data("220 mx.foo.");
    void* cookie = (*parse_func)(data);
    hlt_bytes* next = hlt_bytes_new_from_data(".com ESMTP Postfix");
    hlt_bytes_append(data, next);
    cookie = (*resume_func)(cookie);
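
The parse/resume pattern above, where the parser suspends when input runs out and continues once more data arrives, can be imitated with a Python generator. This sketch says nothing about how the HILTI runtime implements it; `line_parser`, `parse`, and `resume` are hypothetical names, and the "parser" merely collects one newline-terminated line per call.

```python
def line_parser():
    """Coroutine that suspends whenever it has no complete line yet.

    Each send() of a byte chunk returns None while the line is incomplete,
    or the finished line (without the newline) once one is available.
    """
    buf = bytearray()
    result = None
    while True:
        chunk = yield result       # suspend here until the host supplies more data
        buf += chunk
        result = None
        if b"\n" in buf:
            line, _, rest = bytes(buf).partition(b"\n")
            buf = bytearray(rest)  # keep leftover bytes for the next line
            result = line


def parse(data: bytes):
    """Start parsing; returns (cookie, result), where cookie is the suspended parser."""
    cookie = line_parser()
    next(cookie)                   # run to the first suspension point
    return cookie, cookie.send(data)


def resume(cookie, data: bytes):
    """Continue a suspended parse with additional input."""
    return cookie.send(data)
```

Because all parser state lives in the suspended generator, a host can keep one cookie per connection and interleave many connections on a single thread.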

slide-43
SLIDE 43

A File Format: Tar

tar.spicy:

    module tar;

    export type Archive = unit {
        files: list<File>;
             : uint<8>(0x0);
             : bytes &length=511;
    };

    type File = unit {
        header: Header;
        data  : bytes &length=self.header.size;
              : bytes &length=512-(self.header.size mod 512);
    };

    type Type = enum { REG=0, LNK=1, SYM=2, CHR=3, BLK=4, DIR=5, FIFO=6 };

    type Header = unit {
        name  : bytes &length=100;
        mode  : bytes &length=8;
        uid   : bytes &length=8;
        gid   : bytes &length=8;
        size  : bytes &length=12 &convert=$$.to_uint(8);
        mtime : bytes &length=12 &convert=$$.to_time(8);
        chksum: bytes &length=8 &convert=$$.to_uint(8);
        tflag : bytes &length=1 &convert=$$.to_uint(8);
        lname : bytes &length=100;
              : bytes &length=88;  # Skip further fields
        prefix: bytes &length=155;
              : bytes &length=12;

        var full_path: bytes;

        on %done {
            if ( ! self.tflag )
                self.tflag = Type::REG;
            self.full_path = self.prefix + b"/" + self.name;
        }
    };

Example archive:

    # tar tvf mp.tar
    foobar/staff     0 2016-05-15 18:58 mp/
    foobar/staff 39548 2016-05-15 18:58 mp/part01.txt
    foobar/staff 39503 2016-05-15 18:58 mp/part02.txt

    # cat print-tar.spicy
    module PrintTar;
    import tar;

    on tar::Archive::%done {
        print self.files;
    }

    # cat mp.tar | spicy-driver tar.spicy print-tar.spicy
    [<header=<name=b"mp/", mode=b"000755", uid=b"000771", gid=b"000024", size=0, mtime=2016-05-16T02:58:19Z, chksum=5100, tflag=DIR>, data=b"", […], full_path=b"mp/">]
    [<header=<name=b"mp/part01.txt", mode=b"000644", uid=b"000771", gid=b"000024", size=39548, mtime=2016-05-16T02:58:19Z, chksum=6351, tflag=REG>, data=b"A seashore. Some way out to sea […]", […], full_path=b"mp/part01.txt">]
    [<header=<name=b"mp/part02.txt", mode=b"000644", uid=b"000771", gid=b"000024", size=39503, mtime=2016-05-16T02:58:11Z, chksum=6348, tflag=REG>, data=b"A man appears on the top of a sand […]", […] full_path=b"mp/part02.txt">]

slide-45
SLIDE 45

Composition: Pipelining Layers

    type HTTP::Body = unit(msg: Message, delivery_mode: DeliveryMode) {
        var data: sink;

        on %init {
            # Add parser for body content (e.g., application/x-tar)
            self.data.connect_mime_type(msg.content_type);

            if ( msg.content_encoding == b"gzip" ) {
                self.data.add_filter(Spicy::Filter::GZIP);
            }
        }

        switch ( delivery_mode ) {
            DeliveryMode::EndOfData -> : bytes &eod -> self.data;
            DeliveryMode::Length    -> : bytes &length=msg.content_length -> self.data;
            DeliveryMode::Multipart -> : list</[^\r\n]*\r?\n/> &until($$ == msg.boundary)
                                         foreach { self.data.write($$); }
        };
    };
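
The sink idea on this slide, where body data is written into a sink that optionally decompresses it and forwards it to whichever parser is connected, can be approximated in Python. This is a sketch of the concept only: the `Sink` class and its methods are hypothetical and are not the Spicy API, and the "connected parser" is just a callable.

```python
import gzip
import zlib


class Sink:
    """Forwards written chunks to connected consumers, optionally through a filter."""

    def __init__(self):
        self._consumers = []
        self._decompressor = None

    def connect(self, consumer):
        """Attach a callable that receives each (filtered) chunk."""
        self._consumers.append(consumer)

    def add_gzip_filter(self):
        # wbits = 16 + MAX_WBITS makes zlib expect a gzip header and trailer.
        self._decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)

    def write(self, chunk: bytes):
        """Push one chunk through the filter (if any) and on to the consumers."""
        if self._decompressor is not None:
            chunk = self._decompressor.decompress(chunk)
        if chunk:
            for consume in self._consumers:
                consume(chunk)
```

The writer never needs to know what sits behind the sink, which is what lets an HTTP body parser hand its payload to, say, a tar parser without either being written with the other in mind.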

slide-47
SLIDE 47

Error Recovery

    type HTTP::Requests = unit {
        requests: list<Request> &synchronize;
    };

    type HTTP::Request = unit {
        request: RequestLine;
        message: Message;
    };

    type HTTP::RequestLine = unit {
        %synchronize-at = /^(GET|POST|HEAD) /;

        method : Token;
               : WhiteSpace;
        uri    : Token;
               : WhiteSpace;
               : /HTTP\//;
        version: /[0-9]+\.[0-9]*/;
               : NewLine;
    };

    type HTTP::Message = unit { ... }
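
The synchronization idea, where parsing skips ahead to the next point that looks like the start of a request after an error, can be illustrated with a small Python loop. This is a deliberately reduced sketch: it handles request lines only (no headers or bodies), reuses the pattern from `%synchronize-at`, and its names are hypothetical.

```python
import re

REQUEST_LINE = re.compile(rb"(GET|POST|HEAD) (\S+) HTTP/([0-9]+\.[0-9]*)\r?\n")
SYNC_POINT = re.compile(rb"^(GET|POST|HEAD) ", re.MULTILINE)


def parse_requests(stream: bytes) -> list:
    """Parse consecutive request lines, resynchronizing after unparsable input."""
    requests, pos = [], 0
    while pos < len(stream):
        m = REQUEST_LINE.match(stream, pos)
        if m:
            requests.append((m.group(1), m.group(2), m.group(3)))
            pos = m.end()
            continue
        # Parse error: skip ahead to the next line that looks like a request.
        sync = SYNC_POINT.search(stream, pos + 1)
        if sync is None:
            break
        pos = sync.start()
    return requests
```

With garbage between two valid lines, the parser loses only the damaged region instead of giving up on the rest of the connection.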

slide-51
SLIDE 51

Evaluation: Writing Spicy Parsers

[Diagram labels: PCAP, Ethernet, IPv4, UDP, TCP, DNS, HTTP, TLS, X.509, MS-Cert, BACnet]

Trace 1 (parser stack, top to bottom):
  • X.509, MS Cert Store, HTTP, TCP, IP, Ethernet, PCAP

Trace 2 (parser stacks, top to bottom):
  • X.509, TLS, TCP, IP, Ethernet, PCAP
  • HTTP, TCP, IP, Ethernet, PCAP

slide-54
SLIDE 54

Evaluation: Real-world Performance

Add Spicy plugin for Bro to compare parsing with a native Bro.

Traces:
  • DNS: Full Berkeley port 53 traffic. 1GB trace, 10min, 65M messages.
  • HTTP: 1/25 of Berkeley port 80 traffic. 30GB trace, 52min, 340k messages.

Correctness: Spicy captures protocols correctly.

Performance: Let’s see.

slide-56
SLIDE 56

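Performance: Spicy vs. C++ in Bro

[Figure: stacked bar chart of CPU cycles (axis from 0.0B to 1.8B) for HTTP and DNS, each with a Standard bar and a HILTI (Spicy) bar. Segments: Protocol Parsing, Script Execution, HILTI-to-Bro Glue, Other.]

Bar labels recovered from the chart:

                 Total   Protocol Parsing   Script Execution   HILTI-to-Bro Glue   Other
HTTP, Standard   1567G   683G               643G               -                   241G
HTTP, HILTI      1580G   852G               450G               21G                 258G
DNS, Standard    712G    177G               356G               -                   180G
DNS, HILTI       1173G   469G               405G               81G                 217G

Protocol parsing, Spicy relative to standard: 1.25x (HTTP), 2.65x (DNS).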

slide-60
SLIDE 60

Bro Integration: “3rd Generation Parsers”

Generation 1: Manually written C++ code.
Generation 2: BinPAC - “yacc for protocols”.
Generation 3: Spicy - A “closed” system.

[Diagram labels: Spicy Grammar (*.spicy), Event Definitions (*.evt), JIT, Bro, Bro Scripts (*.bro)]

slide-61
SLIDE 61

Advanced Spicy Features

  • Composability
  • Error detection & recovery
  • Protocol detection
  • Reassembly/defragmentation
  • Generating wire format

slide-63
SLIDE 63

Implementation: HILTI Toolchain

HILTI (IMC 2014)

[Diagram labels: Spicy Grammar, Spicy Compiler, Spicy Runtime Library, HILTI Machine Code, HILTI Environment, HILTI Compiler, Runtime Library, C Interface Stubs, LLVM Toolchain, LLVM Bitcode, Compiler/Linker, Machine Code]

slide-64
SLIDE 64

The HILTI Model

Performance via Abstraction
  • Transparent improvement under the hood
  • Integration of non-standard hardware
  • High-level, global compiler optimizations
  • Automatic parallelization

Facilitating Reuse
  • Means and glue to share functionality
  • HILTI library of common high-level components

Secure Execution Environment
  • Sandboxed execution
  • Automatic memory management

slide-67
SLIDE 67

Summary

Spicy is a next-generation parser generator for deep packet inspection systems.

  • Expresses both syntax and semantics
  • Supports protocols and file formats
  • Facilitates composition and reuse
  • Supports error handling and recovery
  • Just-in-time compilation via LLVM

Open-source, BSD-licensed prototype: http://www.icir.org/hilti

slide-68
SLIDE 68

Questions?

Robin Sommer

International Computer Science Institute, & Corelight, Inc.

robin@icsi.berkeley.edu robin@corelight.io http://www.icir.org/robin

The Bro Project: www.bro.org, info@bro.org, @Bro_IDS
Professional Bro Solutions: www.corelight.io, info@corelight.io, @corelight_inc

Corelight is hiring!