Detecting Unknown Network Attacks using Language Models
Konrad Rieck and Pavel Laskov
DIMVA 2006, July 13/14, Berlin, Germany
The zero-day problem
- How to distinguish normal from unknown?

Normal HTTP request:
GET /dimva06/john/martin.html Accept: */* Accept-Language: en Host: www Connection: keep-alive

Unknown attack (Nimda IIS exploit):
GET /scripts/..%%35c../..%%35c../..%%35c../..%%35c %%35c../winnt/system32/cmd.exe?/c+dir+c:\ HTTP/1.0 Host: www Connection: close

- Cast intrusion detection as a linguistic problem
- Apply machine learning instruments
N-gram models
- Extraction of n-grams from a connection payload, e.g. "get▯/index.html" (▯ denotes a space):

Bytes:    g  e  t  ▯  /  i  n  ...
2-grams:  ge  et  t▯  ▯/  /i  in  nd  ...
3-grams:  get  et▯  t▯/  ▯/i  /in  ind  nde  ...
...
n-grams
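To make the extraction concrete, here is a minimal Python sketch (not from the slides; the function name ngram_frequencies and the normalization to relative frequencies are assumptions):

```python
from collections import Counter

def ngram_frequencies(payload: bytes, n: int) -> dict:
    """Relative frequencies of byte n-grams in a connection payload."""
    counts = Counter(payload[i:i + n] for i in range(len(payload) - n + 1))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

# Example: 2-grams of the payload from the slide
print(ngram_frequencies(b"get /index.html", 2))
```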
N-grams in attacks

[Figure: frequency differences of 4-grams between the Nimda IIS attack and normal HTTP traffic (values ranging from about -0.02 to 0.05)]

GET /scripts/..%%35c../..%%35c../..%%35c../..%%35c %%35c../winnt/system32/cmd.exe?/c+dir+c:\ HTTP/1.0

- Frequency differences to 4-grams in normal HTTP, e.g. %%35, 35c., 5c.., c../, Acce, cept
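Such differences can be computed directly from the frequency maps; a short sketch building on ngram_frequencies above (the helper name is an assumption), where positive values mark 4-grams over-represented in the attack:

```python
def frequency_difference(attack: bytes, normal: list, n: int = 4) -> dict:
    """n-gram frequencies of an attack payload minus the average
    n-gram frequencies of a set of normal payloads."""
    attack_freq = ngram_frequencies(attack, n)
    avg = Counter()
    for payload in normal:
        for w, f in ngram_frequencies(payload, n).items():
            avg[w] += f / len(normal)
    return {w: attack_freq.get(w, 0.0) - avg.get(w, 0.0)
            for w in set(attack_freq) | set(avg)}
```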
Geometric representation
- A simple example

[Figure: scatter plots of connections embedded in a feature space of n-gram frequencies; axes and labels include the n-grams GET▯, Acce and %%35 as well as an HTTP pipelining connection]

Similarity of connections
- Huge feature space
  - 256^n dimensions
  - Geometric representation of connections
- Similarity measures
  - Distances, kernel functions, ... e.g.
  - Minkowski: $d_k(x, y) = \sqrt[k]{\sum_{w \in L} |\phi_w(x) - \phi_w(y)|^k}$
  - Manhattan: $d(x, y) = \sum_{w \in L} |\phi_w(x) - \phi_w(y)|$
- Efficient computation not trivial
  - Sparse representation of n-gram frequencies
  - Linear-time algorithms (cf. DIMVA 2006 paper)

where $x, y \in \{0, \ldots, 255\}^*$, $L = \{0, \ldots, 255\}^n$ and $\phi_w(x)$ is the frequency of $w$ in sequence $x$.
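The sparse, linear-time evaluation mentioned above can be illustrated with the Manhattan distance over dictionary-based frequency maps (a sketch under assumptions, not the paper's actual implementation):

```python
def manhattan_distance(phi_x: dict, phi_y: dict) -> float:
    """Manhattan distance between two sparse n-gram frequency maps;
    n-grams missing from a map have frequency 0. Runs in time linear
    in the number of non-zero entries."""
    dist = sum(abs(f - phi_y.get(w, 0.0)) for w, f in phi_x.items())
    dist += sum(f for w, f in phi_y.items() if w not in phi_x)
    return dist
```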
Anomaly detection
- Detection of outliers in feature space
- Exploration of geometry between connections
- No training phase - no labels required
- Anomaly detection (AD) methods
- e.g. Spherical AD, Cluster AD, Neighborhood AD
[Figure: a toy data set with outliers, shown under spherical, cluster and neighborhood anomaly detection]
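As an illustration of the simplest of these methods (a sketch only, not the qsSVM detector used in the evaluation), anomaly scores can be taken as distances to the center of mass:

```python
import numpy as np

def spherical_anomaly_scores(points: np.ndarray) -> np.ndarray:
    """Anomaly score of each point = Euclidean distance to the
    center of mass; points far from the center are outliers."""
    center = points.mean(axis=0)
    return np.linalg.norm(points - center, axis=1)

# Flag the most distant 1% of connections as anomalous
scores = spherical_anomaly_scores(np.random.rand(1000, 2))
anomalies = scores > np.percentile(scores, 99)
```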
Experiments
- Open questions
- Do n-gram models capture sufficient semantics for the detection of unknown attacks?
- Can anomaly detection reliably operate at low
false-positive rates?
- How does this approach compare to classical
signature-based intrusion detection?
Evaluation data
- PESIM 2005 data set
- Real network traffic to servers at our laboratory
- HTTP: Reverse proxies of web sites
- FTP: Local file sharing, e.g. photos, media
- SMTP: Retransmission flavored with spam
- Attacks injected by a penetration-testing expert (e.g. using Metasploit)
- DARPA 1999 data set as reference
- Statistical preprocessing
- Extraction of 30 independent samples comprising
1000 incoming connection payloads per protocol
Method comparison
- Comparison of anomaly detection methods
- Criterion: AUC0.01, the area under the ROC curve within false-positive rates [0, 0.01]
- Results averaged over n-gram lengths [1,7]
Protocol   Best method                AUC0.01
HTTP       Spherical (qsSVM)          0.781
FTP        Neighborhood (Zeta)        0.746
SMTP       Cluster (Single-linkage)   0.756

Bottom line: Different protocols require different anomaly detection methods
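For reference, AUC0.01 can be computed from anomaly scores as the average true-positive rate over false-positive rates up to 0.01 (a sketch; the helper name and the normalization by the maximal false-positive rate are assumptions):

```python
import numpy as np
from sklearn.metrics import roc_curve

def auc_001(y_true, scores, max_fpr=0.01):
    """Normalized area under the ROC curve restricted to
    false-positive rates in [0, max_fpr]."""
    fpr, tpr, _ = roc_curve(y_true, scores)
    grid = np.linspace(0.0, max_fpr, 1000)
    # Mean interpolated TPR over a uniform FPR grid = partial AUC / max_fpr
    return float(np.interp(grid, fpr, tpr).mean())
```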
N-gram lengths
- How does one choose the optimal n-gram length?
[Figure: optimal n-gram length per attack; fraction of attacks (0% to 40%) best detected at each length n = 1..7, shown separately for HTTP, FTP and SMTP]
- No single n fits all: variable-length models required
Variable-length models
- Words: payload split at delimiters (CR, LF, TAB, ▯, ",", ".", ":", "/", "&")
  e.g. "get▯/index.html" → words get, index, html
- Combined n-grams: union of n-grams for n = {1, 2, 3, ...}
  e.g. bytes g, e, t, ▯, /, i, n, ...; 2-grams ge, et, t▯, ▯/, /i, in, nd, ...; 3-grams get, et▯, t▯/, ▯/i, /in, ind, nde, ...
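A minimal sketch of the word model (delimiter set taken from the slide; the function name is an assumption):

```python
import re

# Delimiters from the slide: CR, LF, TAB, space, ',', '.', ':', '/', '&'
DELIMITERS = rb"[\r\n\t ,.:/&]+"

def words(payload: bytes) -> list:
    """Split a connection payload into words at protocol delimiters."""
    return [w for w in re.split(DELIMITERS, payload) if w]

print(words(b"get /index.html"))  # [b'get', b'index', b'html']
```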
Comparison with Snort
- Language models vs. Snort
- Combined n-gram (1-7) and word models
- Snort: Version 2.4.2 with default rules
[Figure: ROC curves (true-positive rate vs. false-positive rate up to 0.01) for HTTP, FTP and SMTP traffic, comparing the best combined n-gram model, the word model and Snort]
Conclusions and outlook
- Language models for intrusion detection
- Characteristic patterns in normal traffic and attacks
- Unsupervised anomaly detection with high accuracy
- Detection of ~80% of unknown network attacks
- Future perspective
- From in vitro to in vivo: real-time application
- Language models as prototypes for signatures?
Outwitting language models
- Approaches
  - (1) Red herring: denial-of-service with random traffic patterns
  - (2) Creeping poisoning: careful subversion of the normal traffic model
  - (3) Mimicry attacks: adaptation of attacks to mimic normal traffic
- Conclusions
- (1) Is worse for signature-based intrusion detection
- (2, 3) Require profound insider knowledge