Detecting Unknown Network Attacks using Language Models
Konrad Rieck and Pavel Laskov
DIMVA 2006, July 13/14, Berlin, Germany
The zero-day problem
- How to distinguish normal from unknown?

Normal HTTP request:
GET /dimva06/john/martin.html Accept: */* Accept-Language: en Host: www Connection: keep-alive

Unknown attack (Nimda IIS exploit):
GET /scripts/..%%35c../..%%35c../..%%35c../..%%35c %%35c../winnt/system32/cmd.exe?/c+dir+c:\ HTTP/1.0 Host: www Connection: close

- Cast intrusion detection as a linguistic problem
- Apply machine learning instruments
N-gram models
- Extraction of n-grams from a connection payload, e.g. "get▯/index.html" (▯ denotes a space):

Bytes:    g  e  t  ▯  /  i  n  ...
2-grams:  ge  et  t▯  ▯/  /i  in  nd  ...
3-grams:  get  et▯  t▯/  ▯/i  /in  ind  nde  ...
...
n-grams
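To make the extraction concrete, here is a minimal Python sketch (not from the slides; the function name ngram_frequencies and the normalization to relative frequencies are assumptions):

```python
from collections import Counter

def ngram_frequencies(payload: bytes, n: int) -> dict:
    """Relative frequencies of byte n-grams in a connection payload."""
    counts = Counter(payload[i:i + n] for i in range(len(payload) - n + 1))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

# Example: 2-grams of the payload from the slide
print(ngram_frequencies(b"get /index.html", 2))
```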
N-grams in attacks

[Figure: frequency differences of 4-grams between the Nimda IIS attack and normal HTTP traffic (values ranging from about -0.02 to 0.05)]

GET /scripts/..%%35c../..%%35c../..%%35c../..%%35c %%35c../winnt/system32/cmd.exe?/c+dir+c:\ HTTP/1.0

- Frequency differences to 4-grams in normal HTTP, e.g. %%35, 35c., 5c.., c../, Acce, cept
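Such differences can be computed directly from the frequency maps; a short sketch building on ngram_frequencies above (the helper name is an assumption), where positive values mark 4-grams over-represented in the attack:

```python
def frequency_difference(attack: bytes, normal: list, n: int = 4) -> dict:
    """n-gram frequencies of an attack payload minus the average
    n-gram frequencies of a set of normal payloads."""
    attack_freq = ngram_frequencies(attack, n)
    avg = Counter()
    for payload in normal:
        for w, f in ngram_frequencies(payload, n).items():
            avg[w] += f / len(normal)
    return {w: attack_freq.get(w, 0.0) - avg.get(w, 0.0)
            for w in set(attack_freq) | set(avg)}
```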
Geometric representation
- A simple example

[Figure: scatter plots of connections embedded in a feature space of n-gram frequencies; axes and labels include the n-grams GET▯, Acce and %%35 as well as an HTTP pipelining connection]

Similarity of connections
- Huge feature space
  - 256^n dimensions
  - Geometric representation of connections
- Similarity measures
  - Distances, kernel functions, ... e.g.
  - Minkowski: $d_k(x, y) = \sqrt[k]{\sum_{w \in L} |\phi_w(x) - \phi_w(y)|^k}$
  - Manhattan: $d(x, y) = \sum_{w \in L} |\phi_w(x) - \phi_w(y)|$
- Efficient computation not trivial
  - Sparse representation of n-gram frequencies
  - Linear-time algorithms (cf. DIMVA 2006 paper)

where $x, y \in \{0, \ldots, 255\}^*$, $L = \{0, \ldots, 255\}^n$ and $\phi_w(x)$ is the frequency of $w$ in sequence $x$.
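The sparse, linear-time evaluation mentioned above can be illustrated with the Manhattan distance over dictionary-based frequency maps (a sketch under assumptions, not the paper's actual implementation):

```python
def manhattan_distance(phi_x: dict, phi_y: dict) -> float:
    """Manhattan distance between two sparse n-gram frequency maps;
    n-grams missing from a map have frequency 0. Runs in time linear
    in the number of non-zero entries."""
    dist = sum(abs(f - phi_y.get(w, 0.0)) for w, f in phi_x.items())
    dist += sum(f for w, f in phi_y.items() if w not in phi_x)
    return dist
```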
Anomaly detection
- Detection of outliers in feature space
- Exploration of geometry between connections
- No training phase - no labels required
- Anomaly detection (AD) methods
- e.g. Spherical AD, Cluster AD, Neighborhood AD
[Figure: a toy data set with outliers, shown under spherical, cluster and neighborhood anomaly detection]
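As an illustration of the simplest of these methods (a sketch only, not the qsSVM detector used in the evaluation), anomaly scores can be taken as distances to the center of mass:

```python
import numpy as np

def spherical_anomaly_scores(points: np.ndarray) -> np.ndarray:
    """Anomaly score of each point = Euclidean distance to the
    center of mass; points far from the center are outliers."""
    center = points.mean(axis=0)
    return np.linalg.norm(points - center, axis=1)

# Flag the most distant 1% of connections as anomalous
scores = spherical_anomaly_scores(np.random.rand(1000, 2))
anomalies = scores > np.percentile(scores, 99)
```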
Experiments
- Open questions
- Do n-gram models capture sufficient semantics for the detection of unknown attacks?
- Can anomaly detection reliably operate at low
false-positive rates?
- How does this approach compare to classical
signature-based intrusion detection?
Evaluation data
- PESIM 2005 data set
- Real network traffic to servers at our laboratory
- HTTP: Reverse proxies of web sites
- FTP: Local file sharing, e.g. photos, media
- SMTP: Retransmission flavored with spam
- Attacks injected by a penetration-testing expert (e.g. using Metasploit)
- DARPA 1999 data set as reference
- Statistical preprocessing
- Extraction of 30 independent samples comprising
1000 incoming connection payloads per protocol
Method comparison
- Comparison of anomaly detection methods
- Criterion: AUC0.01, the area under the ROC curve within false-positive rates [0, 0.01]
- Results averaged over n-gram lengths [1,7]
Protocol   Best method                AUC0.01
HTTP       Spherical (qsSVM)          0.781
FTP        Neighborhood (Zeta)        0.746
SMTP       Cluster (Single-linkage)   0.756

Bottom line: Different protocols require different anomaly detection methods
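For reference, AUC0.01 can be computed from anomaly scores as the average true-positive rate over false-positive rates up to 0.01 (a sketch; the helper name and the normalization by the maximal false-positive rate are assumptions):

```python
import numpy as np
from sklearn.metrics import roc_curve

def auc_001(y_true, scores, max_fpr=0.01):
    """Normalized area under the ROC curve restricted to
    false-positive rates in [0, max_fpr]."""
    fpr, tpr, _ = roc_curve(y_true, scores)
    grid = np.linspace(0.0, max_fpr, 1000)
    # Mean interpolated TPR over a uniform FPR grid = partial AUC / max_fpr
    return float(np.interp(grid, fpr, tpr).mean())
```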
N-gram lengths
- How does one choose the optimal n-gram length?
[Figure: optimal n-gram length per attack; fraction of attacks (0% to 40%) best detected at each length n = 1..7, shown separately for HTTP, FTP and SMTP]
- No single n fits all: variable-length models required
Variable-length models
- Words: payload split at delimiters (CR, LF, TAB, ▯, ",", ".", ":", "/", "&")
  e.g. "get▯/index.html" → words get, index, html
- Combined n-grams: union of n-grams for n = {1, 2, 3, ...}
  e.g. bytes g, e, t, ▯, /, i, n, ...; 2-grams ge, et, t▯, ▯/, /i, in, nd, ...; 3-grams get, et▯, t▯/, ▯/i, /in, ind, nde, ...
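A minimal sketch of the word model (delimiter set taken from the slide; the function name is an assumption):

```python
import re

# Delimiters from the slide: CR, LF, TAB, space, ',', '.', ':', '/', '&'
DELIMITERS = rb"[\r\n\t ,.:/&]+"

def words(payload: bytes) -> list:
    """Split a connection payload into words at protocol delimiters."""
    return [w for w in re.split(DELIMITERS, payload) if w]

print(words(b"get /index.html"))  # [b'get', b'index', b'html']
```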
Comparison with Snort
- Language models vs. Snort
- Combined n-gram (1-7) and word models
- Snort: Version 2.4.2 with default rules
[Figure: ROC curves (true-positive rate vs. false-positive rate up to 0.01) for HTTP, FTP and SMTP traffic, comparing the best combined n-gram model, the word model and Snort]
Conclusions and outlook
- Language models for intrusion detection
- Characteristic patterns in normal traffic and attacks
- Unsupervised anomaly detection with high accuracy
- Detection of ~80% of unknown network attacks
- Future perspective
- From in vitro to in vivo: real-time application
- Language models as prototypes for signatures?
Outwitting language models
- Approaches
  - (1) Red herring: denial-of-service with random traffic patterns
  - (2) Creeping poisoning: careful subversion of the normal traffic model
  - (3) Mimicry attacks: adaptation of attacks to mimic normal traffic
- Conclusions
- (1) Is worse for signature-based intrusion detection
- (2, 3) Require profound insider knowledge