 
Detecting Unknown Network Attacks using Language Models
Konrad Rieck and Pavel Laskov
DIMVA 2006, July 13/14, Berlin, Germany
The zero-day problem
‣ How to distinguish normal traffic from unknown attacks?
  Normal request:
  GET /dimva06/john/martin.html
  Accept: */*
  Accept-Language: en
  Host: www
  Connection: keep-alive
  Attack (Nimda IIS worm):
  GET /scripts/..%%35c../..%%35c../..%%35c../..%%35c %%35c../winnt/system32/cmd.exe?/c+dir+c:\ HTTP/1.0
  Host: www
  Connection: close
‣ Cast intrusion detection as a linguistic problem
‣ Utilize instruments from machine learning
N-gram models
‣ Connection payload: get▯/index.html
‣ Bytes: g, e, t, ▯, /, i, n, ...
‣ 2-grams: ge, et, t▯, ▯/, /i, in, nd, ...
‣ 3-grams: get, et▯, t▯/, ▯/i, /in, ind, nde, ...
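The sliding window on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming payloads are handled as raw bytes and that n-grams overlap with stride 1 as the slide shows; the function name `ngrams` is hypothetical and not taken from the paper.

```python
def ngrams(payload: bytes, n: int) -> list[bytes]:
    """Return all contiguous, overlapping n-grams of a byte sequence, in order."""
    # A payload shorter than n yields an empty list (range is empty).
    return [payload[i:i + n] for i in range(len(payload) - n + 1)]

payload = b"get /index.html"       # the space plays the role of ▯ on the slide
print(ngrams(payload, 2)[:6])      # [b'ge', b'et', b't ', b' /', b'/i', b'in']
print(ngrams(payload, 3)[:7])      # [b'get', b'et ', b't /', b' /i', b'/in', b'ind', b'nde']
```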
N-grams in attacks
‣ Nimda IIS attack:
  GET /scripts/..%%35c../..%%35c../..%%35c../..%%35c %%35c../winnt/system32/cmd.exe?/c+dir+c:\ HTTP/1.0
[Figure: Nimda IIS attack and HTTP traffic comparison. Frequency differences of the attack's 4-grams to 4-grams in normal HTTP traffic (y-axis: frequency difference, -0.02 to 0.05; x-axis: 4-grams). Labelled peaks: %%35, 35c., 5c.., c../ (positive); Acce, cept (negative).]
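A rough sketch of how such a frequency-difference profile could be computed. It assumes (this is not stated on the slide) that the normal baseline is the mean relative 4-gram frequency over a set of normal HTTP payloads; the helper names and the shortened Nimda-style request below are illustrative only.

```python
from collections import Counter

def ngram_frequencies(payload: bytes, n: int) -> dict[bytes, float]:
    """Relative frequency of each n-gram in one payload (count / number of n-grams)."""
    grams = [payload[i:i + n] for i in range(len(payload) - n + 1)]
    total = len(grams)
    return {g: c / total for g, c in Counter(grams).items()} if total else {}

def frequency_difference(attack: bytes, normal: list[bytes], n: int = 4) -> dict[bytes, float]:
    """Attack n-gram frequency minus the mean frequency over normal payloads (assumed baseline)."""
    mean_normal = Counter()
    for p in normal:
        for g, f in ngram_frequencies(p, n).items():
            mean_normal[g] += f / len(normal)
    attack_freq = ngram_frequencies(attack, n)
    keys = set(attack_freq) | set(mean_normal)
    # Counter returns 0 for n-grams that never occur in normal traffic.
    return {g: attack_freq.get(g, 0.0) - mean_normal[g] for g in keys}

# Shortened, illustrative payloads (not the evaluation data).
attack = b"GET /scripts/..%%35c../..%%35c../winnt/system32/cmd.exe HTTP/1.0\r\n"
normal = [b"GET /index.html HTTP/1.0\r\nAccept: */*\r\n"]
diff = frequency_difference(attack, normal)
print(sorted(diff.items(), key=lambda kv: kv[1], reverse=True)[:4])
```

Positive entries are 4-grams over-represented in the attack; negative ones, such as those of `Accept`, are typical of normal requests but missing from the attack.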
Geometric representation
‣ A simple example
[Figure: connections as points in a three-dimensional space with axes "GET ▯", "Acce" and "%%35" (n-gram frequencies); annotation: HTTP pipelining; caption: Similarity of connections]
‣ Huge feature space
  ‣ 256^n dimensions
‣ Geometric representation of connections
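To make the dimensionality concrete, here is a sketch that stores only the n-grams occurring in a payload, assuming absolute n-gram counts as coordinates. The vector formally lives in a space with 256^n coordinates, but a single connection touches only a handful of them. `embed` is a hypothetical helper name.

```python
from collections import Counter

def embed(payload: bytes, n: int) -> Counter:
    """Sparse n-gram vector: maps each occurring n-gram to its count; absent n-grams are implicitly 0."""
    return Counter(payload[i:i + n] for i in range(len(payload) - n + 1))

n = 3
x = embed(b"GET /index.html HTTP/1.1", n)
# Number of stored (non-zero) coordinates versus the full dimensionality 256**n.
print(len(x), "non-zero of", 256 ** n, "dimensions")
```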
Similarity measures
‣ Distances, kernel functions, ..., e.g.
  ‣ Manhattan: $d(x, y) = \sum_{w \in L} |\phi_w(x) - \phi_w(y)|$
  ‣ Minkowski: $d(x, y) = \Big(\sum_{w \in L} |\phi_w(x) - \phi_w(y)|^k\Big)^{1/k}$
  where $x, y \in \{0, \dots, 255\}^*$, $L = \{0, \dots, 255\}^n$, and $\phi_w(x)$ = frequency of $w$ in sequence $x$
‣ Efficient computation not trivial
  ‣ Sparse representation of n-gram frequencies
  ‣ Linear-time algorithms (cf. DIMVA 2006 paper)
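A minimal sketch of the two distances on sparse vectors, assuming hash maps (`collections.Counter`) as the sparse representation. Iterating over the union of occurring n-grams takes expected time linear in the number of distinct n-grams of x and y; this is a simple stand-in and not necessarily the linear-time algorithm described in the DIMVA 2006 paper. The names `embed` and `minkowski` are hypothetical.

```python
from collections import Counter

def embed(payload: bytes, n: int) -> Counter:
    """Sparse n-gram count vector phi(x)."""
    return Counter(payload[i:i + n] for i in range(len(payload) - n + 1))

def minkowski(phi_x: Counter, phi_y: Counter, k: float = 1.0) -> float:
    """Minkowski distance of order k; k = 1 gives the Manhattan distance.
    Only n-grams occurring in x or y contribute: every other coordinate of
    the 256**n-dimensional space is zero in both vectors."""
    keys = phi_x.keys() | phi_y.keys()
    total = sum(abs(phi_x[w] - phi_y[w]) ** k for w in keys)
    return total ** (1.0 / k)

x = embed(b"GET /a", 2)
y = embed(b"GET /b", 2)
print(minkowski(x, y, k=1))  # Manhattan: the two payloads differ only in '/a' vs '/b'
print(minkowski(x, y, k=2))
```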
Anomaly detection
‣ Detection of outliers in feature space
‣ Exploration of geometry between connections
‣ No training phase: no labels required
‣ Anomaly detection (AD) methods
  ‣ e.g. spherical AD, cluster AD, neighborhood AD
[Figure: spherical, cluster and neighborhood anomaly detection, each shown on two-dimensional toy data in [0, 1] × [0, 1]]
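Two generic outlier scores illustrate scoring points by geometry alone, without labels or a training phase. This is a simplified sketch: the spherical score is plain distance to the center of mass (not the quarter-sphere SVM used in the experiments), the neighborhood score is the mean distance to the k nearest neighbors (not the Zeta measure), and cluster AD is not shown. Function names and toy data are hypothetical.

```python
import math

def spherical_scores(points):
    """Simplified spherical AD: distance of each point to the center of mass."""
    dim = len(points[0])
    center = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    return [math.dist(p, center) for p in points]

def knn_scores(points, k=3):
    """Simplified neighborhood AD: mean distance to the k nearest other points."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        scores.append(sum(nearest) / len(nearest))
    return scores

# Toy data: a tight cluster plus one outlier at index 5.
toy = [(0.50, 0.50), (0.52, 0.48), (0.48, 0.51),
       (0.51, 0.53), (0.49, 0.47), (0.95, 0.05)]
for name, scores in [("spherical", spherical_scores(toy)),
                     ("neighborhood", knn_scores(toy))]:
    print(name, "most anomalous index:", scores.index(max(scores)))
```

In practice the points would be sparse n-gram vectors and `math.dist` would be replaced by one of the distances from the previous slide.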
Experiments
‣ Open questions
  ‣ Do n-gram models capture enough semantics to detect unknown attacks?
  ‣ Can anomaly detection operate reliably at low false-positive rates?
  ‣ How does this approach compare to classical signature-based intrusion detection?
Evaluation data
‣ PESIM 2005 data set
  ‣ Real network traffic to servers at our laboratory
  ‣ HTTP: reverse proxies of web sites
  ‣ FTP: local file sharing, e.g. photos, media
  ‣ SMTP: retransmission, flavored with spam
  ‣ Attacks injected by a penetration-testing expert (e.g. Metasploit)
‣ DARPA 1999 data set as reference
‣ Statistical preprocessing
  ‣ Extraction of 30 independent samples, each comprising 1000 incoming connection payloads per protocol
Method comparison
‣ Comparison of anomaly detection methods
‣ Criterion: AUC_0.01, the area under the ROC curve for false-positive rates within [0, 0.01]
‣ Results averaged over n-gram lengths n = 1, ..., 7

| Protocol | Best method              | AUC_0.01 |
|----------|--------------------------|----------|
| HTTP     | Spherical (qsSVM)        | 0.781    |
| FTP      | Neighborhood (Zeta)      | 0.746    |
| SMTP     | Cluster (single-linkage) | 0.756    |

Bottom line: different protocols require different anomaly detection methods
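A sketch of the AUC_0.01 criterion, assuming the area is normalized by the false-positive bound so that a perfect detector scores 1 (consistent with the reported values around 0.75, which would otherwise be capped at 0.01). Higher scores are treated as more anomalous, tied scores form one ROC step, and the curve is interpolated linearly at the bound. The name `partial_auc` is hypothetical.

```python
def partial_auc(scores, labels, fpr_max=0.01):
    """Area under the ROC curve for FPR in [0, fpr_max], divided by fpr_max.
    scores: higher = more anomalous; labels: 1 = attack, 0 = normal."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both attack and normal examples")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    # Build ROC points (FPR, TPR), lowering the threshold one distinct score at a time.
    roc = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == s:  # group tied scores
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        roc.append((fp / neg, tp / pos))
    # Trapezoidal integration, clipped at fpr_max.
    area = 0.0
    for (x0, y0), (x1, y1) in zip(roc, roc[1:]):
        if x0 >= fpr_max:
            break
        if x1 > fpr_max:
            # Linearly interpolate the TPR at the bound.
            y1 = y0 + (y1 - y0) * (fpr_max - x0) / (x1 - x0)
            x1 = fpr_max
        area += (x1 - x0) * (y0 + y1) / 2
    return area / fpr_max

scores = [0.9, 0.4, 0.6, 0.2]
labels = [1, 1, 0, 0]
print(partial_auc(scores, labels, fpr_max=1.0))  # 0.75
```

With `fpr_max=1.0` the function returns the ordinary AUC, which is a quick sanity check; the small default bound rewards detectors that find attacks before raising many false alarms.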
N-gram lengths
‣ How does one choose the optimal n-gram length?
[Figure: optimal n-gram length per attack for HTTP, FTP and SMTP; x-axis: n = 1, ..., 7; y-axis: percentage of attacks, 0% to 40%]
‣ No single n fits all: variable-length models required
Variable-length models
‣ Connection payload: get▯/index.html
‣ Combined n-grams, n ∈ {1, 2, 3, ...}: g, e, t, ..., ge, et, t▯, ▯/, /i, in, nd, ..., get, et▯, t▯/, ▯/i, /in, ind, nde, ...
‣ Words, separated by the delimiters CR, LF, TAB, ▯, ",", ".", ":", "/", "&": get, index, html, ...
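The two variable-length representations can be sketched as follows, assuming the delimiter set exactly as listed on the slide (the list may be incomplete) and treating combined n-grams as all n-grams for n = 1 up to a chosen maximum. The names `DELIMITERS`, `words` and `combined_ngrams` are hypothetical.

```python
import re

# Delimiters as listed on the slide: CR, LF, TAB, space, ",", ".", ":", "/", "&".
# Assumption: the actual delimiter set used in the experiments may differ.
DELIMITERS = re.compile(rb"[\r\n\t ,.:/&]+")

def words(payload: bytes) -> list[bytes]:
    """Split a payload into non-empty words at runs of delimiter bytes."""
    return [w for w in DELIMITERS.split(payload) if w]

def combined_ngrams(payload: bytes, max_n: int) -> list[bytes]:
    """All overlapping n-grams for n = 1..max_n, concatenated by length."""
    return [payload[i:i + n]
            for n in range(1, max_n + 1)
            for i in range(len(payload) - n + 1)]

print(words(b"get /index.html"))     # [b'get', b'index', b'html']
print(combined_ngrams(b"get", 3))    # [b'g', b'e', b't', b'ge', b'et', b'get']
```

Words adapt their length to the protocol's own syntax, while combined n-grams cover all lengths up to `max_n` at the cost of a larger, more redundant feature set.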
Comparison with Snort
‣ Language models vs. Snort
  ‣ Combined n-gram (1-7) and word models
  ‣ Snort: version 2.4.2 with default rules
[Figure: ROC curves (true positive rate vs. false positive rate from 0 to 0.01) on HTTP, FTP and SMTP traffic for the best combined n-gram model, the word model and Snort]
Conclusions and outlook
‣ Language models for intrusion detection
  ‣ Characteristic patterns in normal traffic and attacks
  ‣ Unsupervised anomaly detection with high accuracy
  ‣ Detection of ~80% of unknown network attacks
‣ Future perspective
  ‣ From in vitro to in vivo: real-time application
  ‣ Language models as prototypes for signatures?
Outwitting language models
‣ Approaches
  ‣ (1) Red herring: denial-of-service with random traffic patterns
  ‣ (2) Creeping poisoning: careful subversion of the normal traffic model
  ‣ (3) Mimicry attacks: adaptation of attacks to mimic normal traffic
‣ Conclusions
  ‣ (1) Worse for signature-based intrusion detection
  ‣ (2, 3) Require profound insider knowledge
Questions?