Outside the Closed World: On Finding Intrusions with Anomaly Detection
Robin Sommer
International Computer Science Institute & Lawrence Berkeley National Laboratory
Advanced Topics in Computer Security, UC Berkeley, April 2010
robin@icsi.berkeley.edu, http://www.icir.org
HIDS vs. NIDS
Network-based (NIDS)
+ Easy setup, with broad coverage
+ Hard to subvert
- Packets lack context
- Performance
- Does not deal with crypto
Host-based (HIDS)
+ High-level semantics
+ Performance
+ Deals with crypto
- Management hassle
- Must trust host
alert tcp $EXTERNAL_NET any -> $HOME_NET 139 (flow:to_server,established; content:"|eb2f 5feb 4a5e 89fb 893e 89f2|"; msg:"EXPLOIT x86 linux samba overflow"; reference:bugtraq,1816; reference:cve,CVE-1999-0811; classtype:attempted-admin;)
Vern Paxson’s group at ICSI since 1996.
global ssh_hosts: set[addr];

event connection_established(c: connection)
    {
    local responder = c.id.resp_h;  # Responder's address
    local service = c.id.resp_p;    # Responder's port

    if ( service != 22/tcp )
        return;  # Not SSH.

    if ( responder in ssh_hosts )
        return;  # We already know this one.

    add ssh_hosts[responder];  # Found a new host.
    print "New SSH host found", responder;
    }
(1) Build a profile of normal activity (commonly offline).
(2) Match activity against profile and report what deviates.
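The two steps can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: a single numeric feature, a profile consisting of mean and standard deviation, and a fixed threshold of three standard deviations. The function names and the sample numbers are hypothetical and do not come from the talk.

```python
from statistics import mean, pstdev

def build_profile(training_values):
    """Step 1: summarize normal activity (here: mean and std. deviation)."""
    return {"mean": mean(training_values), "std": pstdev(training_values)}

def find_deviations(profile, observed_values, threshold=3.0):
    """Step 2: report observations lying more than `threshold`
    standard deviations away from the profile mean."""
    std = profile["std"] or 1e-9  # avoid division by zero for constant data
    return [v for v in observed_values
            if abs(v - profile["mean"]) / std > threshold]

# Hypothetical session durations in seconds (made-up numbers).
normal_sessions = [30, 32, 29, 31, 30, 28, 33, 30]
profile = build_profile(normal_sessions)
anomalies = find_deviations(profile, [31, 29, 300, 30])
print(anomalies)
```

Real systems differ mainly in what the profile contains and how deviation is measured; the two-phase structure stays the same.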
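[Figure: anomalies in a two-dimensional data set; axes x and y, normal regions N1 and N2, outlier O3; overlaid feature labels: Session Duration, Session Volume]
Source: Chandola et al. 2009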
Technique Used | Section | References
Statistical Profiling using Histograms | Section 7.2.1 | NIDES [Anderson et al. 1994; Anderson et al. 1995; Javitz and Valdes 1991], EMERALD [Porras and Neumann 1997], Yamanishi et al. [2001; 2004], Ho et al. [1999], Kruegel et al. [2002; 2003], Mahoney et al. [2002; 2003; 2003; 2007], Sargor [1998]
Parametric Statistical Modeling | Section 7.1 | Gwadera et al. [2005b; 2004], Ye and Chen [2001]
Non-parametric Statistical Modeling | Section 7.2.2 | Chow and Yeung [2002]
Bayesian Networks | Section 4.2 | Siaterlis and Maglaris [2004], Sebyala et al. [2002], Valdes and Skinner [2000], Bronstein et al. [2001]
Neural Networks | Section 4.1 | HIDE [Zhang et al. 2001], NSOM [Labib and Vemuri 2002], Smith et al. [2002], Hawkins et al. [2002], Kruegel et al. [2003], Manikopoulos and Papavassiliou [2002], Ramadas et al. [2003]
Support Vector Machines | Section 4.3 | Eskin et al. [2002]
Rule-based Systems | Section 4.4 | ADAM [Barbara et al. 2001a; Barbara et al. 2003; Barbara et al. 2001b], Fan et al. [2001], Helmer et al. [1998], Qin and Hwang [2004], Salvador and Chan [2003], Otey et al. [2003]
Clustering Based | Section 6 | ADMIT [Sequeira and Zaki 2002], Eskin et al. [2002], Wu and Zhang [2003], Otey et al. [2003]
Nearest Neighbor based | Section 5 | MINDS [Ertoz et al. 2004; Chandola et al. 2006], Eskin et al. [2002]
Spectral | Section 9 | Shyu et al. [2003], Lakhina et al. [2005], Thottan and Ji [2003], Sun et al. [2007]
Information Theoretic | Section 8 | Lee and Xiang [2001], Noble and Cook [2003]
Source: Chandola et al. 2009
Examples of techniques used for network intrusion detection.
Wang and Stolfo [2004]
MINDS
Table 3.1: Time-window based features
Feature name | Feature description
count-dest | Number of flows to unique destination IP addresses inside the network in the last T seconds from the same source
count-src | Number of flows from unique source IP addresses inside the network in the last T seconds to the same destination
count-serv-src | Number of flows from the source IP to the same destination port in the last T seconds
count-serv-dest | Number of flows to the destination IP address using same source port in the last T seconds
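To make the time-window features concrete, here is a small Python sketch that computes two of them per flow. It is one reading of the table, not the MINDS implementation: the `Flow` record, the `window_features` helper, the 5-second look-back window, and the sample addresses are all assumptions made for illustration, and the current flow is counted as part of its own window.

```python
from collections import namedtuple

# Hypothetical minimal flow record: timestamp, source, destination, dest. port.
Flow = namedtuple("Flow", "ts src dst dport")

def window_features(flows, window=5.0):
    """For each flow, look back `window` seconds and compute:
    count_dest      distinct destinations contacted by the same source
    count_serv_src  flows from the same source to the same destination port
    """
    flows = sorted(flows, key=lambda f: f.ts)
    out = []
    for i, f in enumerate(flows):
        # Flows from the same source that fall inside the look-back window.
        recent = [g for g in flows[:i + 1]
                  if g.src == f.src and f.ts - g.ts <= window]
        count_dest = len({g.dst for g in recent})
        count_serv_src = sum(1 for g in recent if g.dport == f.dport)
        out.append((f, count_dest, count_serv_src))
    return out

flows = [
    Flow(0.0, "10.0.0.1", "10.0.1.1", 22),
    Flow(1.0, "10.0.0.1", "10.0.1.2", 22),
    Flow(2.0, "10.0.0.1", "10.0.1.3", 22),
    Flow(2.5, "10.0.0.9", "10.0.1.1", 80),
    Flow(9.0, "10.0.0.1", "10.0.1.4", 22),
]
features = window_features(flows)
for f, count_dest, count_serv_src in features:
    print(f.src, f.dst, count_dest, count_serv_src)
```

The quadratic scan keeps the sketch short; a sliding window over sorted flows would be the natural choice for real traffic volumes.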
eBayes/Emerald
[Figure 2: eBayes TCP. Nodes: Session Class, Event Intensity, Error Intensity, Max Open to Any Host, Service Distribution, Number of Unique Ports, Number of Unique IP Addresses, Connect Code Distribution. Annotations: mail, http, ftp, rlogin, password guessing, ...]
PAYL
Features used: packet sizes, IP addresses, ports, header fields, timestamps, inter-arrival times, session size, session duration, session volume, payload frequencies, payload tokens, payload pattern, ...
Web-based Attacks
GET /scripts/access.pl?user=johndoe&cred=admin
GET /scripts/access.pl?user=johndoe;SELECT+passwd+FROM+credentials&...
[Figure: data points in a two-dimensional feature space; axes: Feature X, Feature Y]
Classification Problems: Spam Detection, Optical Character Recognition, Google’s Machine Translation, Amazon’s Recommendations
[Figure: packets per time unit over 1000 time units, shown at five time scales: (a) unit = 100 seconds, (b) unit = 10 seconds, (c) unit = 1 second, (d) unit = 0.1 second, (e) unit = 0.01 second]
Source: Leland et al. 1995
[Figure 1: Log-log CCDF of connection sizes. x-axis: connection size S [bytes]; y-axis: P(connection size > S); sites: NERSC, LBL, MWN]
Source: Kornexl 2005
Cut-off 20KB
Site | Conns > 20KB | %Bytes
MWN | 15% | 87%
LBL | 12% | 96%
NERSC | 14% | 99.86%
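The two table columns can be computed from a list of connection sizes as sketched below. The helper name `heavy_tail_share`, the reading of 20KB as 20,000 bytes, and the synthetic sizes are assumptions for illustration; the sizes are made up and are not the measured data behind the table.

```python
def heavy_tail_share(sizes, cutoff=20_000):
    """Return (fraction of connections larger than `cutoff` bytes,
    fraction of all bytes carried by those connections)."""
    large = [s for s in sizes if s > cutoff]
    return len(large) / len(sizes), sum(large) / sum(sizes)

# Made-up connection sizes in bytes: many small ones, a few very large ones.
sizes = [500] * 85 + [1_000_000] * 15
conn_frac, byte_frac = heavy_tail_share(sizes)
print(f"{conn_frac:.0%} of connections carry {byte_frac:.2%} of bytes")
```

Even this toy distribution shows the heavy-tail effect: a small share of connections accounts for nearly all of the volume.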
Munich Scientific Network
3 major universities, 10GE upstream, ~100,000 users, ~65,000 hosts
[Figure: traffic volume in TBytes/month, 1996 to 2008 (y-axis 50 to 400); series: total upstream bytes, incoming bytes]
Data: Leibniz-Rechenzentrum, München
Lawrence Berkeley National Lab
Medium-sized, 10GE upstream, ~5,000 users, ~15,000 hosts
[Figure: total connections per month, 1994 to 2008 (y-axis 0M to 1300M); series: attempted connections, successful connections; annotated worms: Conficker.B, Conficker.A, Santy, Mydoom.O, Sasser, Sobig.F, Welchia, Blaster, Slapper, Nimda, CodeRed2, CodeRed]
Data: Lawrence Berkeley National Lab
[Figure: ports seen over time, log(port) versus year (1994 to 2010), all connections; labeled ports: 20, 25, 53, 80, 110, 135, 443, 513, 1080, 1433, 3128, 5432, 8000, 9898]
[Figure: same plot for successful connections only]
Data: Lawrence Berkeley National Lab
active-connection-reuse, DNS-label-len-gt-pkt, HTTP-chunked-multipart, possible-split-routing,
bad-Ident-reply, DNS-label-too-long, HTTP-version-mismatch, SYN-after-close,
bad-RPC, DNS-RR-length-mismatch, illegal-%-at-end-, SYN-after-reset,
bad-SYN-ack, DNS-RR-unknown-type, inappropriate-FIN, SYN-inside-connection,
bad-TCP-header-len, DNS-truncated-answer, IRC-invalid-line, SYN-seq-jump,
base64-illegal-encoding, DNS-len-lt-hdr-len, line-terminated-with-single-CR, truncated-NTP,
connection-, DNS-truncated-RR-rdlength, malformed-SSH-identification, unescaped-%-in-URI,
data-after-reset, double-%-in-URI, no-login-prompt, unescaped-special-URI-char,
data-before-established, excess-RPC, NUL-in-line, unmatched-HTTP-reply,
too-many-DNS-queries, FIN-advanced-last-seq, POP3-server-sending-client-commands, window-recision,
DNS-label-forward-compress-, fragment-with-DF
155K in total!
Greg Linden: “ ... guess work. Our error rate will always be high.”
\[ P(S \mid P) = \frac{P(S)\,P(P \mid S)}{P(P)} = \frac{P(S)\,P(P \mid S)}{P(S)\,P(P \mid S) + P(\neg S)\,P(P \mid \neg S)} \]
\[ P(S \mid P) = \frac{\frac{1}{10000} \cdot 0.99}{\frac{1}{10000} \cdot 0.99 + \left(1 - \frac{1}{10000}\right) \cdot 0.01} \approx 1\% \]
Source: Axelsson 1999
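The arithmetic of this base-rate example can be checked with a few lines of Python. The sketch assumes the usual reading of the symbols, with S as "subject is sick" and P as "test is positive", a prior of 1/10000, and a test that is right 99% of the time in both directions; the function and parameter names are illustrative.

```python
def posterior(prior, p_pos_given_true, p_pos_given_false):
    """Bayes' theorem: P(S|P) from P(S), P(P|S), and P(P|not S)."""
    true_pos = prior * p_pos_given_true
    false_pos = (1 - prior) * p_pos_given_false
    return true_pos / (true_pos + false_pos)

p = posterior(prior=1 / 10000, p_pos_given_true=0.99, p_pos_given_false=0.01)
print(f"P(S|P) = {p:.4f}")  # about 0.0098, i.e. roughly 1%
```

Although the test is 99% accurate, the rarity of the condition means that almost all positive results are false positives.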
1M audit records / day, 2 intrusions / day, 10 records / intrusion
\[ P(I) = \frac{2 \cdot 10}{10^{6}} = 2 \cdot 10^{-5} \]
\[ P(I \mid A) = \frac{P(I)\,P(A \mid I)}{P(I)\,P(A \mid I) + P(\neg I)\,P(A \mid \neg I)} = \frac{2 \cdot 10^{-5}\,P(A \mid I)}{2 \cdot 10^{-5}\,P(A \mid I) + 0.99998\,P(A \mid \neg I)} \]
Detection rate: P(A | I)
False alarm rate: P(A | ¬I)
Even with perfect detection, the false alarm rate must be
Source: Axelsson 1999
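To get a feel for how small the false alarm rate has to be, the Bayes expression for P(I|A), the probability that an alarm really indicates an intrusion, can be solved for P(A|¬I). The Python sketch below does this for the slide's numbers (1M audit records per day, 2 intrusions per day, 10 records per intrusion). The function name and the target values of 0.5, 0.9, and 0.99 are illustrative assumptions, not figures from the talk.

```python
def required_false_alarm_rate(p_intrusion, detection_rate, target):
    """Largest P(A|not I) for which P(I|A) still reaches `target`.

    Derived by solving
        target = p*d / (p*d + (1 - p)*f)
    for f, with p = P(I), d = P(A|I), f = P(A|not I).
    """
    return (p_intrusion * detection_rate * (1 - target)
            / (target * (1 - p_intrusion)))

P_I = 2 * 10 / 1_000_000  # 2 intrusions/day * 10 records each / 1M records/day

for target in (0.5, 0.9, 0.99):
    far = required_false_alarm_rate(P_I, detection_rate=1.0, target=target)
    print(f"P(I|A) >= {target}: false alarm rate <= {far:.2e}")
```

With perfect detection, the tolerable false alarm rate is on the order of the base rate P(I) itself, which is why the base rate dominates the result.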
Observer decision | Truth: Signal + Noise | Truth: Noise
Yes | Hit rate (a.k.a. true-positive rate) = # hits / total # signal events | False alarm rate (a.k.a. false-positive rate) = # false alarms / total # noise events
No | Miss rate (a.k.a. false-negative rate) = 1 - # hits / total # signal events | Correct rejection rate (a.k.a. true-negative rate) = 1 - # false alarms / total # noise events
Source: Maxion/Roberts 2004
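In signal-detection terms, every event is either a signal or noise, and the detector either raises an alarm or stays silent. The following Python sketch computes the four resulting rates from raw counts; the function name and the example counts are made up for illustration.

```python
def detection_rates(hits, misses, false_alarms, correct_rejections):
    """Compute the four signal-detection rates from raw event counts."""
    signal_events = hits + misses
    noise_events = false_alarms + correct_rejections
    hit_rate = hits / signal_events
    false_alarm_rate = false_alarms / noise_events
    return {
        "hit_rate": hit_rate,                            # true-positive rate
        "miss_rate": 1 - hit_rate,                       # false-negative rate
        "false_alarm_rate": false_alarm_rate,            # false-positive rate
        "correct_rejection_rate": 1 - false_alarm_rate,  # true-negative rate
    }

# Hypothetical counts: 10 attacks (8 detected) among 1000 benign events.
rates = detection_rates(hits=8, misses=2, false_alarms=50, correct_rejections=950)
print(rates)
```

Note that both rates depend on how events are counted in the first place, which is why the unit of analysis matters for any evaluation.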
[Figure: ROC plots, panels (a) and (b). x-axis: False Positive Rate; y-axis: True Positive Rate (0.0 to 1.0). Labeled curves: Perfect Accuracy, High Accuracy, Low Accuracy, Accuracy Due To Chance; Curve A, Curve B, Curve C]
Source: Maxion/Roberts 2004
Unit of analysis: How do we count “noise events” and “signal events”?
Ground truth: How do we know/define what is a true attack?
Evaluation data set: How do we know that it is representative?
An attempt at providing the community with public data sets for fair NIDS evaluation
[Figure 1: Block diagram of 1999 test bed. Labels: INSIDE (Eyrie AF Base); OUTSIDE (Internet); Cisco router; hosts: SunOS, Solaris, NT, Linux; 100's of emulated PCs and workstations; 1000's of emulated workstations and web sites; inside sniffer; outside sniffer; collected data: audit data, sniffer data, file system dumps]
Source: Lippmann et al. 1999
0800 "GET /scripts/access.pl?user=johndoe&cred=admin" 200 2122
[Annotation: the request splits into the path and the query q, where q consists of attribute/value pairs a1 = v1, a2 = v2]
Source: Kruegel et al. 2003
Attribute Models: Length, Character distribution, Grammatical structure, Tokens, Presence/Absence, Order of attributes
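As an illustration of the first attribute model, the sketch below splits a request into path and attribute/value pairs and scores a value by its length. The scoring uses a Chebyshev-style bound, variance divided by the squared distance from the mean, which is a simplification in the spirit of a length model and not necessarily the exact formulation of Kruegel et al. All function names and training values are hypothetical.

```python
from statistics import mean, pvariance
from urllib.parse import urlsplit, parse_qsl

def query_attributes(request_path):
    """Split '/path?a1=v1&a2=v2' into (path, [(a1, v1), (a2, v2)])."""
    parts = urlsplit(request_path)
    return parts.path, parse_qsl(parts.query, keep_blank_values=True)

def train_length_model(values):
    """Learn mean and variance of attribute value lengths."""
    lengths = [len(v) for v in values]
    return mean(lengths), pvariance(lengths)

def length_score(model, value):
    """Chebyshev-style bound for a length this far above the mean:
    1.0 means unremarkable, values near 0 are suspicious."""
    mu, var = model
    if len(value) <= mu:
        return 1.0
    return min(1.0, var / (len(value) - mu) ** 2)

# Hypothetical benign values seen for the 'user' attribute during training.
model = train_length_model(["johndoe", "alice", "bob", "carol99"])

path, attrs = query_attributes("/scripts/access.pl?user=johndoe&cred=admin")
print(path, attrs)
print(length_score(model, "johndoe"))
print(length_score(model, "johndoe;SELECT passwd FROM credentials"))
```

The benign value scores 1.0, while the injected string is far longer than anything seen in training and scores close to 0.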
Figure 2. Overview of real-time URL feed, feature collection, an
Source: Ma et al. 2009
[Figure: botHunter architecture. Labels: span port to Ethernet device; Snort 2.6.* signature engine with botHunter ruleset (e2: Exploits, e3: Egg Downloads, e4: C&C Traffic); SLADE anomaly engine (e2: Payload Anomalies); SCADE anomaly engine (e1: Inbound Malware Scans, e5: Outbound Scans); CTA parser; botHunter correlator (Java 1.4.2; bothunter.config, bothunter.XML); bot Infection Profile; CTA Anonymizer Plugin; TLS/TOR; Cyber-TA Anonymous Infection Profile Publication Repository]
Source: Gu et al. 2007
[Figure: shadow honeypot architecture. Labels: traffic from the network; filtering; anomaly detection sensors; protected system (OS kernel, user processes); protected service (regular service code, shadow honeypot code, address space, process state); state rollback; update filters; update predictors]
Source: Anagnostakis et al. 2005