Empirical Analysis and Statistical Modelling of Attack Processes based on Honeypots


  1. Empirical Analysis and Statistical Modelling of Attack Processes based on Honeypots  M. Kaâniche¹, E. Alata¹, V. Nicomette¹, Y. Deswarte¹, M. Dacier²  ¹LAAS-CNRS, ²Eurecom  Mohamed.Kaaniche@laas.fr  ACI “Sécurité & Informatique” http://acisi.loria.fr  Workshop on Empirical Evaluation of Dependability and Security (WEEDS-DSN06), Philadelphia, PA, June 28, 2006

  2. Outline  Context and motivation  Data collection  Attack processes modeling  Conclusion and open issues

  3. Context  Need for real data and methodologies to learn about malicious activities on the Internet and analyze their impact on systems security  Several initiatives for monitoring malicious threats do exist ■ CAIDA ■ Motion Sensor project ■ Dshield ■ CADHo

  4. CADHo Objectives  Build and deploy on the Internet a distributed platform of identically configured low-interaction honeypots in a large number of diverse locations  Carry out various analyses based on the collected data to better understand threats and build models to characterize attack processes  Analyze and model the behavior of malicious attackers once they manage to get access and compromise a target ■ High-interaction honeypots

  5. Leurré.com data collection platform [Figure: platform architecture. Components shown: Internet, Reverse Firewall, Virtual Switch, Mach0 (Windows 98 Workstation), Mach1 (Windows NT, ftp + web server), Mach2 (Redhat 7.3, ftp server), Observer (tcpdump)]

  6. 35 platforms, 25 countries, 5 continents

  7. Data analysis  Data collection since 2004  80 000 different IP addresses from 91 different countries  Information extracted from the logs ■ Raw packets (entire frames including payloads) ■ IP address of the attacking machine ■ Time of the attack and duration ■ Targeted virtual machines and ports ■ Geographic location of the attacking machine (Maxmind, NetGeo) ■ OS of the attacking machine (p0f, ettercap, disco)  Automatic data analyses have been developed to extract useful trends and identify hidden phenomena from the data ■ Clustering techniques, time series analysis, etc. ■ Publications available at: www.leurrecom.org/paper.htm

  8. Modeling Objectives  Identify probability distributions that best characterize attack occurrence and attack propagation processes  Model the time relationships between attacks coming from different sources (or to different destinations)  Analyze whether data collected from different platforms exhibit similar or different malicious attack activities  Predict occurrence of new attacks on a given platform based on past observations on this platform and other platforms  Estimate impact of attacks on security of target systems ■ High-interaction honeypots to analyze attackers' behavior once they compromise and get access to a target

  9. Examples  Analysis of the time evolution of the number of attacks taking into account the geographic location of attacking machines  Characterization and statistical modeling of times between attacks  Analysis of the propagation of attacks among the honeypot platforms  Data ■ 320 days from January 1st 2004 to April 17, 2005 ■ 14 honeypot platforms (the most active ones) ■ 816475 observed attacks

  10. Attack occurrence and geographic distribution  The number of attacks per unit of time, considering a single platform or all platforms, can be described as a linear regression of the attacks originating from a single country only: $Y(t) = \alpha_j X_j(t) + \beta_j$, where $X_j(t)$ is the number of attacks originating from country $j$
      Country   $\alpha_j$   $\beta_j$   $R^2$
      Russia    44.57        1555.67     0.93
      USA       5.13         759.1       0.94
      UK        25.93        438.03      0.94
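The regression on this slide can be illustrated with a minimal least-squares sketch. The function name `fit_line` and the sample counts are hypothetical and purely synthetic; the slide does not say how the authors estimated the coefficients, so ordinary least squares is an assumption here.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = alpha * x + beta.

    Returns (alpha, beta, r2), where r2 is the coefficient of determination.
    """
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    alpha = sxy / sxx
    beta = my - alpha * mx
    ss_res = sum((yi - (alpha * xi + beta)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return alpha, beta, 1.0 - ss_res / ss_tot

# Synthetic, noiseless illustration (NOT data from the slides):
# x_country plays the role of X_j(t), y_total the role of Y(t).
x_country = [10, 12, 9, 15, 20, 18]
y_total = [5.0 * xi + 760.0 for xi in x_country]

alpha, beta, r2 = fit_line(x_country, y_total)
print(f"alpha={alpha:.2f} beta={beta:.2f} R2={r2:.2f}")
```

With real per-period counts, an $R^2$ close to 1 (as in the table above) would indicate that the traffic from the single country explains most of the variation in the total.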

  11. “Times between attacks” analysis  An attack is associated with an IP address ■ occurrence time associated with the first time a packet is received from the corresponding address  $t_i$ = time between attacks $i$ and $i-1$
      Platform   P5      P6       P9      P20      P23
      #ti        85890   148942   46268   224917   51580
      #IP        79549   90620    42230   162156   47859
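A minimal sketch of how the $t_i$ series could be derived from a packet log. It assumes the simplest reading of the slide, one attack per source address, timed at the first packet from that address. The counts above (#ti larger than #IP) suggest the authors also count repeat visits from the same address under a rule the slide does not state, so this is a simplification. The function name and the input layout are hypothetical.

```python
def times_between_attacks(packets):
    """Compute attack occurrence times and the gaps between them.

    packets: iterable of (timestamp_seconds, source_ip) pairs, in any order.
    Assumption: each source address yields exactly one attack, whose
    occurrence time is the earliest timestamp seen for that address.
    """
    first_seen = {}
    for ts, ip in packets:
        if ip not in first_seen or ts < first_seen[ip]:
            first_seen[ip] = ts
    attack_times = sorted(first_seen.values())
    # t_i = time between attack i and attack i-1
    gaps = [later - earlier for earlier, later in zip(attack_times, attack_times[1:])]
    return attack_times, gaps

# Made-up log: address "a" sends two packets but counts as one attack.
sample_packets = [(0, "a"), (5, "b"), (7, "a"), (12, "c")]
sample_times, sample_gaps = times_between_attacks(sample_packets)
print(sample_times, sample_gaps)
```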

  12. Number of attacks per IP address

  13. “Times between attacks” distribution  Best fit provided by a mixture (Pareto, Exponential) distribution: $\mathrm{pdf}(t) = P_a \frac{k}{(t+1)^{k+1}} + (1 - P_a)\,\lambda e^{-\lambda t}$  Platform 6: $P_a = 0.0115$, $k = 0.1183$, $\lambda = 0.1364$/sec. [Figure: pdf versus time between attacks for Platform 6, comparing the data with the mixture (Pareto, Exp.) fit and an exponential fit]
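The mixture density can be evaluated directly, as in the sketch below. The density follows the formula on this slide and the parameters are the Platform 6 values. The closed-form CDF is derived here by integrating each component from 0 to $t$; it does not appear on the slide, and the function names and the support $t \ge 0$ are assumptions.

```python
import math

def mixture_pdf(t, pa, k, lam):
    """Pareto/exponential mixture density for t >= 0."""
    pareto = k / (t + 1.0) ** (k + 1.0)
    expo = lam * math.exp(-lam * t)
    return pa * pareto + (1.0 - pa) * expo

def mixture_cdf(t, pa, k, lam):
    """Integral of mixture_pdf from 0 to t (derived here, not from the slide)."""
    pareto_cdf = 1.0 - (t + 1.0) ** (-k)
    expo_cdf = 1.0 - math.exp(-lam * t)
    return pa * pareto_cdf + (1.0 - pa) * expo_cdf

# Platform 6 parameters as given on the slide (lambda in 1/sec).
PA, K, LAM = 0.0115, 0.1183, 0.1364

for t in (1.0, 31.0, 271.0):
    tail = 1.0 - mixture_cdf(t, PA, K, LAM)
    print(f"t={t:6.1f}s  pdf={mixture_pdf(t, PA, K, LAM):.6f}  P(T>t)={tail:.6f}")
```

The Pareto term has a small weight $P_a$ but decays polynomially rather than exponentially, so it dominates the tail at large $t$. That is one way to read why the mixture fits better than a pure exponential.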

  14. “Times between attacks” distribution [Figure: four panels (Platform 5, Platform 9, Platform 20, Platform 23), each plotting the pdf against the time between attacks (sec.) for the data, the mixture (Pareto, Exp.) fit and an exponential fit. Parameter labels in extraction order, top row (Platforms 5 and 9): Pa = 0.0051, k = 0.173, Pa = 0.0019, λ = 0.121/sec., k = 0.1668, λ = 0.276/sec.; bottom row (Platforms 20 and 23): Pa = 0.0144, Pa = 0.0031, k = 0.0183, λ = 0.0136/sec., k = 0.1240, λ = 0.275/sec.]

  15. Propagation of attacks  A propagation is assumed to occur when the IP address of an attacking machine observed at a given platform is observed at another platform  Propagation graph ■ Nodes identify the platforms ■ Transitions identify propagations  A propagation between Pi and Pj occurs from an IP address when the next occurrence of this address is observed on Pj after visiting Pi ■ Probabilities are associated with the transitions to reflect their likelihood of occurrence
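The propagation graph described above can be sketched as a table of transition frequencies. This is an illustration under stated assumptions: the input layout and function name are hypothetical, a repeat visit to the same platform is counted as a self-transition, and each probability is normalized over all transitions leaving the source platform. The slide does not specify the normalization.

```python
from collections import Counter, defaultdict

def propagation_probabilities(observations):
    """Estimate transition probabilities between platforms.

    observations: iterable of (timestamp, source_ip, platform) tuples.
    For each address, consecutive occurrences (ordered by time) define a
    transition from the earlier platform to the later one.
    Returns {(src_platform, dst_platform): probability}.
    """
    visits_by_ip = defaultdict(list)
    for ts, ip, platform in observations:
        visits_by_ip[ip].append((ts, platform))

    transitions = Counter()
    for visits in visits_by_ip.values():
        visits.sort()  # chronological order per address
        for (_, src), (_, dst) in zip(visits, visits[1:]):
            transitions[(src, dst)] += 1

    outgoing = Counter()
    for (src, _), count in transitions.items():
        outgoing[src] += count
    return {edge: count / outgoing[edge[0]] for edge, count in transitions.items()}

# Made-up observations: address "a" goes P5 -> P6 -> P6, address "b" P5 -> P5.
sample_obs = [(1, "a", "P5"), (2, "a", "P6"), (3, "a", "P6"),
              (1, "b", "P5"), (4, "b", "P5")]
sample_probs = propagation_probabilities(sample_obs)
print(sample_probs)
```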

  16. Propagation graph [Figure: propagation graph over platforms P5, P6, P9, P20 and P23, with transition probabilities on the edges ranging from 0.6% to 96.1%; the assignment of values to edges is not recoverable from the extracted text]  Issues under investigation ■ Focus on specific attacks (largest clusters, worms, etc.) ■ Timing characteristics and probability distributions

  17. Summary and Conclusions  Preliminary models to characterize attack processes observed on low-interaction honeypots  Several open issues ■ Predictive models that can be used to support decision making during design and operation stages ■ How to assess the impact of attacks on the security of target systems?  High-interaction honeypots ■ Analyze attackers' behavior once they get access to a target ■ Validate a theoretical model for quantitative evaluation of security developed by LAAS in the 1990s  Privilege graph to describe vulnerabilities and attack scenarios  METF (“Mean Effort To security Failure”) to quantify security  Assumptions about intruders' behaviors
