Empirical Analysis and Statistical Modelling of Attack Processes (PowerPoint presentation)


SLIDE 1

Empirical Analysis and Statistical Modelling of Attack Processes based on Honeypots

M. Kaâniche¹, E. Alata¹, V. Nicomette¹, Y. Deswarte¹, M. Dacier²

Workshop on Empirical Evaluation of Dependability and Security (WEEDS-DSN06), Philadelphia, PA, June 28, 2006

¹LAAS-CNRS, ²Eurecom

Mohamed.Kaaniche@laas.fr

ACI “Sécurité & Informatique” http://acisi.loria.fr

SLIDE 2

Outline

• Context and motivation
• Data collection
• Attack processes modeling
• Conclusion and open issues

SLIDE 3

Context

• Need for real data and methodologies to learn about malicious activities on the Internet and to analyze their impact on system security
• Several initiatives for monitoring malicious threats already exist:
  ■ CAIDA
  ■ Internet Motion Sensor project
  ■ DShield
  ■ CADHo

SLIDE 4

CADHo Objectives

• Build and deploy on the Internet a distributed platform of identically configured low-interaction honeypots in a large number of diverse locations
• Carry out various analyses of the collected data to better understand threats and to build models that characterize attack processes
• Analyze and model the behavior of malicious attackers once they manage to gain access to and compromise a target
  ■ High-interaction honeypots

SLIDE 5

Leurré.com data collection platform

• Three virtual machines connected through a virtual switch:
  ■ Mach0: Windows 98 workstation
  ■ Mach1: Windows NT (FTP + web server)
  ■ Mach2: Red Hat 7.3 (FTP server)
• Observer (tcpdump) capturing the traffic
• Reverse firewall between the platform and the Internet

SLIDE 6

35 platforms, 25 countries, 5 continents

SLIDE 7

Data analysis

• Data collection since 2004
  ■ 80,000 different IP addresses from 91 different countries
• Information extracted from the logs
  ■ Raw packets (entire frames, including payloads)
  ■ IP address of the attacking machine
  ■ Time of the attack and its duration
  ■ Targeted virtual machines and ports
  ■ Geographic location of the attacking machine (MaxMind, NetGeo)
  ■ OS of the attacking machine (p0f, ettercap, disco)
• Automatic data analyses have been developed to extract useful trends and identify hidden phenomena in the data
  ■ Clustering techniques, time series analysis, etc.
  ■ Publications available at: www.leurrecom.org/paper.htm

SLIDE 8

Modeling Objectives

• Identify the probability distributions that best characterize attack occurrence and attack propagation processes
• Model the time relationships between attacks coming from different sources (or targeting different destinations)
• Analyze whether data collected from different platforms exhibit similar or different malicious attack activities
• Predict the occurrence of new attacks on a given platform based on past observations on this platform and on other platforms
• Estimate the impact of attacks on the security of target systems
  ■ High-interaction honeypots to analyze attackers' behavior once they compromise and gain access to a target
SLIDE 9

Examples

• Analysis of the time evolution of the number of attacks, taking into account the geographic location of the attacking machines
• Characterization and statistical modeling of times between attacks
• Analysis of the propagation of attacks among the honeypot platforms
• Data
  ■ 320 days, from January 1, 2004 to April 17, 2005
  ■ 14 honeypot platforms (the most active ones)
  ■ 816,475 observed attacks

SLIDE 10

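Attack occurrence and geographic distribution

The number of attacks per unit of time, observed on a single platform or on all platforms, can be described by a linear regression on the number of attacks originating from a single country only:

$$Y(t) = \alpha_j X_j(t) + \beta_j$$

where Y(t) is the number of attacks per unit of time and Xj(t) the number of attacks originating from country j; αj and βj are the regression coefficients and R² is the coefficient of determination.

Country   αj      βj        R²
Russia    44.57   1555.67   0.93
USA       5.13    759.1     0.94
UK        25.93   438.03    0.94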
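The slide reports αj, βj and R² per country but does not say how they were estimated. A minimal Python sketch, assuming ordinary least squares (the usual choice for a simple linear regression); the function name `fit_linear_regression` is hypothetical and the example series are synthetic, not data from the study:

```python
# Minimal sketch: ordinary least-squares fit of Y(t) = alpha_j * X_j(t) + beta_j.
# Assumption: OLS; the slides report the coefficients but not the fitting method.


def fit_linear_regression(x, y):
    """Return (alpha, beta, r2) for the least-squares line y ~ alpha * x + beta.

    x: attacks per time unit from one country, X_j(t)
    y: total attacks per time unit, Y(t)
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    alpha = sxy / sxx                   # slope alpha_j
    beta = mean_y - alpha * mean_x      # intercept beta_j
    ss_res = sum((yi - (alpha * xi + beta)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0  # coefficient of determination
    return alpha, beta, r2


# Synthetic example (not data from the study): y = 2x + 5 exactly.
alpha, beta, r2 = fit_linear_regression([10, 20, 30, 40], [25, 45, 65, 85])
print(alpha, beta, r2)  # prints: 2.0 5.0 1.0
```

In practice a library routine such as SciPy's `linregress` gives the same slope and intercept; the hand-written version just makes the role of each coefficient explicit.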

SLIDE 11

“Times between attacks” analysis

• Each attack is associated with an IP address
  ■ Its occurrence time is the time at which the first packet is received from the corresponding address
• ti = time between attacks i and (i-1)

Platform   P5      P6       P9      P20      P23
#ti        85890   148942   46268   224917   51580
#IP        79549   90620    42230   162156   47859

(#ti: number of times between attacks; #IP: number of distinct attacking IP addresses)
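A minimal Python sketch of how the times between attacks ti, the number of distinct addresses and the number of attacks per IP address could be computed, assuming the attacks have already been extracted from the logs as (occurrence time, source IP) records. The record layout and the function name `analyze_attacks` are hypothetical, not the Leurré.com tooling:

```python
from collections import Counter


def analyze_attacks(records):
    """Compute times between attacks and attacks per IP address.

    records: iterable of (occurrence_time_sec, source_ip) pairs, one per attack
    (hypothetical layout). Returns (t, per_ip) where t[i-1] = T_i - T_{i-1}
    for the time-sorted attacks and per_ip maps each IP to its attack count.
    """
    records = list(records)                      # iterated twice below
    times = sorted(ts for ts, _ in records)
    t = [later - earlier for earlier, later in zip(times, times[1:])]
    per_ip = Counter(ip for _, ip in records)    # len(per_ip) = number of distinct IPs
    return t, per_ip


# Synthetic example using documentation addresses (not data from the study).
attacks = [(0.0, "192.0.2.1"), (4.0, "192.0.2.2"), (5.5, "192.0.2.1"), (12.0, "192.0.2.3")]
t, per_ip = analyze_attacks(attacks)
print(t)            # [4.0, 1.5, 6.5]
print(len(per_ip))  # 3
```

With n attacks the sketch yields n-1 values of ti; an address that attacks several times contributes several entries, which is why the number of attacks can exceed the number of distinct IPs.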

SLIDE 12

Number of attacks per IP address

SLIDE 13

“Times between attacks” distribution

[Figure: pdf of the times between attacks on Platform 6, time (sec.) from 1 to 271: data, fitted mixture (Pareto, Exp.) and exponential distribution]

Fitted mixture parameters: Pa = 0.0115, k = 0.1183, λ = 0.1364/sec.

$$\mathrm{pdf}(t) = P_a \frac{k}{(t+1)^{k+1}} + (1 - P_a)\,\lambda e^{-\lambda t}$$

• Best fit provided by a mixture distribution (Pareto and exponential)
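To make the mixture density of this slide concrete, a minimal Python sketch that evaluates pdf(t) = Pa·k/(t+1)^(k+1) + (1-Pa)·λ·e^(-λt) and its closed-form CDF with the Platform 6 parameters. The function names are hypothetical; the Pareto term is the shifted form k/(t+1)^(k+1), which integrates to 1 on t ≥ 0:

```python
import math


def mixture_pdf(t, pa, k, lam):
    """pdf(t) = Pa * k / (t+1)^(k+1) + (1 - Pa) * lam * exp(-lam * t), t >= 0."""
    pareto = k / (t + 1.0) ** (k + 1.0)   # shifted Pareto density
    expo = lam * math.exp(-lam * t)       # exponential density with rate lam
    return pa * pareto + (1.0 - pa) * expo


def mixture_cdf(t, pa, k, lam):
    """P(T <= t) = Pa * (1 - (t+1)^-k) + (1 - Pa) * (1 - exp(-lam * t))."""
    return pa * (1.0 - (t + 1.0) ** (-k)) + (1.0 - pa) * (1.0 - math.exp(-lam * t))


# Parameters reported for Platform 6 (lam in 1/sec).
P6 = {"pa": 0.0115, "k": 0.1183, "lam": 0.1364}

for t in (1, 10, 100, 1000):
    # density and survival probability P(T > t)
    print(t, mixture_pdf(t, **P6), 1.0 - mixture_cdf(t, **P6))
```

With these values the exponential component dominates for short gaps, while the Pareto term, whose tail decays only as (t+1)^(-k), accounts for the rare very long gaps between attacks.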

SLIDE 14

“Times between attacks” distribution

[Figure: pdf of the times between attacks on Platforms 20, 23, 5 and 9, time (sec.) from 1 to 271; each panel shows the data, the fitted mixture (Pareto, Exp.) and an exponential distribution]

Mixture parameters reported in the four panels:
• Pa = 0.0019, k = 0.1668, λ = 0.276/sec.
• Pa = 0.0144, k = 0.0183, λ = 0.0136/sec.
• Pa = 0.0031, k = 0.1240, λ = 0.275/sec.
• Pa = 0.0051, k = 0.173, λ = 0.121/sec.
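Each panel compares the mixture with a single exponential. The slides do not state how the curves were fitted; a minimal Python sketch of one way to compare them quantitatively, assuming a maximum-likelihood exponential (rate = 1 / mean) and comparing log-likelihoods on the same sample. The function names are hypothetical, the mixture parameters are the Platform 6 values taken as given (not refitted), and the sample is synthetic:

```python
import math


def mixture_pdf(t, pa, k, lam):
    """Pareto/exponential mixture density used on the slides, t >= 0."""
    return pa * k / (t + 1.0) ** (k + 1.0) + (1.0 - pa) * lam * math.exp(-lam * t)


def exponential_mle(samples):
    """Maximum-likelihood rate of a single exponential: n / sum(samples)."""
    return len(samples) / sum(samples)


def log_likelihoods(samples, pa, k, lam):
    """Return (loglik_mixture, loglik_exponential) for the same samples."""
    rate = exponential_mle(samples)
    ll_exp = sum(math.log(rate) - rate * t for t in samples)
    ll_mix = sum(math.log(mixture_pdf(t, pa, k, lam)) for t in samples)
    return ll_mix, ll_exp


# Synthetic heavy-tailed sample (not data from the study):
# many short gaps and one very long gap.
sample = [1.0] * 50 + [10000.0]
ll_mix, ll_exp = log_likelihoods(sample, pa=0.0115, k=0.1183, lam=0.1364)
print(ll_mix, ll_exp)
```

The higher log-likelihood indicates the better-fitting model for that sample. When both models are fitted to the same data, a criterion that penalizes the mixture's two extra parameters (such as AIC) gives a fairer comparison.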

SLIDE 15

Propagation of attacks

• A propagation is assumed to occur when the IP address of an attacking machine observed on a given platform is later observed on another platform
• Propagation graph
  ■ Nodes identify the platforms
  ■ Transitions identify propagations
    • A propagation from Pi to Pj occurs for an IP address when the next occurrence of this address after visiting Pi is observed on Pj
  ■ Probabilities are associated with the transitions to reflect their likelihood of occurrence
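A minimal Python sketch of how the transition probabilities of such a graph could be estimated, assuming observations are available as (time, IP, platform) tuples and that the probability of Pi → Pj is the fraction of transitions leaving Pi that go to Pj. The record layout, the function name `propagation_graph`, and the choice to keep Pi → Pi as a self-loop are assumptions:

```python
from collections import defaultdict


def propagation_graph(observations):
    """Estimate transition probabilities {Pi: {Pj: p}} of the propagation graph.

    observations: iterable of (time, ip, platform) tuples (hypothetical layout).
    For each IP address, every pair of consecutive observations Pi -> Pj counts
    as one transition; Pi == Pj is kept as a self-loop.
    """
    visits = defaultdict(list)                 # ip -> platforms in time order
    for _, ip, platform in sorted(observations):
        visits[ip].append(platform)

    counts = defaultdict(lambda: defaultdict(int))
    for platforms in visits.values():
        for src, dst in zip(platforms, platforms[1:]):
            counts[src][dst] += 1

    graph = {}
    for src, row in counts.items():
        total = sum(row.values())              # all transitions leaving src
        graph[src] = {dst: n / total for dst, n in row.items()}
    return graph


# Synthetic example (not data from the study).
obs = [
    (0, "192.0.2.1", "P5"), (10, "192.0.2.1", "P6"),
    (20, "192.0.2.1", "P5"), (30, "192.0.2.1", "P5"),
    (5, "192.0.2.2", "P5"), (15, "192.0.2.2", "P9"),
]
g = propagation_graph(obs)
print(g)
```

Here 192.0.2.1 visits P5, P6, P5, P5 and 192.0.2.2 visits P5, P9, so the three transitions leaving P5 (to P6, P5 and P9) each get probability 1/3, and P6 always leads back to P5.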

SLIDE 16

Propagation graph

• Issues under investigation
  ■ Focus on specific attacks (largest clusters, worms, etc.)
  ■ Timing characteristics and probability distributions

[Figure: propagation graph between platforms P5, P6, P9, P20 and P23, with transition probabilities (in %) labelled on the edges]

SLIDE 17

Summary and Conclusions

• Preliminary models to characterize attack processes observed on low-interaction honeypots
• Several open issues
  ■ Predictive models that can be used to support decision making during the design and operation stages
  ■ How to assess the impact of attacks on the security of target systems?
• High-interaction honeypots
  ■ Analyze attackers' behavior once they get access to a target
  ■ Validate a theoretical model for the quantitative evaluation of security developed by LAAS in the 1990s
    • Privilege graph to describe vulnerabilities and attack scenarios
    • METF (Mean Effort To security Failure) to quantify security
    • Assumptions about intruder behavior