
Empirical Studies in Cybersecurity: Some Challenges Michel Cukier - PowerPoint PPT Presentation



  1. Empirical Studies in Cybersecurity: Some Challenges (Michel Cukier)

  2. Adding Science to Cybersecurity
     • Empirical studies are needed to add science to cybersecurity
     • Challenges:
       – Security metrics are lacking
       – Security data are not publicly available

  3. Availability of Security Data
     • The few available datasets have issues (e.g., the MIT Lincoln Laboratory 1998/1999 intrusion detection evaluation datasets)
     • NSF helped initiate collaborations, but none succeeded (2001)
     • NSF workshop on the lack of available data (2010)
     • DHS PREDICT dataset:
       – Context is missing
       – More datasets will be added over time

  4. The End?

  5. A Rare Collaboration
     • Unique relationship with:
       – G. Sneeringer, Director of Security, and his security team at the Office of Information Technology
     • Access to security-related data collected on the UMD network
     • Development of testbeds for monitoring attackers
     • Together, these enable unique empirical studies

  6. Incident Data
     • Incidents:
       – Confirmed compromised computers
       – More than 12,000 records since June 2001
     • Models:
       – Software reliability growth models, time series, epidemiological models
     • Questions:
       – Is the number of incidents a relevant metric?
       – What is the impact of time (age, duration)?
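Software reliability growth models are listed among the candidate models for incident data. As a minimal sketch, assume the Goel-Okumoto model, whose mean value function m(t) = a(1 - e^(-bt)) gives the expected cumulative number of incidents by time t (the slide does not say which SRGM was used). Because m(t) is linear in a for a fixed b, a grid search over b with a closed-form least-squares a is enough to fit it. The function name, the grid, and the synthetic data are illustrative, not UMD incident data.

```python
import math

def fit_goel_okumoto(times, cumulative, b_grid=None):
    """Fit m(t) = a * (1 - exp(-b * t)) to cumulative incident counts.

    For a fixed b the model is linear in a, so the least-squares a is
    sum(y * f) / sum(f * f) with f = 1 - exp(-b * t). We scan b over a grid
    and keep the (a, b) pair with the smallest sum of squared errors.
    """
    if b_grid is None:
        b_grid = [i / 1000 for i in range(1, 1001)]  # b in (0, 1]
    best = None
    for b in b_grid:
        f = [1.0 - math.exp(-b * t) for t in times]
        denom = sum(fi * fi for fi in f)
        if denom == 0.0:
            continue
        a = sum(y * fi for y, fi in zip(cumulative, f)) / denom
        sse = sum((y - a * fi) ** 2 for y, fi in zip(cumulative, f))
        if best is None or sse < best[2]:
            best = (a, b, sse)
    return best[0], best[1]

# Synthetic example (not UMD data): months 1..24, true a = 100, b = 0.1.
months = list(range(1, 25))
counts = [100 * (1 - math.exp(-0.1 * t)) for t in months]
a, b = fit_goel_okumoto(months, counts)
print(round(a, 3), round(b, 3))  # 100.0 0.1
```

The fitted a estimates the total number of incidents the model expects eventually; whether that is meaningful for a network whose population keeps changing is exactly the kind of question the slide raises.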

  7. Intrusion Prevention System (IPS) Data
     • IPS alerts:
       – IPSs located at the border of and inside the UMD network
       – More than 7 million events since September 2006
     • Models:
       – Identify outliers; define metrics with some memory (i.e., that take past values into account)
     • In-house validation
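One way to read "metrics with some memory" is an exponentially weighted moving average (EWMA) of daily alert counts, which remembers earlier days with geometrically decaying weight. The sketch below flags a day as an outlier when its count deviates from the EWMA baseline of the preceding days by more than k exponentially weighted standard deviations. The use of EWMA, the smoothing factor `alpha`, the threshold `k`, the warm-up length, and the function name are all assumptions, not the method applied to the UMD alerts.

```python
def ewma_outliers(counts, alpha=0.3, k=3.0, warmup=5):
    """Return the indices of days whose alert count looks like an outlier.

    The baseline is an exponentially weighted mean and variance of the
    previous days, so older days still count but with decaying weight.
    A day is flagged when it deviates from that baseline by more than k
    exponentially weighted standard deviations (after a short warm-up).
    """
    outliers = []
    mean, var = float(counts[0]), 0.0
    for i, x in enumerate(counts[1:], start=1):
        std = var ** 0.5
        if i >= warmup and std > 0 and abs(x - mean) > k * std:
            outliers.append(i)
        # Incremental update of the exponentially weighted mean and variance.
        diff = x - mean
        mean += alpha * diff
        var = (1 - alpha) * (var + alpha * diff * diff)
    return outliers

# Made-up daily alert counts with one spike on day 8.
daily_alerts = [100, 102, 98, 101, 99, 103, 97, 100, 400, 101, 99]
print(ewma_outliers(daily_alerts))  # [8]
```

A larger `alpha` shortens the memory and makes the baseline follow recent days more closely; in this simple version the spike itself is folded into the baseline, which temporarily widens the tolerance for the following days.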

  8. Network Flows
     • Network flows:
       – 130,000 IP addresses monitored (two class B networks belonging to UMD)
     • Tool:
       – Goal: increase network visibility
       – Nfsight (available on SourceForge)
     • In-house validation
     • Next goal:
       – An efficient flow-based IDS

  9. Backend Algorithm
     • Example request flow: 2009-07-30 09:34:56.321 TCP 10.0.0.1:2455 → 10.1.2.3:80
     • Example reply flow: 2009-07-30 09:34:56.322 TCP 10.1.2.3:80 → 10.0.0.1:2455
     • Algorithm:
       – Receive a batch of 5 minutes of flows
       – Pair up unidirectional flows using {src/dst IP/port and protocol}
       – Run heuristics and calculate probabilities for each end point to host a service
       – Output end point results and bidirectional flows
     • Resulting bi-flow (client → server): 2009-07-30 09:34:56.321 TCP 10.0.0.1:2455 → 10.1.2.3:80
       – End point results: 10.0.0.1 connects to tcp/80; 10.1.2.3 hosts tcp/80
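The batching and pairing steps of the backend algorithm can be sketched as follows, assuming each flow record carries a start timestamp, protocol, and source/destination IP and port. A flow is paired with an earlier flow whose 5-tuple is its mirror image, and the earlier flow is treated as the request. The `Flow` record, its field names, and both function names are hypothetical; this is not Nfsight's code.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Flow:
    """Hypothetical unidirectional flow record."""
    start: datetime
    proto: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

def five_minute_batches(flows, minutes=5):
    """Group flows into consecutive batches by the start of their 5-minute window."""
    groups = {}
    for f in flows:
        t = f.start
        window = t.replace(minute=t.minute - t.minute % minutes, second=0, microsecond=0)
        groups.setdefault(window, []).append(f)
    return [groups[w] for w in sorted(groups)]

def pair_flows(batch):
    """Pair unidirectional flows of one batch into (request, reply) bi-flows.

    A flow is matched with an earlier flow whose 5-tuple is its mirror image;
    the earlier flow is taken as the request. Unmatched flows are returned too.
    """
    pending = {}  # 5-tuple -> flows still waiting for a reply
    biflows = []
    for flow in sorted(batch, key=lambda f: f.start):
        mirror = (flow.proto, flow.dst_ip, flow.dst_port, flow.src_ip, flow.src_port)
        if pending.get(mirror):
            biflows.append((pending[mirror].pop(0), flow))
        else:
            key = (flow.proto, flow.src_ip, flow.src_port, flow.dst_ip, flow.dst_port)
            pending.setdefault(key, []).append(flow)
    unmatched = [f for waiting in pending.values() for f in waiting]
    return biflows, unmatched

# The request/reply pair from the slide.
req = Flow(datetime(2009, 7, 30, 9, 34, 56, 321000), "TCP", "10.0.0.1", 2455, "10.1.2.3", 80)
rep = Flow(datetime(2009, 7, 30, 9, 34, 56, 322000), "TCP", "10.1.2.3", 80, "10.0.0.1", 2455)
for batch in five_minute_batches([rep, req]):
    biflows, unmatched = pair_flows(batch)
    print(len(biflows), len(unmatched))  # 1 0
```

In this sketch a request and reply that fall on either side of a 5-minute boundary stay unmatched; a real implementation would need to carry pending flows over to the next batch.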

  10. Heuristics (heuristic ID: features and formula used; output values)
     • Timing:
       – Heuristic 0: timestamp of request < timestamp of reply; output in [0, …]
     • Port numbers:
       – Heuristic 1: src port > dst port; output in {0, 0.5, 1}
       – Heuristic 2: src port > 1024 > dst port; output in {0, 0.5, 1}
       – Heuristic 3: port in /etc/services; output in {0, 0.5, 1}
     • Fan in/out relationships:
       – Heuristic 4: # ports related; output in [0, …]
       – Heuristic 5: # IPs related; output in [0, …]
       – Heuristic 6: # tuples related; output in [0, …]
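Heuristics 1 to 3 can be sketched as scores in {0, 0.5, 1}. Here a score of 1 is read as evidence that the destination end point hosts the service, 0 as evidence that the source does, and 0.5 as undecided. That reading, the plain averaging used to combine the three scores, and the small `WELL_KNOWN` set standing in for /etc/services are assumptions, not Nfsight's actual formulas.

```python
# Hypothetical stand-in for /etc/services (a real tool would load the file).
WELL_KNOWN = {21, 22, 25, 53, 80, 110, 143, 443, 993, 995, 6667}

def h1_port_order(src_port, dst_port):
    """Heuristic 1: a lower destination port suggests the destination is the server."""
    if src_port > dst_port:
        return 1.0
    if src_port < dst_port:
        return 0.0
    return 0.5

def h2_privileged(src_port, dst_port):
    """Heuristic 2: src port > 1024 > dst port suggests a client-to-server flow."""
    if src_port > 1024 > dst_port:
        return 1.0
    if dst_port > 1024 > src_port:
        return 0.0
    return 0.5

def h3_known_service(src_port, dst_port):
    """Heuristic 3: only one side uses a port listed as a known service."""
    src_known = src_port in WELL_KNOWN
    dst_known = dst_port in WELL_KNOWN
    if dst_known and not src_known:
        return 1.0
    if src_known and not dst_known:
        return 0.0
    return 0.5

def server_score(src_port, dst_port):
    """Average of heuristics 1-3: estimated probability that dst hosts the service."""
    heuristics = (h1_port_order, h2_privileged, h3_known_service)
    return sum(h(src_port, dst_port) for h in heuristics) / len(heuristics)

print(server_score(2455, 80))  # 1.0 (slide example: 10.1.2.3 hosts tcp/80)
print(server_score(80, 2455))  # 0.0
```

Ties such as two ephemeral ports land at 0.5, which is where the timing and fan in/out heuristics would have to break the tie.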

  11. Front-end

  12. Case Study: Scanning Activity

  13. Case Study: Worm Outbreak

  14. Case Study: Distributed Attacks

  15. Honeypot (HP) Data
     • Honeypot data:
       – Malicious activity collected on more than 1,200 HPs (low and high interaction)
       – Low-interaction HPs deployed at UIUC, AT&T, PJM, France, and Morocco
       – High-interaction HPs for the study of attacks/attackers

  16. Details of Experiment
     • Easy access to honeypots through SSH as the entry point
     • Multiple honeypots per attacker for an extended period of time (one month)
     • Honeypots given to one attacker are configured with increasing network limitations (some ports blocked)
     • Collect data such as network traffic, keystrokes entered, and rogue software downloaded

  17. Configuration Details
     • The network gateway has two network interfaces:
       – One facing the Internet, configured with 40 public IP addresses from the University of Maryland
       – One configured with a private IP address
     • OpenSSH was modified to reject SSH attempts on its public IP addresses until the 150th try
     • Up to 40 honeypots can exist in parallel
     • Each attacker can deploy up to 3 honeypots
     • Honeypots:
       – HP1: no network limitation
       – HP2: main IRC port blocked (port 6667)
       – HP3: every port blocked except HTTP, HTTPS, FTP, DNS, and SSH
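The gateway behaviour described above can be sketched as a small state machine: SSH attempts are counted per source IP and rejected until the 150th, and each admitted attacker receives up to three honeypots with increasingly restrictive profiles, within a global limit of 40. Counting per source IP, accepting on the 150th attempt and every one after it, the assignment order HP1, HP2, HP3, the standard port numbers for HP3, and all class and method names are assumptions; the slide does not describe the modified OpenSSH code.

```python
from dataclasses import dataclass, field

# Assumed standard ports for the services HP3 leaves open.
HP3_OPEN_PORTS = {21, 22, 53, 80, 443}  # FTP, SSH, DNS, HTTP, HTTPS

def port_allowed(profile, port):
    """Outbound port filtering for each honeypot profile."""
    if profile == "HP1":
        return True                    # no network limitation
    if profile == "HP2":
        return port != 6667            # main IRC port blocked
    if profile == "HP3":
        return port in HP3_OPEN_PORTS  # everything else blocked
    raise ValueError(f"unknown profile {profile!r}")

@dataclass
class Gateway:
    """Hypothetical model of the gateway's admission and allocation logic."""
    threshold: int = 150         # SSH attempts before one is accepted
    max_honeypots: int = 40      # honeypots that can exist in parallel
    max_per_attacker: int = 3
    attempts: dict = field(default_factory=dict)  # source IP -> attempt count
    assigned: dict = field(default_factory=dict)  # source IP -> profiles given

    def ssh_attempt(self, src_ip):
        """Count an SSH attempt; True once the source reaches the threshold."""
        self.attempts[src_ip] = self.attempts.get(src_ip, 0) + 1
        return self.attempts[src_ip] >= self.threshold

    def deploy(self, src_ip):
        """Return the attacker's next profile (HP1, HP2, HP3), or None at a limit."""
        mine = self.assigned.setdefault(src_ip, [])
        in_use = sum(len(p) for p in self.assigned.values())
        if len(mine) >= self.max_per_attacker or in_use >= self.max_honeypots:
            return None
        profile = ("HP1", "HP2", "HP3")[len(mine)]
        mine.append(profile)
        return profile

gw = Gateway()
accepted = [gw.ssh_attempt("203.0.113.7") for _ in range(150)]
print(accepted.index(True) + 1)  # 150
print([gw.deploy("203.0.113.7") for _ in range(4)])  # ['HP1', 'HP2', 'HP3', None]
print(port_allowed("HP2", 6667), port_allowed("HP3", 443))  # False True
```

This sketch never releases honeypots; the one-month lifetime mentioned on the previous slide would require removing expired entries from `assigned`.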

  18. Test-bed Architecture

  19. Attacker Identification
     • Attacker IP address
     • Attacker AS number (identifies a network on the Internet)
     • Attacker actions:
       – Origin of rogue software
       – Way of performing specific actions
       – Files accessed
     • Comparison of keystroke profiles
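The slide does not say how keystroke profiles are compared. One simple option, assumed here, is to represent each session by how often each command is typed and compare sessions with cosine similarity, so that two sessions scoring close to 1 are candidates for the same attacker. Taking the first word of each typed line as the command, the example sessions, and the function names are hypothetical.

```python
import math
from collections import Counter

def command_profile(keystrokes):
    """Count the command (first word) of each non-empty line typed in a session."""
    return Counter(line.split()[0] for line in keystrokes.splitlines() if line.strip())

def cosine_similarity(p, q):
    """Cosine similarity of two command-count profiles, in [0, 1]."""
    dot = sum(p[c] * q[c] for c in p.keys() & q.keys())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

s1 = "w\nuname -a\ncd /tmp\nwget http://example.com/x.tgz\ntar xzf x.tgz\n"
s2 = "w\nuname -a\ncd /var/tmp\nwget http://example.com/y.tgz\ntar xzf y.tgz\n"
s3 = "passwd\nuseradd bob\n"
print(round(cosine_similarity(command_profile(s1), command_profile(s2)), 2))  # 1.0
print(cosine_similarity(command_profile(s1), command_profile(s3)))  # 0.0
```

Command counts ignore typing rhythm and argument style; a finer profile could add those, at the cost of needing more keystrokes per session to be reliable.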

  20. Attacker Skills
     • One option: an analyst assesses attacker skill
     • Preferred approach, easier to reproduce: criteria based on the following questions
       – Is the attacker careful not to be seen?
       – Does the attacker check the target environment?
       – How familiar is the attacker with the rogue software?
       – Is the attacker protecting the compromised target?

  21. Attacker Skills (Cont.)
     • Criterion: assessment
       – Hide: ratio of # sessions where the attacker hid
       – Restore deleted files: ratio of # sessions where deleted files were restored
       – Check presence: ratio of # sessions where presence was checked
       – Delete downloaded file: 0 if the downloaded file is not deleted, 1 otherwise
       – Check system: 0 if the system has never been checked, 1 otherwise
       – Edit configuration file: 0 if the configuration file has never been edited, 1 otherwise
       – Change system: 0 if the system has never been modified, 1 otherwise
       – Change password: 0 if the password has never been changed, 1 otherwise
       – Create new user: 0 if no new user has been created, 1 otherwise
       – Rogue software adequacy: 0 if less than half of the installed rogue software is adequate, 1 otherwise
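The table defines ten criteria, each valued in [0, 1]. The slide does not state how they are combined; as a minimal sketch, assume the skill level is their sum (a score between 0 and 10). The `Session` record, its field names, the helper functions, and the choice to score adequacy as 0 when no rogue software was installed are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Session:
    """Hypothetical per-session observations for one attacker."""
    hid: bool = False
    restored_deleted_files: bool = False
    checked_presence: bool = False
    deleted_downloaded_file: bool = False
    checked_system: bool = False
    edited_config_file: bool = False
    changed_system: bool = False
    changed_password: bool = False
    created_user: bool = False

def ratio(sessions, attr):
    """Fraction of sessions in which the behaviour was observed."""
    return sum(getattr(s, attr) for s in sessions) / len(sessions)

def ever(sessions, attr):
    """1 if the behaviour was observed in any session, 0 otherwise."""
    return 1.0 if any(getattr(s, attr) for s in sessions) else 0.0

def skill_level(sessions, adequate_tools, installed_tools):
    """Sum of the ten criteria from the table (assumed aggregation)."""
    if installed_tools == 0:
        adequacy = 0.0  # assumption: no rogue software installed earns no credit
    else:
        # 0 if less than half of the installed rogue software is adequate, 1 otherwise
        adequacy = 1.0 if 2 * adequate_tools >= installed_tools else 0.0
    criteria = [
        ratio(sessions, "hid"),
        ratio(sessions, "restored_deleted_files"),
        ratio(sessions, "checked_presence"),
        ever(sessions, "deleted_downloaded_file"),
        ever(sessions, "checked_system"),
        ever(sessions, "edited_config_file"),
        ever(sessions, "changed_system"),
        ever(sessions, "changed_password"),
        ever(sessions, "created_user"),
        adequacy,
    ]
    return sum(criteria)

sessions = [
    Session(checked_presence=True, changed_password=True),
    Session(hid=True, checked_presence=True, deleted_downloaded_file=True),
]
print(skill_level(sessions, adequate_tools=2, installed_tools=3))  # 4.5
```

Because every criterion comes from logged sessions rather than an analyst's impression, two analysts running this on the same logs get the same score, which is the reproducibility argument made on the previous slide.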

  22. Overall Results
     • Experiment run from May 17th, 2010 to November 5th, 2010
     • Sessions per honeypot type (# sessions; # non-empty sessions):
       – All: 312; 211 (68%)
       – HP1: 160; 110 (69%)
       – HP2: 105; 74 (70%)
       – HP3: 47; 27 (57%)
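The percentages above are the share of non-empty sessions per honeypot type, rounded to the nearest percent. The snippet below recomputes them from the listed counts and checks that the HP1, HP2, and HP3 rows add up to the "All" row; the variable names are illustrative.

```python
# Counts from the slide: honeypot type -> (# sessions, # non-empty sessions).
table = {"HP1": (160, 110), "HP2": (105, 74), "HP3": (47, 27)}

# The per-type rows should add up to the "All" row.
total = sum(s for s, _ in table.values())
non_empty = sum(n for _, n in table.values())
print(total, non_empty)  # 312 211

rows = {"All": (total, non_empty), **table}
shares = {name: round(100 * n / s) for name, (s, n) in rows.items()}
print(shares)  # {'All': 68, 'HP1': 69, 'HP2': 70, 'HP3': 57}
```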

  23. Who Launched the Attacks?
     • Attack origin determined based on AS number and based on IP address
     • Top countries, brute force: China (34), USA (27), Korea (8), Italy (7)
     • Top countries, compromise: Romania (75), Lebanon (32), USA (24), UK (16)

  24. Analysis as a Function of Attacker Skill
     • Figure: percentage of attackers meeting each criterion (criterion IDs 1 to 10), all honeypots
     • Results:
       – 95% check presence or check the system
       – 79% delete the downloaded file
       – 77% change the password
       – 15% create a new user
     • There might be a link between attackers' actions and their skills

  25. Analysis as a Function of Attacker Skill (Cont.)
     • Figure: percentage of attackers vs. skill level (0 to 9)
       – (a) Create new user: average skill level = 7.7
       – (b) Hide: average skill level = 6.3

  26. Analysis as a Function of Attacker Skill (Cont.)
     • Figure: percentage of attackers vs. skill level (0 to 9)
       – (c) Check presence: average skill level = 5.5
       – (d) Password change: average skill level = 6.0
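The figures on these slides report, for a given action, how attackers who performed it are spread over skill levels and their average skill level. Given a skill level and a set of observed actions per attacker, the share of attackers performing an action and their average skill can be computed as below. The attacker data are made up for illustration and the function name is hypothetical.

```python
from statistics import mean

def action_summary(attackers, action):
    """Share of attackers who performed `action` and their average skill level.

    `attackers` maps an attacker ID to (skill_level, set_of_actions).
    """
    doers = [skill for skill, actions in attackers.values() if action in actions]
    share = len(doers) / len(attackers)
    avg_skill = mean(doers) if doers else None
    return share, avg_skill

# Made-up attackers: (skill level, actions observed).
attackers = {
    "a1": (4, {"check presence", "change password"}),
    "a2": (6, {"check presence", "change password", "hide"}),
    "a3": (8, {"check presence", "hide", "create new user"}),
    "a4": (2, {"check presence"}),
}
for action in ("check presence", "change password", "hide", "create new user"):
    share, avg = action_summary(attackers, action)
    print(f"{action}: {share:.0%} of attackers, average skill {avg}")
```

A rare action performed mostly by high-skill attackers (here, "create new user") shows up as a low share with a high average, which is the pattern the slides point to.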

  27. Why Was the Attack Launched?
     • Figure: average number of attackers per honeypot type (HP1, HP2, HP3, All)
     • Of the 60 deployed honeypots, 9 (15%) were targeted by more than one attacker:
       – 7 honeypots were targeted by 2 different attackers, 1 honeypot by 3 different attackers, and 1 honeypot by 5 different attackers
     • This raises the important issue of how and why access is shared
     • Even though 77% of the attackers changed the password, 15% shared access with at least 1 other attacker

  28. Challenges
     • Generalization?
       – Replication (same method)
       – Reproduction (different method)
       – Re-analysis of data
     • Issues:
       – Replication requires collaborations
       – Reproduction requires developing a new method
       – Re-analysis might not be possible

  29. The End?

  30. Theories from Social Sciences to Add Science to Cybersecurity
     • Over the last year:
       – Focus on criminological theories
       – Collaboration with David Maimon and his research team
     • Consider various criminological theories
     • Identify theories that need to be adapted to cybersecurity

  31. New Use of IPS Alerts
     • Application of Routine Activity Theory (RAT):
       – Crime is normal and depends on the opportunities available
       – If a target is not protected enough and the reward is worth it, crime will happen
     • Alerts = attack attempts (blocked by the IPS)
     • Results:
       – The number of alerts is linked to daily activity
       – The origin of attacks is linked to the origin of users
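To check whether the number of alerts follows daily activity, one could correlate hourly alert counts with a measure of user activity, such as active hosts per hour. The sketch below computes a Pearson correlation coefficient; the choice of Pearson correlation, the activity measure, and the example numbers are assumptions for illustration, not the analysis behind the results above.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up hourly values for one day (hour 0 to 23).
active_hosts = [120, 100, 90, 85, 80, 90, 150, 300, 600, 900, 1000, 1050,
                1000, 1020, 1000, 950, 900, 700, 500, 400, 350, 300, 250, 180]
alerts = [h // 10 + 5 for h in active_hosts]  # toy data that tracks activity
print(round(pearson(active_hosts, alerts), 3))
```

Hourly series share a strong daily cycle, so a high coefficient on its own says little about causation; comparing weekdays with weekends or term with vacation periods would be a stronger test of the routine-activity explanation.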

  32. Use of Honeypot Data
     • Describe attacker/attack:
       – Network data
       – Attacker keystrokes
     • Empirical study:
       – Effect of warnings
       – Various HP configurations (CPU, memory, disk space)

  33. Issues
     • Mismatch between what criminological theories need and what HP data contain
     • Need statistically significant results (e.g., 6 months, over 120 HPs deployed per week, about 2,900 HPs, 3,700 sessions)
     • Experiments need to be deployed over a long period of time: attacks/attackers might evolve

  34. Some Good News
     • Empirical studies are solid scientific work
     • Developed approaches can be applied at other locations
     • Results do not need to be identical (e.g., crime varies between cities)

  35. The End!
