AIQL: Enabling Efficient Attack Investigation from System Monitoring Data



slide-1
SLIDE 1

AIQL: Enabling Efficient Attack Investigation from System Monitoring Data

Peng Gao¹, Xusheng Xiao², Zhichun Li³, Kangkook Jee³, Fengyuan Xu⁴, Sanjeev R. Kulkarni¹, Prateek Mittal¹

¹Princeton University, ²Case Western Reserve University, ³NEC Laboratories America, Inc., ⁴Nanjing University

slide-2
SLIDE 2

Introduction

slide-3
SLIDE 3

The Equifax Data Breach

slide-4
SLIDE 4

Impact of Advanced Persistent Threat (APT) Attack

  • Advanced: sophisticated techniques, e.g., exploiting multiple vulnerabilities
  • Persistent: adversaries continuously monitor and steal data from the target
  • Threat: strong economic or political motives
slide-5
SLIDE 5

Advanced Persistent Threat (APT)

Attack steps: Compromise → Infection → Privilege Escalation → Obtain Database Credentials → Data Exfiltration

Hosts involved: Gateway, Internal Machine, Domain Controller, Database

slide-6
SLIDE 6

Challenges and Opportunities

Sophisticated and stealthy

  • Multiple steps
  • Individual step not suspicious enough

  • Stealthy for a certain time

Attack steps: Compromise → Infection → Privilege Escalation → Obtain Database Credentials → Data Exfiltration

slide-7
SLIDE 7

Ubiquitous System Monitoring

  • Recording system behaviors from the kernel

Ø Unified structure of logs: not bound to applications

  • System activities w.r.t. system resources

Ø System resources: processes, files, network connections
Ø System activities (system events): <subject, operation, object>

§ e.g., proc p1 read file f1

  • Enabling attack investigation via querying the collected data

Ø Interactive query: recovering attack steps in multiple hosts
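The <subject, operation, object> event structure above can be sketched as a small data model. This is only an illustration: the class names, the `agent_id` and `timestamp` fields, and the `describe` helper are assumptions made for the example, not the schema used by AIQL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A system resource (hypothetical model): process, file, or network connection."""
    kind: str  # "proc", "file", or "ip"
    name: str


@dataclass(frozen=True)
class SystemEvent:
    """One system activity expressed as <subject, operation, object>."""
    agent_id: int      # assumed field: host that reported the event
    timestamp: float   # assumed field: when the event happened
    subject: Entity    # the process that performs the operation
    operation: str     # e.g. "read", "write", "start"
    obj: Entity        # the resource the operation acts on


def describe(event: SystemEvent) -> str:
    """Render an event in the 'proc p1 read file f1' style used on the slide."""
    s, o = event.subject, event.obj
    return f"{s.kind} {s.name} {event.operation} {o.kind} {o.name}"


p1 = Entity("proc", "p1")
f1 = Entity("file", "f1")
event = SystemEvent(agent_id=1, timestamp=0.0, subject=p1, operation="read", obj=f1)
print(describe(event))
```

Because every event has the same three-part shape regardless of the application that caused it, the same query logic can be applied to file, process, and network activity.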

slide-8
SLIDE 8

Challenge 1: Attack Behavior Specification

  • Multi-step attack: multiple system activities and their relationships
  • Dependency tracking: chaining constraints among system activities
  • Abnormal system behavior: sliding windows and statistical aggregation methods

slide-9
SLIDE 9

Challenge 2: Timely “Big Data” Security Analysis

  • Collect and store system monitoring data for hosts in an organization (~50 GB for 100 hosts per day)

– Data storage optimization

  • Query data in a central server for attack investigation

– Query execution optimization
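To make the data volume concrete, the quoted rate of ~50 GB per day for 100 hosts can be turned into a rough sizing calculation. The sketch below assumes the volume scales linearly with the number of hosts, which the slides do not state; the function names and the 30-day retention period are hypothetical.

```python
# Rate quoted on the slide: ~50 GB per day for 100 hosts.
GB_PER_DAY = 50
HOSTS_MEASURED = 100


def daily_volume_gb(hosts: int) -> float:
    """Estimated daily volume, assuming linear scaling with host count."""
    return GB_PER_DAY * hosts / HOSTS_MEASURED


def retained_volume_gb(hosts: int, days: int) -> float:
    """Estimated volume kept for a retention period of `days` days (assumed)."""
    return daily_volume_gb(hosts) * days


print(daily_volume_gb(1))           # per-host daily volume in GB
print(retained_volume_gb(100, 30))  # 100 hosts kept for an assumed 30 days
```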

slide-10
SLIDE 10

AIQL System

  • Novel query system for attack investigation

Ø Built on top of existing mature tools

§ System-level monitoring tools: auditd, ETW, DTrace
§ Relational databases: PostgreSQL, Greenplum

Ø Implementation: ~50,000 lines of Java code

  • System Components <- System Optimizations

Ø Data Collection and Storage <- Data Model and Storage
Ø AIQL Language Parser <- Domain-Specific Query Language
Ø Query Execution Engine <- Relationship-based Scheduling

slide-11
SLIDE 11

Data Collection and Storage

slide-12
SLIDE 12

Data Collection

  • Data collection agent: system calls as a sequence of system events

Ø Windows: Event Tracing for Windows (ETW)
Ø Linux: Audit Framework (auditd)
Ø Mac: DTrace

  • Collect critical attributes for security analysis
slide-13
SLIDE 13

Data Storage

  • Store data in relational databases powered by PostgreSQL

Ø Challenges in handling high ingestion rate

  • Data storage optimizations

Ø Data deduplication and in-memory indexes

§ Entity tables: file, process, network connection
§ Event tables: file event, process event, network event

Ø Batch commit
Ø Time and space partitioning
Ø Hypertable
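Two of the storage optimizations above, entity deduplication through an in-memory index and batch commit, can be sketched as follows. This is a minimal illustration and not the AIQL implementation: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, and the `EventStore` class, table names, and columns are all hypothetical.

```python
import sqlite3


class EventStore:
    """Hypothetical store with an entity table, an event table, and batching."""

    def __init__(self, batch_size: int = 1000):
        # sqlite3 stands in for PostgreSQL so the sketch is self-contained.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE file_entity (id INTEGER PRIMARY KEY, name TEXT)")
        self.conn.execute("CREATE TABLE file_event (ts REAL, proc TEXT, op TEXT, file_id INTEGER)")
        self.file_ids = {}   # in-memory index: file name -> entity id
        self.pending = []    # events buffered until the next batch commit
        self.batch_size = batch_size

    def _file_id(self, name: str) -> int:
        """Return the id of a file entity, inserting it only on first sight."""
        fid = self.file_ids.get(name)
        if fid is None:
            cur = self.conn.execute("INSERT INTO file_entity (name) VALUES (?)", (name,))
            fid = cur.lastrowid
            self.file_ids[name] = fid
        return fid

    def add_file_event(self, ts: float, proc: str, op: str, file_name: str) -> None:
        """Buffer one event; the file is stored once and referenced by id."""
        self.pending.append((ts, proc, op, self._file_id(file_name)))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Write all buffered events in one statement and commit once."""
        self.conn.executemany("INSERT INTO file_event VALUES (?, ?, ?, ?)", self.pending)
        self.conn.commit()
        self.pending.clear()

    def count(self, table: str) -> int:
        return self.conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


store = EventStore(batch_size=2)
store.add_file_event(1.0, "p1", "write", "a.txt")
store.add_file_event(2.0, "p2", "read", "a.txt")   # reuses the entity for a.txt
store.add_file_event(3.0, "p2", "read", "b.txt")
store.flush()
```

The in-memory index avoids a database lookup for every event, and committing in batches reduces the per-event transaction overhead that makes high ingestion rates difficult.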

slide-14
SLIDE 14

AIQL (Attack Investigation Query Language)

slide-15
SLIDE 15

Multievent AIQL Query

  • Multiple system events involved in the attack
  • Temporal order of events: e1->e2->e3->e4
slide-16
SLIDE 16

Multievent AIQL Query

  • Global constraints: e.g., agent ID, time window
  • Event pattern: <subject, operation, object>
  • Temporal relationship: e.g., enforce the event order
  • Attribute relationship: e.g., two events linked by the same entity
  • Syntax shortcuts: e.g., context-aware attribute inference

Example query callouts (attribute inference):

Ø Return attributes: p1.exe_name, p2.exe_name, p3.exe_name, f1.name, p4.exe_name, i1.dst_ip
Ø Constraints: exe_name = “%cmd.exe”, name = “%backup1.dmp”
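The ingredients of a multievent query listed above (global constraints, event patterns, a temporal relationship, and an attribute relationship) can be illustrated with a small in-memory matcher. This is a sketch of the semantics only, written in Python rather than AIQL syntax; the `Event` type and the function name are hypothetical, and the sample data borrows process and file names from the case study later in the deck.

```python
from typing import NamedTuple


class Event(NamedTuple):
    ts: float
    agent_id: int
    subject: str  # process executable name
    op: str
    obj: str      # file name


def match_write_then_read(events, agent_id, t_start, t_end):
    """Find pairs (e1, e2) where a process writes a file that is later read."""
    # Global constraints: restrict to one agent and one time window.
    scoped = [e for e in events if e.agent_id == agent_id and t_start <= e.ts <= t_end]
    # Event patterns: <proc, write, file> and <proc, read, file>.
    writes = [e for e in scoped if e.op == "write"]
    reads = [e for e in scoped if e.op == "read"]
    # Attribute relationship: both events share the same file.
    # Temporal relationship: the write happens before the read.
    return [(w, r) for w in writes for r in reads if w.obj == r.obj and w.ts < r.ts]


events = [
    Event(10.0, 1, "sqlservr.exe", "write", "backup1.dmp"),
    Event(20.0, 1, "sbblv.exe", "read", "backup1.dmp"),
    Event(15.0, 1, "notepad.exe", "read", "notes.txt"),
    Event(30.0, 2, "sbblv.exe", "read", "backup1.dmp"),  # different agent, filtered out
]
matches = match_write_then_read(events, agent_id=1, t_start=0.0, t_end=100.0)
```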

slide-17
SLIDE 17

Dependency AIQL Query

  • Chain constraints among events
  • Dependency tracking for attack causality analysis
slide-18
SLIDE 18

Dependency AIQL Query

  • Dependency tracking direction: forward/backward, ->/<-
  • Cross host dependency tracking
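Backward dependency tracking can be sketched as a breadth-first search over events, starting from a target entity and following data flow in reverse. This is a simplified illustration under stated assumptions, not the AIQL engine: events are plain `(ts, subject, op, obj)` tuples, a "read" is assumed to move data from the object to the subject, every other operation from the subject to the object, and each event is visited at most once. The sample events reuse names from the case study later in the deck.

```python
from collections import deque


def flow(event):
    """Return (source, destination) of the data flow for one event (assumed rule)."""
    ts, subject, op, obj = event
    return (obj, subject) if op == "read" else (subject, obj)


def track_backward(events, target, end_ts):
    """Collect events that could have influenced `target` no later than `end_ts`."""
    found = []
    seen = set()
    queue = deque([(target, end_ts)])
    while queue:
        entity, bound = queue.popleft()
        for ev in events:
            src, dst = flow(ev)
            # Only events that flow into the entity and are not later than
            # the time bound can be its causes.
            if dst == entity and ev[0] <= bound and ev not in seen:
                seen.add(ev)
                found.append(ev)
                queue.append((src, ev[0]))
    return sorted(found)


events = [
    (1, "cmd.exe", "start", "osql.exe"),
    (2, "sqlservr.exe", "write", "backup1.dmp"),
    (3, "sbblv.exe", "read", "backup1.dmp"),
    (4, "sbblv.exe", "write", "XXX.129"),
    (5, "other.exe", "write", "backup1.dmp"),  # happens after the read, so not a cause
]
chain = track_backward(events, "XXX.129", end_ts=10)
```

Forward tracking would follow the same idea with the source and destination roles swapped and the time bound reversed.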
slide-19
SLIDE 19

Anomaly AIQL Query

  • Frequency-based anomaly models
slide-20
SLIDE 20

Anomaly AIQL Query

  • Sliding windows
  • Access to history states
  • Frequency-based anomaly models: e.g., SMA
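A frequency-based model built on a sliding window and a simple moving average (SMA) can be sketched as below. The function name, the window length, and the threshold factor are illustrative assumptions, and the rule used here (flag a value that exceeds a multiple of the average of the preceding windows) is one possible formulation rather than the exact model in the talk.

```python
from collections import deque


def sma_anomalies(amounts, window=3, factor=2.0):
    """Return indexes of windows whose amount exceeds factor * SMA of history.

    `amounts` holds one aggregated value per sliding window, for example the
    number of bytes a process sent during that window.
    """
    history = deque(maxlen=window)  # access to the previous `window` states
    flagged = []
    for i, amount in enumerate(amounts):
        if len(history) == window:
            sma = sum(history) / window
            if amount > factor * sma:
                flagged.append(i)
        history.append(amount)
    return flagged


# Hypothetical per-window transfer amounts with one spike.
amounts = [10, 12, 11, 10, 95, 12]
spikes = sma_anomalies(amounts)
```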
slide-21
SLIDE 21

AIQL Query Execution Engine

slide-22
SLIDE 22

Execution of Multievent Query

  • Synthesize a SQL data query for every event pattern
  • Schedule the data queries using domain-specific optimizations
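Synthesizing one SQL data query per event pattern can be sketched as a function that turns the pattern's constraints into a parameterized statement. The table name, the column names (`agent_id`, `start_time`, `exe_name`, `optype`), and the function itself are hypothetical; the actual schema and generated SQL are not shown in the slides.

```python
def synthesize_sql(table, constraints, t_start, t_end, agent_id):
    """Build a parameterized SQL query for one event pattern.

    `table` and the keys of `constraints` are assumed to come from the query
    parser (trusted identifiers); only the values are passed as parameters.
    """
    # Global constraints apply to every event pattern.
    clauses = ["agent_id = ?", "start_time BETWEEN ? AND ?"]
    params = [agent_id, t_start, t_end]
    for column, value in constraints.items():
        # Values containing '%' are treated as wildcard patterns.
        op = "LIKE" if isinstance(value, str) and "%" in value else "="
        clauses.append(f"{column} {op} ?")
        params.append(value)
    return f"SELECT * FROM {table} WHERE " + " AND ".join(clauses), params


sql, params = synthesize_sql(
    "file_event",
    {"exe_name": "%cmd.exe", "optype": "write"},
    t_start=0,
    t_end=100,
    agent_id=7,
)
```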
slide-23
SLIDE 23

Data Query Scheduler

  • Leverage event relationships for optimizing search strategies

Ø Prioritize event search based on estimated pruning power
Ø Prune search space of related events

  • Leverage domain-specific characteristics of system monitoring data for parallel search

Ø Time window partition
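The relationship-based scheduling idea can be sketched as follows: run the event pattern with the highest estimated pruning power first, then use its results to narrow the search for related patterns. This is an illustration under strong assumptions, not the scheduler from the talk: pruning power is approximated by the number of constraints on a pattern, relationships are equality on a shared attribute, and `schedule`, `run_query`, and the sample data are hypothetical.

```python
def schedule(patterns, relationships, run_query):
    """Execute event patterns in priority order and prune related patterns.

    patterns:      name -> dict of attribute constraints
    relationships: list of (pattern_a, pattern_b, shared_attribute)
    run_query:     callable(name, constraints, allowed) -> list of row dicts,
                   where `allowed` maps attribute -> set of permitted values
    """
    # Assumed heuristic: more constraints means more pruning power.
    order = sorted(patterns, key=lambda name: len(patterns[name]), reverse=True)
    results, allowed = {}, {}
    for name in order:
        rows = run_query(name, patterns[name], allowed.get(name, {}))
        results[name] = rows
        for a, b, attr in relationships:
            if name in (a, b):
                other = b if a == name else a
                if other not in results:
                    # Restrict the related pattern to values seen so far.
                    values = {row[attr] for row in rows}
                    slot = allowed.setdefault(other, {})
                    slot[attr] = slot[attr] & values if attr in slot else values
    return order, results


EVENTS = [
    {"proc": "sqlservr.exe", "op": "write", "file": "backup1.dmp"},
    {"proc": "sbblv.exe", "op": "read", "file": "backup1.dmp"},
    {"proc": "notepad.exe", "op": "read", "file": "notes.txt"},
]


def run_query(name, constraints, allowed):
    """Toy data query over an in-memory list instead of a database."""
    return [
        e for e in EVENTS
        if all(e[k] == v for k, v in constraints.items())
        and all(e[k] in vals for k, vals in allowed.items())
    ]


patterns = {"e1": {"op": "write", "proc": "sqlservr.exe"}, "e2": {"op": "read"}}
relationships = [("e1", "e2", "file")]
order, results = schedule(patterns, relationships, run_query)
```

In this toy run the selective pattern `e1` is searched first, so the search for `e2` only considers reads of files that `e1` matched.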

slide-24
SLIDE 24

Time Window Partition

  • Uniform time window partition

Ø # of sub-queries depends on I/O capacity (concurrent read)

  • Other potential partition: uniform workload …

Example: a query time window of one day split into 4 sub-queries (Sub-query1, Sub-query2, Sub-query3, Sub-query4)
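Uniform time window partition can be sketched as splitting the query window into equal sub-windows and running one sub-query per sub-window concurrently. The sketch below is illustrative only: it searches an in-memory list of `(timestamp, name)` tuples with a thread pool instead of issuing database queries, treats each sub-window as half-open, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(t_start, t_end, n):
    """Split [t_start, t_end) into n uniform sub-windows."""
    step = (t_end - t_start) / n
    return [(t_start + i * step, t_start + (i + 1) * step) for i in range(n)]


def parallel_search(events, t_start, t_end, n, predicate):
    """Run one sub-query per sub-window concurrently and merge the results."""

    def sub_query(window):
        lo, hi = window
        # Half-open interval so an event is never returned by two sub-queries.
        return [e for e in events if lo <= e[0] < hi and predicate(e)]

    with ThreadPoolExecutor(max_workers=n) as pool:
        parts = list(pool.map(sub_query, partition(t_start, t_end, n)))
    return [e for part in parts for e in part]


ONE_DAY = 86400  # seconds
events = [(100, "a"), (30000, "b"), (50000, "c"), (80000, "d")]
windows = partition(0, ONE_DAY, 4)
hits = parallel_search(events, 0, ONE_DAY, 4, lambda e: e[1] != "c")
```

The number of sub-queries would in practice be chosen from the storage system's capacity for concurrent reads, as the slide notes.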

slide-25
SLIDE 25

Case Study and Evaluation

slide-26
SLIDE 26

Case Study: APT Investigation

  • c1 Initial Compromise: The attacker sends a crafted e-mail to the victim, which contains an Excel file with an embedded malicious macro
  • c2 Malware Infection: The victim opens the file and runs the macro, which downloads and executes malware to open a backdoor
  • c3 Privilege Escalation: The attacker runs a database cracking tool to obtain database credentials
  • c4 Penetration into Database Server: The attacker drops another piece of malware to open another backdoor
  • c5 Data Exfiltration: The attacker dumps the database content and sends the dump back to the attacker's host
slide-27
SLIDE 27

Investigation of Step c5

  • Deploy anomaly detectors on the database server to detect large data transfers

Ø Identify a suspicious IP “XXX.129”

  • Construct an anomaly AIQL query to find the processes that access “XXX.129”

Ø “sbblv.exe” (p) writes to “XXX.129”

slide-28
SLIDE 28

Investigation of Step c5

  • Find the files accessed by “sbblv.exe”

Ø “sbblv.exe” reads from “backup1.dmp” (f1)

  • Find the processes that created “backup1.dmp”

Ø “sqlservr.exe” (p1) writes to “backup1.dmp”

slide-29
SLIDE 29

Investigation of Step c5

  • Further verification: “cmd.exe” starts “osql.exe” (OSQL utility)
  • Complete AIQL query for c5 (data exfiltration)
slide-30
SLIDE 30

End-to-End Efficiency for Case Study

Chart labels: ~3 min; 124x slower; 157x slower

  • 27 queries, touching 119 GB of data (422 million system events)
slide-31
SLIDE 31

Conciseness Evaluation

  • 19 queries on four major types of attack behaviors, touching 738 GB of data (2.1 billion events)

Ø a1-a5: Multi-step attack behaviors
Ø d1-d3: Dependency tracking behaviors
Ø v1-v5: Real-world malware behaviors (from Virussign)
Ø s1-s6: Abnormal system behaviors

Other languages vs. AIQL: 2.4x more constraints, 3.1x more words, and 4.7x more characters

slide-32
SLIDE 32

Scheduling Efficiency in Our Optimized Storage

  • Efficiency in PostgreSQL

Ø 40x speedup

  • Efficiency in Greenplum

Ø 16x speedup

slide-33
SLIDE 33

Conclusion

slide-34
SLIDE 34

Conclusion

  • AIQL, a system for efficient attack investigation from system monitoring data

Ø Expressive and concise domain-specific query language
Ø Efficient query execution based on domain-specific data characteristics

  • All queries, the investigation details, and a system demo video are available on the project website

Ø https://sites.google.com/site/aiqlsystem/
Ø https://youtu.be/5y3Tk6Y6YkQ

Q & A

Thank you!