 
RAIN: Refinable Attack Investigation with On-demand Inter-process Information Flow Tracking
Yang Ji, Sangho Lee, Evan Downing, Weiren Wang, Mattia Fazzini, Taesoo Kim, Alessandro Orso, and Wenke Lee
ACM CCS 2017, Oct 31, 2017

More and more data breaches
[Figure: bar chart "Data breaches" (source: Breach Level Index by Gemalto). Series: number of data breaches; number of breached records (millions). X-axis: half-years from 2013-H1 to 2017-H1. Data labels shown: 2459, 1901, 1594, 1155, 1029, 924, 918, 853, 819, 815, 721, 658, 665, 558, 513, 428, 427, 316.]

Is attack investigation accurate?
Dependency confusion! A, B, or C?
[Figure: a process reads A, B, and C and issues sends; the attacker says “Hmm, I only want C!”]

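Dependency confusion!
Is this file affected?
[Figure: an attacker decides “Let me change the offer price.”; a process then issues recv, read, and write events, and question marks show that it is unclear which written files, or the file archive, are affected.]
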
Related work
• System-call-based: DTrace, Protracer, LSM, Hi-Fi
• Dynamic Information Flow Tracking (DIFT): Panorama, Dtracker
• DIFT + record replay: Arnold
[Figure: triangle with corners Accuracy, Runtime Efficiency, and Analysis Efficiency]

RAIN
• We use
  • Record replay
  • Graph-based pruning
  • Selective DIFT
• We achieve
  • High accuracy
  • Runtime efficiency
  • Highly improved analysis efficiency
[Figure: triangle with corners Accuracy, Runtime Efficiency, and Analysis Efficiency, with RAIN covering all three]

Threat model
• Trusts the OS
  • RAIN tracks user-level attacks.
• Tracks explicit channels
  • Side and covert channels are out of scope.
• Records all attacks from their inception
  • Hardware trojans and OS backdoors are out of scope.

Architecture
[Figure: the target host (customized libc, RAIN kernel module) ships logs to the analysis host (provenance graph builder; triggering and reachability analysis; replay and selective DIFT). The coarse-level graph is pruned into a pruned sub-graph, which is then refined into a refined sub-graph.]

OS-level record replay
1. Records external inputs
2. Captures the thread switching from the pthread interface, not the internal data produced
3. Records system-wide executions
[Figure: a process group with Thread 1 and Thread 2. External inputs: socket, file, external IPC, randomness. Thread switching is captured via pthread; internal data is not recorded.]

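The idea of recording only external inputs and recomputing internal data on replay can be illustrated with a toy sketch. This is not how Arnold or RAIN is implemented; the `Recorder`, `Replayer`, and `workload` names are hypothetical, and a random draw stands in for any nondeterministic input such as a socket read.

```python
import random


class Recorder:
    """Runs a workload once and logs every nondeterministic input it consumes."""

    def __init__(self):
        self.log = []

    def external_input(self, source):
        value = source()        # stand-in for a socket read, file read, or random draw
        self.log.append(value)  # only the input is logged, never data derived from it
        return value


class Replayer:
    """Feeds the logged inputs back in the order they were recorded."""

    def __init__(self, log):
        self._inputs = iter(log)

    def external_input(self, source):
        return next(self._inputs)  # the live source is ignored during replay


def workload(env):
    # The sum is internal data: it is recomputed on replay, not recorded.
    a = env.external_input(lambda: random.randint(0, 1000))
    b = env.external_input(lambda: random.randint(0, 1000))
    return a + b


recorder = Recorder()
recorded_result = workload(recorder)
replayed_result = workload(Replayer(recorder.log))
```

Because the replay consumes the same inputs in the same order, `replayed_result` equals `recorded_result`, which is what lets a heavier analysis run later on the replayed execution.
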
Coarse-level logging and graph building
• Keeps logging system-call events
• Constructs a graph to represent:
  • the processes, files, and sockets as nodes
  • the events as causality edges
[Figure: P1 (/usr/bin/firefox) reads B (/docs/report.doc) and C (/tmp/errors.zip) and sends to A (attacker site).]

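A minimal sketch of the graph-building step, assuming a hypothetical event tuple format `(timestamp, process, syscall, object)`. The timestamps, the `INBOUND`/`OUTBOUND` sets, and `build_graph` are illustrative assumptions, not RAIN's actual log schema.

```python
from collections import defaultdict

# Hypothetical events modeled on the slide's example; timestamps are made up.
events = [
    (1, "P1:/usr/bin/firefox", "read", "B:/docs/report.doc"),
    (2, "P1:/usr/bin/firefox", "read", "C:/tmp/errors.zip"),
    (3, "P1:/usr/bin/firefox", "send", "A:attacker-site"),
]

INBOUND = {"read", "recv", "mmap"}  # data flows from the object into the process
OUTBOUND = {"write", "send"}        # data flows from the process into the object


def build_graph(events):
    """Return {source_node: [(destination_node, syscall, timestamp), ...]}."""
    edges = defaultdict(list)
    for ts, proc, syscall, obj in events:
        if syscall in INBOUND:
            edges[obj].append((proc, syscall, ts))
        elif syscall in OUTBOUND:
            edges[proc].append((obj, syscall, ts))
    return dict(edges)


graph = build_graph(events)
```

Each edge points in the direction of possible data flow, so later reachability queries can walk the graph forward or backward.
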
Pruning
• Does every recorded execution need replay and DIFT? No!
• Prunes the data in the graph based on trigger analysis results
  • Upstream
  • Downstream
  • Point-to-point
  • Interference

Upstream
• A: Attacker site
• B: /docs/report.doc
• C: /tmp/errors.zip
• D: /docs/ctct1.csv
• E: /docs/ctct2.pdf
• F: /docs/loss.csv
• P1: /usr/bin/firefox
• P2: /usr/bin/TextEditor
• P3: /bin/gzip
[Figure: provenance graph over nodes A to F and P1 to P3, with Read, Write, Mmap, and Send edges, analyzed upstream from A.]

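Upstream pruning can be read as backward reachability from the node of interest. The sketch below assumes a hypothetical edge list loosely modeled on the node names above; it is not a transcription of the figure, and node `X` is added only to show something being pruned.

```python
from collections import deque

# Hypothetical causality edges (source, destination); not taken from the figure.
edges = [
    ("D", "P2"), ("E", "P2"), ("P2", "B"), ("B", "P1"),
    ("F", "P3"), ("P3", "C"), ("C", "P1"), ("P1", "A"),
    ("P2", "X"),  # X is an unrelated output that cannot have influenced A
]


def upstream(edges, target):
    """Return every node from which data could have flowed into target."""
    parents = {}
    for src, dst in edges:
        parents.setdefault(dst, set()).add(src)
    seen, queue = {target}, deque([target])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


upstream_of_a = upstream(edges, "A")
```

Anything outside `upstream_of_a` (here, `X`) is dropped before the expensive replay and DIFT stage.
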
Downstream
• A: Tampered file /docs/ctct.csv
• B: Seasonal report docs/s1.csv
• C: Seasonal report docs/s2.csv
• D: Budget report docs/bgt.csv
• E: Half-year report docs/h2.pdf
• P1: Spreadsheet editor
• P2: Auto-budget program
• P3: Auto-report program
[Figure: provenance graph over nodes A to E and P1 to P3, with Read and Write edges, analyzed downstream from A.]

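Downstream pruning is the mirror image: forward reachability from the tampered node. The edges below are hypothetical (the figure's exact edges are not reproduced), and `Z` is an invented unrelated input.

```python
# Hypothetical causality edges (source, destination); not taken from the figure.
edges = [
    ("A", "P1"), ("P1", "B"), ("P1", "C"),
    ("C", "P2"), ("P2", "D"),
    ("B", "P3"), ("P3", "E"),
    ("Z", "P3"),  # Z feeds P3 but is not itself affected by A
]


def downstream(edges, source):
    """Return every node that data from source could have reached."""
    children = {}
    for src, dst in edges:
        children.setdefault(src, set()).add(dst)
    seen, stack = {source}, [source]
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


affected = downstream(edges, "A")
```
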
Point-to-point
• A: Tampered file /docs/ctct.csv
• B: Seasonal report docs/s1.csv
• C: Seasonal report docs/s2.csv
• D: Budget report docs/bgt.csv
• E: Half-year report docs/h2.pdf
• F: Document archive server
• P1: Spreadsheet editor
• P2: Auto-budget program
• P3: Auto-report program
• P4: Firefox browser
[Figure: provenance graph over nodes A to F and P1 to P4, with Read, Write, and Send edges; the two end points of the query are marked 1 and 2.]

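One way to sketch a point-to-point query is to intersect the nodes reachable forward from the first end point with the nodes reachable backward from the second. The edge list and the choice of `A` and `F` as end points are assumptions for illustration, not read off the figure.

```python
# Hypothetical causality edges (source, destination); not taken from the figure.
edges = [
    ("A", "P1"), ("P1", "B"), ("P1", "C"),
    ("C", "P2"), ("P2", "D"),
    ("B", "P3"), ("P3", "E"),
    ("B", "P4"), ("P4", "F"),
]


def reachable(adjacency, start):
    """Depth-first reachability over an adjacency dict."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def point_to_point(edges, source, sink):
    """Return the nodes lying on at least one path from source to sink."""
    forward, backward = {}, {}
    for src, dst in edges:
        forward.setdefault(src, set()).add(dst)
        backward.setdefault(dst, set()).add(src)
    return reachable(forward, source) & reachable(backward, sink)


on_path = point_to_point(edges, "A", "F")
```

The intersection is empty exactly when no path connects the two end points, so the same function doubles as a cheap check before any replay is scheduled.
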
Interference
• Insight: only inbound and outbound files that interfere in a process can possibly produce causality.
• We determine interference according to the time order of inbound and outbound IO events.
[Figure: processes P2 and P3 with files B, C, D, E, and F; Read, Mmap, and Write events are labeled with times t1, t2, and t3, under the orderings t1 < t2 and t1 < t2 < t3.]

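The time-order rule can be sketched as follows, under the assumption that an inbound event can only influence an outbound event of the same process if it happened earlier. The file names, timestamps, and `interfering_pairs` helper are hypothetical.

```python
def interfering_pairs(inbound, outbound):
    """Return (input, output) pairs where the input precedes the output.

    inbound and outbound are lists of (object, timestamp) for one process.
    """
    return sorted(
        (src, dst)
        for src, t_in in inbound
        for dst, t_out in outbound
        if t_in < t_out
    )


# Hypothetical process: reads X at t=1, writes Y at t=2, then reads Z at t=3.
inbound = [("X", 1), ("Z", 3)]
outbound = [("Y", 2)]

pairs = interfering_pairs(inbound, outbound)
```

Here only `("X", "Y")` survives: `Z` was read after `Y` was written, so it cannot have contributed to `Y`, and that pair needs no DIFT.
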
Refinement - selective DIFT
• Replays and conducts DIFT on only the necessary part of the execution
  • Aggregation
  • Upstream
  • Downstream
  • Point-to-point

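To show what fine-grained tracking adds over the coarse graph, here is a toy taint-propagation sketch. It is not Dtracker, libdft, or Dytan; the `Tainted` class and the file contents are invented for illustration.

```python
class Tainted:
    """A byte string carrying the set of source labels it was derived from."""

    def __init__(self, value, labels=()):
        self.value = value
        self.labels = frozenset(labels)

    def __add__(self, other):
        # Concatenation propagates the union of both operands' labels.
        return Tainted(self.value + other.value, self.labels | other.labels)


def read_file(name, content):
    """Hypothetical taint source: data read from a file is labeled with its name."""
    return Tainted(content, {name})


a = read_file("A", b"alpha")
b = read_file("B", b"bravo")
c = read_file("C", b"charlie")

local_summary = a + b      # derived from A and B but never leaves the process
header = Tainted(b"HDR:")  # constant data, carries no labels
sent = header + c          # the only data that is actually sent out

coarse_view = {"A", "B", "C"}  # syscall-level view: everything read before the send
fine_view = set(sent.labels)   # DIFT view: only what the sent bytes derive from
```

The coarse view blames all three files, while the taint labels on `sent` name only `C`, which is the kind of dependency confusion the refinement step is meant to resolve.
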
Upstream refinement
[Figure: the upstream sub-graph over nodes A to F and P1 to P3, with Read, Write, Mmap, and Send edges, after refinement by selective DIFT.]

Implementation summary
• RAIN is built on top of:
  • Arnold, the record replay framework
  • Dtracker (Libdft) and Dytan, the taint engines

Host            Module                 LoC
Target host     Kernel module          2,200 C (Diff)
Target host     Trace logistics        1,100 C
Analysis host   Provenance graph       6,800 C++
Analysis host   Trigger/Pruning        1,100 Python
Analysis host   Selective refinement   900 Python
Analysis host   DIFT Pin tools         3,500 C/C++ (Diff)

Evaluations
• Runtime performance
• Accuracy
• Analysis cost
• Storage footprint

Runtime overhead: 3.22% (SPEC CPU2006)
[Figure: bar chart of per-benchmark overhead for two series, Logging and Logging+Recording. Benchmarks: bzip2, perlbench, calculix, gamess, bwaves, sjeng, omnetpp, mcf, astar, h264ref, hmmer, xalancbmk, gobmk, libquantum, sphinx3, milc, zeusmp, gromacs, leslie3d, namd, lbm, dealII, soplex, povray, GemsFDTD, tonto, wrf, gcc.]

Multi-thread runtime overhead: 5.35% (SPLASH-3)
[Figure: bar chart of per-benchmark overhead. Benchmarks: ocean-c, ocean-n, fmm, radiosity, water-n, water-s, barnes, volrend.]

IO-intensive applications: less than 50% overhead
[Figure: bar chart for two series, Logging and Logging+Recording. Workloads: kernel copy, movie download, libc compilation, Firefox session.]

High analysis accuracy
[Figure: bar chart of dependency confusion rate for two series, Coarse-level and Fine-level. Scenarios: Screengrab, Cameragrab, Audiograb, NetRecon, Motivating Example. Data labels shown: 90.50%, 84.70%, 67.00%, 39.70%, 32%, 13%, 0%, 0%, 0%, 0%.]
The scenarios are from a red team exercise of the DARPA Transparent Computing program.

Pruning effectiveness: ~94.2% reduction
[Figure: bar chart of taint workload (number of processes) for two series, None and RAIN. Scenarios: Screengrab, Cameragrab, Audiograb, NetRecon, Motive Example. Data labels shown: 720, 310, 141, 138, 99, 34, 13, 19, 5, 11.]

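The headline reduction can be checked against the chart's data labels. The sketch assumes that the five larger labels belong to the unpruned (None) series and the five smaller ones to RAIN; the extracted text does not say which bar each label belongs to, so only the totals are used.

```python
# Assumption: larger labels are the unpruned workloads, smaller ones are RAIN's.
unpruned_processes = [720, 310, 141, 138, 99]
pruned_processes = [34, 13, 19, 5, 11]

total_unpruned = sum(unpruned_processes)  # 1408
total_pruned = sum(pruned_processes)      # 82

reduction = 1 - total_pruned / total_unpruned
reduction_percent = round(reduction * 100, 1)
```

Under that assumption the totals give a reduction of about 94.2%, which agrees with the figure quoted in the slide title.
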
Storage cost: ~4GB per day (1.5TB per year)

Scenario            Storage overhead (MB)
Screengrab          113.9
Cameragrab          105
Audiograb           133.6
NetRecon            166.1
Motive Example      200.6
Libc compilation    740
Per day desktop     4000

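The yearly figure follows from the per-day desktop number by simple scaling. The sketch assumes 365 days of continuous logging and decimal units (1 TB = 1,000,000 MB); both are assumptions, not stated on the slide.

```python
MB_PER_DAY = 4000      # "Per day desktop" value from the slide
DAYS_PER_YEAR = 365    # assumption: logging runs every day
MB_PER_TB = 1_000_000  # assumption: decimal units

tb_per_year = MB_PER_DAY * DAYS_PER_YEAR / MB_PER_TB
```

This gives 1.46 TB per year, consistent with the slide's rounded 1.5TB.
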
Discussion
• Limitations
  • RAIN trusts the OS, so it needs kernel integrity protection.
  • Over-tainting issue
• Direction
  • Hypervisor-based RAIN
  • Further reduce storage overhead

Conclusion
• RAIN adopts a multi-level provenance system to facilitate fine-grained analysis that enables accurate attack investigation.
• RAIN has low runtime overhead and a significantly reduced analysis cost.