RAIN: Refinable Attack Investigation with On-demand Inter-process Information Flow Tracking
Yang Ji, Sangho Lee, Evan Downing, Weiren Wang, Mattia Fazzini, Taesoo Kim, Alessandro Orso, and Wenke Lee
ACM CCS 2017, Oct 31, 2017
More and more data breaches
[Chart: number of data breaches and number of breached records (millions) per half-year, 2013-H1 to 2017-H1. Source: Breach Level Index by Gemalto.]
Is attack investigation accurate?
[Diagram: a process reads files A, B, and C (three reads), then issues four sends.]
“Hmm, I only want C!”
A, B, or C?
Dependency confusion!
“Let me change the offer price.”
[Diagram: recv and write events modify a file in the file archive; subsequent read and write events reach three further files, each marked "?".]
Is this file affected?
Dependency confusion!
Related work
Criteria: accuracy, runtime efficiency, analysis efficiency
- System-call-based
  - DTrace, Protracer, LSM, Hi-Fi
- Dynamic Information Flow Tracking (DIFT)
  - Panorama, Dtracker
- DIFT + Record replay
  - Arnold
RAIN
Criteria: accuracy, runtime efficiency, analysis efficiency
- We use
  - Record replay
  - Graph-based pruning
  - Selective DIFT
- We achieve
  - High accuracy
  - Runtime efficiency
  - Highly improved analysis efficiency
Threat model
- Trusts the OS
  - RAIN tracks user-level attacks.
- Tracks explicit channels
  - Side and covert channels are out of scope.
- Records all attacks from their inception
  - Hardware trojans and OS backdoors are out of scope.
Architecture
- Target host: RAIN (customized kernel and customized libc) produces logs.
- Analysis host:
  - Provenance graph builder: logs to coarse-level graph
  - Triggering and reachability analysis (prune): coarse-level graph to pruned sub-graph
  - Replay and selective DIFT (refine): pruned sub-graph to refined sub-graph
OS-level record replay
1. Records external inputs
2. Captures the thread switching from the pthread interface, not the produced internal data
3. Records system-wide executions
[Diagram: a process group containing Thread 1 and Thread 2. External inputs (file, socket, randomness) enter the group; IPC inside the group is labeled internal data; thread switching is captured via pthread.]
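The record-replay idea can be illustrated with a toy sketch. It is not how Arnold or RAIN implement recording; the `Recorder`, `Replayer`, and `workload` names are hypothetical. The point it shows is that only the nondeterministic external inputs are logged, while data derived from them is recomputed during replay.

```python
import random


class Recorder:
    """Runs a computation while logging every external input it consumes."""

    def __init__(self, source):
        self.source = source  # callable standing in for a file, socket, or RNG
        self.log = []

    def next_input(self):
        value = self.source()
        self.log.append(value)  # only external inputs are stored
        return value


class Replayer:
    """Feeds the logged inputs back in the recorded order."""

    def __init__(self, log):
        self._inputs = iter(log)

    def next_input(self):
        return next(self._inputs)


def workload(env):
    # Internal data is derived deterministically from the inputs,
    # so it never needs to be written to the log.
    a = env.next_input()
    b = env.next_input()
    return a * 31 + b


recorder = Recorder(lambda: random.randint(0, 255))
recorded_result = workload(recorder)
replayed_result = workload(Replayer(recorder.log))
assert replayed_result == recorded_result
```

Replaying from the log reproduces the same result, which is what later makes it possible to attach a heavyweight analysis to the replay instead of the live run.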
Coarse-level logging and graph building
- Keeps logging system-call events
- Constructs a graph to represent:
  - the processes, files, and sockets as nodes
  - the events as causality edges
Example graph: P1 reads B and C, then sends to A.
- A: Attacker site
- B: /docs/report.doc
- C: /tmp/errors.zip
- P1: /usr/bin/firefox
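A minimal sketch of the graph-building step, assuming a simplified event record; the `Event` fields and the `build_graph` helper are illustrative names, not RAIN's actual log format. Each system-call event becomes an edge oriented along the direction data can flow.

```python
from collections import namedtuple

# Hypothetical log record: timestamp, acting process, system call, object touched.
Event = namedtuple("Event", "time process syscall obj")

# Events mirroring the slide's example (the timestamps are made up).
events = [
    Event(1, "P1", "read", "B"),
    Event(2, "P1", "read", "C"),
    Event(3, "P1", "send", "A"),
]

INBOUND = {"read", "recv", "mmap"}   # data flows from the object into the process
OUTBOUND = {"write", "send"}         # data flows from the process into the object


def build_graph(events):
    """Return (src, dst, syscall, time) edges oriented along the data flow."""
    edges = []
    for e in sorted(events, key=lambda e: e.time):
        if e.syscall in INBOUND:
            edges.append((e.obj, e.process, e.syscall, e.time))
        elif e.syscall in OUTBOUND:
            edges.append((e.process, e.obj, e.syscall, e.time))
    return edges


graph = build_graph(events)
for src, dst, syscall, time in graph:
    print(f"t={time}: {src} -> {dst} ({syscall})")
```

At this level the graph only says that B and C both flowed into P1 before the send, which is exactly the ambiguity the finer-grained analysis is meant to resolve.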
Pruning
- Does every recorded execution need replay and DIFT? No!
- Prunes the data in the graph based on trigger analysis results
  - Upstream
  - Downstream
  - Point-to-point
  - Interference
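Upstream
[Diagram: provenance graph traced upstream from A. P1 reads B and C and sends to A; P2 reads D and writes B; P3 takes E and F as inputs (one read, one mmap) and writes C.]
- A: Attacker site
- B: /docs/report.doc
- C: /tmp/errors.zip
- D: /docs/ctct1.csv
- E: /docs/ctct2.pdf
- F: /docs/loss.csv
- P1: /usr/bin/firefox
- P2: /usr/bin/TextEditor
- P3: /bin/gzip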
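Upstream pruning can be sketched as backward reachability from the trigger node. The edge list below is a reading of the slide's diagram and is an assumption, and the nodes G, P9, and H are hypothetical unrelated activity added only to show what gets pruned.

```python
from collections import defaultdict, deque

# (source, destination) edges oriented along the data flow.
EDGES = [
    ("D", "P2"), ("P2", "B"),
    ("E", "P3"), ("F", "P3"), ("P3", "C"),
    ("B", "P1"), ("C", "P1"), ("P1", "A"),
    ("G", "P9"), ("P9", "H"),  # hypothetical activity unrelated to A
]


def upstream(edges, trigger):
    """Return every node from which data could have reached the trigger."""
    parents = defaultdict(set)
    for src, dst in edges:
        parents[dst].add(src)
    seen, queue = {trigger}, deque([trigger])
    while queue:
        node = queue.popleft()
        for parent in parents[node] - seen:
            seen.add(parent)
            queue.append(parent)
    return seen


kept = upstream(EDGES, "A")
print(sorted(kept))
```

Only the processes in `kept` would need replay; P9 is dropped because nothing it touched can reach A.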
Downstream
[Diagram: provenance graph traced downstream from A through P1, P2, and P3 to files B, C, D, and E via read and write edges.]
- A: Tampered file /docs/ctct.csv
- B: Seasonal report docs/s1.csv
- C: Seasonal report docs/s2.csv
- D: Budget report docs/bgt.csv
- E: Half-year report docs/h2.pdf
- P1: Spreadsheet editor
- P2: Auto-budget program
- P3: Auto-report program
Point-to-point
[Diagram: provenance graph between the tampered file A and the document archive server F, with processes P1 to P4, files B to E, read, write, and send edges, and annotations 1 and 2.]
- A: Tampered file /docs/ctct.csv
- B: Seasonal report docs/s1.csv
- C: Seasonal report docs/s2.csv
- D: Budget report docs/bgt.csv
- E: Half-year report docs/h2.pdf
- F: Document archive server
- P1: Spreadsheet editor
- P2: Auto-budget program
- P3: Auto-report program
- P4: Firefox browser
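One way to sketch point-to-point pruning is to keep only the nodes that are both downstream of the source and upstream of the sink. This is an illustrative formulation rather than a statement of RAIN's exact algorithm, and the small graph and its node names are hypothetical because the slide's edges cannot be recovered from the text.

```python
from collections import defaultdict


def reachable(edges, start, forward=True):
    """Nodes reachable from start, following edges forward or backward."""
    adjacency = defaultdict(set)
    for src, dst in edges:
        if forward:
            adjacency[src].add(dst)
        else:
            adjacency[dst].add(src)
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def point_to_point(edges, source, sink):
    """Keep nodes lying on some path from source to sink."""
    return reachable(edges, source, True) & reachable(edges, sink, False)


# Hypothetical graph: S flows to T through P1, X, and P2.
edges = [
    ("S", "P1"), ("P1", "X"), ("X", "P2"), ("P2", "T"),
    ("P1", "Y"),  # downstream of S but never reaches T
    ("Z", "P2"),  # upstream of T but not derived from S
]
on_path = point_to_point(edges, "S", "T")
print(sorted(on_path))
```

Y and Z are pruned even though each would survive a purely downstream or purely upstream query.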
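Interference
- Insight: only inbound and outbound files that interfere within a process can possibly produce causality.
- We determine interference according to the time order of inbound and outbound I/O events.
[Diagram: P2 writes B at t1 and reads D at t2, with t1 < t2, so the read comes too late to affect the write. P3 reads one input at t1, mmaps the other at t2, and writes C at t3, with t1 < t2 < t3, so both inputs may affect the output.]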
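The time-order rule can be sketched as follows. The per-process event tuples and the `causal_pairs` helper are illustrative assumptions; the timestamps follow the slide's t1, t2, t3 ordering, and which of E and F is read at t1 is not recoverable from the slide, so the assignment below is arbitrary.

```python
def interferes(inbound_time, outbound_time):
    """An input can affect an output only if it was consumed before the output."""
    return inbound_time < outbound_time


def causal_pairs(io_events):
    """io_events: (time, direction, object) tuples for a single process.

    Returns the (input, output) pairs that may carry causality.
    """
    inputs = [(t, obj) for t, direction, obj in io_events if direction == "in"]
    outputs = [(t, obj) for t, direction, obj in io_events if direction == "out"]
    return {
        (src, dst)
        for t_in, src in inputs
        for t_out, dst in outputs
        if interferes(t_in, t_out)
    }


# P2 writes B at t1, then reads D at t2: D cannot have influenced B.
p2_events = [(1, "out", "B"), (2, "in", "D")]
# P3 takes both inputs (read, then mmap) before writing C at t3.
p3_events = [(1, "in", "E"), (2, "in", "F"), (3, "out", "C")]

print(causal_pairs(p2_events))
print(sorted(causal_pairs(p3_events)))
```

Under this rule the D to B edge pair can be pruned without any replay, while both E and F remain candidates for C.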
Refinement - selective DIFT
- Replays the necessary part of the execution and conducts DIFT on it
- Aggregation
- Upstream
- Downstream
- Point-to-point
Upstream refinement
[Diagram: the upstream provenance graph over A to F and P1 to P3, with send, read, write, and mmap edges, as the input to upstream refinement.]
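To show why fine-grained tracking resolves dependency confusion, here is a toy byte-level taint sketch. It is not the Dtracker, libdft, or Dytan implementation; the helper names and file contents are hypothetical. Each byte carries a label naming the file it came from, and derived bytes carry the union of their operands' labels.

```python
def read_file(name, data):
    """Pair each byte with a taint label naming the file it came from."""
    return [(byte, frozenset({name})) for byte in data]


def xor_bytes(xs, ys):
    """A derived byte carries the union of its operands' labels."""
    return [(a ^ b, la | lb) for (a, la), (b, lb) in zip(xs, ys)]


def sources_of(buffer):
    """Collect every source label present in a buffer about to be sent."""
    labels = set()
    for _, label in buffer:
        labels |= label
    return labels


# A process reads three files, as in the motivating example.
a = read_file("A", b"alpha")
b = read_file("B", b"bravo")
c = read_file("C", b"chars")

sent_plain = c[:3]            # only bytes from C leave the process
sent_mixed = xor_bytes(a, c)  # bytes derived from both A and C

print(sources_of(sent_plain))
print(sorted(sources_of(sent_mixed)))
```

A system-call-level graph would link the send to A, B, and C alike; the labels narrow the first send down to C alone.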
Implementation summary
- RAIN is built on top of:
  - Arnold, the record replay framework
  - Dtracker (Libdft) and Dytan, the taint engines
Host           Module                LoC
Target host    Kernel module         2,200 C (Diff)
Target host    Trace logistics       1,100 C
Analysis host  Provenance graph      6,800 C++
Analysis host  Trigger/Pruning       1,100 Python
Analysis host  Selective refinement  900 Python
Analysis host  DIFT Pin tools        3,500 C/C++ (Diff)
Evaluations
- Runtime performance
- Accuracy
- Analysis cost
- Storage footprint
Runtime overhead: 3.22% (SPEC CPU2006)
[Chart: runtime overhead per SPEC CPU2006 benchmark (bzip2, perlbench, calculix, gamess, bwaves, sjeng, omnetpp, mcf, astar, h264ref, hmmer, xalancbmk, gobmk, libquantum, sphinx3, milc, zeusmp, gromacs, leslie3d, namd, lbm, dealII, soplex, povray, GemsFDTD, tonto, wrf, gcc), comparing Logging and Logging+Recording.]
Multi-thread runtime overhead: 5.35% (SPLASH-3)
[Chart: runtime overhead per SPLASH-3 benchmark (ocean-c, ocean-n, fmm, radiosity, water-n, water-s, barnes, volrend).]
I/O-intensive applications: less than 50% overhead
[Chart: results for kernel copy, movie download, libc compilation, and Firefox session, comparing Logging and Logging+Recording.]
High analysis accuracy
Scenarios from red team exercise of DARPA Transparent Computing program
Dependency confusion rate:
Scenario            Coarse-level  Fine-level
Screengrab          90.50%        0%
Cameragrab          32%           0%
Audiograb           39.70%        0%
NetRecon            84.70%        13%
Motivating Example  67.00%        0%
Pruning effectiveness: ~94.2% reduction
Taint workload (number of processes):
Scenario            None  RAIN
Screengrab          99    5
Cameragrab          141   19
Audiograb           310   11
NetRecon            138   13
Motivating Example  720   34
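The headline figure can be checked from the slide's process counts. The sketch assumes the ten values are the five "None" counts followed by the five "RAIN" counts in scenario order; with that reading, the aggregate reduction matches the stated ~94.2%.

```python
# Number of processes that need taint analysis, per scenario.
baseline = {  # without pruning ("None")
    "Screengrab": 99, "Cameragrab": 141, "Audiograb": 310,
    "NetRecon": 138, "Motivating Example": 720,
}
pruned = {  # with RAIN's pruning
    "Screengrab": 5, "Cameragrab": 19, "Audiograb": 11,
    "NetRecon": 13, "Motivating Example": 34,
}

total_reduction = 1 - sum(pruned.values()) / sum(baseline.values())
per_scenario = {name: 1 - pruned[name] / baseline[name] for name in baseline}

print(f"aggregate reduction: {total_reduction:.1%}")  # prints "aggregate reduction: 94.2%"
for name, value in per_scenario.items():
    print(f"{name}: {value:.1%}")
```

The aggregate is computed over the summed process counts (82 of 1,408 remain), not as an average of the per-scenario percentages.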
Storage cost: ~4GB per day (1.5TB per year)
Workload            Storage overhead (MB)
Per day desktop     4000
Libc compilation    740
Motivating Example  200.6
NetRecon            166.1
Audiograb           133.6
Cameragrab          105
Screengrab          113.9
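The yearly estimate follows from the daily figure. This small check assumes decimal units (1 TB = 1,000,000 MB) and a 365-day year, neither of which is stated on the slide.

```python
MB_PER_TB = 1_000_000   # decimal units, an assumption
DAYS_PER_YEAR = 365     # an assumption

per_day_mb = 4000       # "Per day desktop" storage overhead from the slide
per_year_tb = per_day_mb * DAYS_PER_YEAR / MB_PER_TB

print(f"{per_year_tb:.2f} TB per year")  # prints "1.46 TB per year"
```

Rounded to one decimal place this gives the slide's 1.5TB per year.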
Discussion
- Limitations
  - RAIN trusts the OS, which therefore needs kernel integrity protection.
  - Over-tainting issue
- Directions
  - Hypervisor-based RAIN
  - Further reduce storage overhead
Conclusion
- RAIN adopts a multi-level provenance system to facilitate fine-grained analysis that enables accurate attack investigation.
- RAIN has low runtime overhead and significantly reduces analysis cost.