SLIDE 1 DARPA/I2O Transparent Computing Program
THEIA: Tagging and Tracking of Multi-Level Host Events for Transparent Computing and Information Assurance
Mattia Fazzini Georgia Institute of Technology
Nov 3rd, 2017
SLIDE 2 Agenda
- Project overview
- Technical discussion
– THEIA-Panda
– THEIA-KI
SLIDE 8
Project Team
PI: Wenke Lee
Co-PIs: Simon Chung, Taesoo Kim, Alessandro Orso
Postdoc: Sangho Lee
GTRI: Trent Brunson
Ph.D. Students: Evan Downing, Mattia Fazzini, Yang Ji, Weiren Wang, Carter Yagemann, Joey Allen
SLIDE 9
Data Breaches
SLIDE 11
Data Breaches Trend
SLIDE 12 THEIA
– Tagging and tracking of multi-level host events for detection of advanced persistent threats (APTs)
– Decouple analyses from runtime through record and replay
– OS level
- Establish causal relationships between system operations
– Program level
- Identify dependencies between program instructions
– UI level
- Capture the user’s intent to provide ground truth for intended behavior
SLIDE 15 Advanced Persistent Threats (APTs)
– Advanced persistent threats (APTs) take place over a long period of time and can blend in with normal user and program activities
SLIDE 20 DARPA Transparent Computing
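[Figure: DARPA Transparent Computing program structure, with THEIA among the TA1 performers alongside TA2, TA3, TA4, and TA5 components; diagram labels include Tagging and Tracking, Storage, Forensics, Adversarial Scenario, and Malware.]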
SLIDE 26 THEIA-Panda Overview
[Figure: THEIA-Panda architecture. Components: Host, THEIA-Panda, Guest, FA; Record, Replay, System Call Information, Process Information; Coarse-grained Taint Analysis, Fine-grained Taint Analysis, Action History Graph; Real-time, On-demand, Storage.]
SLIDE 34 Record and Replay
– Take a snapshot of the machine state
– Log non-deterministic inputs:
- Data read by the CPU from I/O ports
- Hardware interrupts and their parameters
- Data written to RAM by peripherals through direct memory access (DMA)
- Replay:
– Re-execute from the machine-state snapshot, feeding in the logged inputs
– QEMU/PANDA* with a 64-bit Linux guest
*B. Dolan-Gavitt, J. Hodosh, P. Hulin, T. Leek, R. Whelan. Repeatable Reverse Engineering with PANDA. 5th Program Protection and Reverse Engineering Workshop, Los Angeles, California, December 2015
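The record half of this scheme amounts to an append-only log of non-deterministic inputs, each stamped with the point in execution where it arrived; replay consumes the log in order instead of consulting real devices. The C sketch below illustrates only that idea. It is not PANDA's log format: `rr_entry`, `rr_log`, `rr_record`, and `rr_replay` are hypothetical names, and using the guest instruction count as the timestamp is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical categories matching the three non-deterministic inputs above. */
typedef enum { RR_PORT_INPUT, RR_INTERRUPT, RR_DMA_WRITE } rr_kind;

typedef struct {
    uint64_t icount; /* guest instruction count when the input was observed */
    rr_kind kind;
    uint64_t value;  /* port data, interrupt vector, or DMA'd word */
} rr_entry;

#define RR_LOG_MAX 1024

typedef struct {
    rr_entry entries[RR_LOG_MAX];
    size_t len;    /* number of entries written while recording */
    size_t cursor; /* next entry to consume while replaying */
} rr_log;

/* Record: append the input together with the point in execution where it arrived. */
static void rr_record(rr_log *log, uint64_t icount, rr_kind kind, uint64_t value) {
    assert(log->len < RR_LOG_MAX);
    log->entries[log->len++] = (rr_entry){icount, kind, value};
}

/* Replay: the device is not consulted; the logged value is injected instead.
   Returns 0 on success, -1 if the log is exhausted or execution has diverged
   (the expected input kind or instruction count does not match). */
static int rr_replay(rr_log *log, uint64_t icount, rr_kind kind, uint64_t *out) {
    if (log->cursor >= log->len) return -1;
    const rr_entry *e = &log->entries[log->cursor];
    if (e->icount != icount || e->kind != kind) return -1;
    *out = e->value;
    log->cursor++;
    return 0;
}
```

A mismatch check like the one in `rr_replay` is one simple way to notice that replay has drifted from the recording.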
SLIDE 37
Record and Replay Implementation
Example
    static ssize_t e1000_receive(VLANClientState *nc, const uint8_t *buf, size_t size)
    {
        …
        do {
            …
            /* Record hook: log the packet bytes about to be copied into guest RAM */
            rr_record_handle_packet_call(RR_CALLSITE_E1000_RECEIVE_2,
                                         (void *)(buf + desc_offset + vlan_offset),
                                         copy_size, NET_TRANSFER_IOB_TO_RAM);
            …
            /* DMA write of the received data into the guest buffer named by the descriptor */
            pci_dma_write(&s->dev, le64_to_cpu(desc.buffer_addr),
                          (void *)(buf + desc_offset + vlan_offset), copy_size);
            …
        } while (desc_offset < total_size);
        …
    }
SLIDE 40 OS-level Transparency
– Capture OS-level events and the dependencies between them
– Based on VM introspection
– Process operations:
- clone, fork, execve, exit, etc.
– File operations:
- open, read, write, unlink, etc.
– Network operations:
- socket, connect, recvmsg, etc.
– Memory operations:
- mmap, mprotect, shmget, etc.
SLIDE 43
OS-level Transparency Implementation Example
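    #ifdef TARGET_X86_64
    void helper_syscall(int next_eip_addend)
    {
        panda_cb_list *plist;
        /* Notify every plugin registered for PANDA_CB_BEFORE_SYSCALL
           before the guest's SYSCALL instruction is emulated */
        for (plist = panda_cbs[PANDA_CB_BEFORE_SYSCALL]; plist != NULL;
             plist = panda_cb_list_next(plist)) {
            plist->entry.before_syscall(env);
        }
        …
    }
    #endif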
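The loop in `helper_syscall` walks a linked list of plugin callbacks. A self-contained C sketch of that pattern follows; `cb_node`, `cpu_state`, `register_before_syscall`, and `dispatch_before_syscall` are hypothetical simplifications, not PANDA's actual `panda_cb_list` API. The example plugin checks for syscall number 59, which is `execve` on x86-64 Linux.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the CPU state passed to callbacks. */
typedef struct { unsigned long rax; } cpu_state; /* RAX holds the syscall number on x86-64 */

typedef void (*before_syscall_fn)(cpu_state *env);

typedef struct cb_node {
    before_syscall_fn before_syscall;
    struct cb_node *next;
} cb_node;

static cb_node *before_syscall_cbs = NULL; /* registered callbacks, newest first */

/* A plugin registers interest in syscalls by pushing a node onto the list. */
static void register_before_syscall(cb_node *node, before_syscall_fn fn) {
    assert(node != NULL && fn != NULL);
    node->before_syscall = fn;
    node->next = before_syscall_cbs;
    before_syscall_cbs = node;
}

/* Mirrors the loop in helper_syscall: call every registered callback in turn. */
static void dispatch_before_syscall(cpu_state *env) {
    for (cb_node *n = before_syscall_cbs; n != NULL; n = n->next)
        n->before_syscall(env);
}

/* Example plugin: count execve calls (syscall number 59 on x86-64 Linux). */
static int execve_count = 0;
static void count_execve(cpu_state *env) {
    if (env->rax == 59)
        execve_count++;
}
```

Because the hook fires before the guest kernel runs, a plugin sees the syscall number and arguments exactly as the process issued them, which is what VM introspection of OS-level events relies on.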
SLIDE 46 Action History Graph (AHG)
– Represent causality across events
- Process->Process (e.g., fork)
- Process->File (e.g., write)
- File->Process (e.g., read)
- Process->Host (e.g., send)
- Host->Process (e.g., recv)
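One way to store these edges is to orient every edge along the direction of information flow, so that `read` and `recv` point from the object to the process while `fork`, `write`, and `send` point away from the acting process. The C sketch below assumes that convention; the type and function names are hypothetical and do not describe THEIA's actual data model.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Node kinds from the slide: processes, files, and remote hosts. */
typedef enum { NODE_PROCESS, NODE_FILE, NODE_HOST } node_kind;

typedef struct {
    node_kind kind;
    const char *name; /* e.g. "wget", "s.png", "143.215.130.204" */
} ahg_node;

/* A causal edge: information flows from node src to node dst. */
typedef struct {
    int src, dst;      /* indices into an array of ahg_node */
    const char *event; /* system call that created the dependency */
    unsigned long ts;  /* event timestamp */
} ahg_edge;

/* read/recv move data from the object into the acting process. */
static int flows_into_actor(const char *event) {
    return strcmp(event, "read") == 0 || strcmp(event, "recv") == 0;
}

/* Build an edge for `actor` performing `event` on `object`
   (fork: object is the child; write/send: object is the file or host). */
static ahg_edge make_edge(const char *event, int actor, int object, unsigned long ts) {
    assert(event != NULL);
    ahg_edge e = { actor, object, event, ts };
    if (flows_into_actor(event)) {
        e.src = object;
        e.dst = actor;
    }
    return e;
}
```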
SLIDE 49
Action History Graph Example
SLIDE 50 Coarse-grained Taint Analysis
– Quickly capture the provenance of objects in the AHG
– Runs while building the AHG
– Each process has a provenance set
– Process operations:
- fork, clone: copy the parent's provenance to the child process
– File and network operations:
- read, recv: propagate the object's provenance to the process
- write, send: propagate the process's provenance to the object
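The propagation rules above can be sketched in C with provenance sets encoded as 64-bit bitmasks, one bit per taint source. The bitmask encoding, the 64-source limit, the helper `source_bit`, and the choice to accumulate (union) provenance on every read and write are assumptions of this sketch rather than details taken from THEIA.

```c
#include <assert.h>
#include <stdint.h>

/* Provenance set as a bitmask: bit i set means "derived from source i". */
typedef uint64_t prov_set;

static prov_set source_bit(int i) {
    assert(i >= 0 && i < 64);
    return (prov_set)1 << i;
}

typedef struct { prov_set prov; } process;
typedef struct { prov_set prov; } object; /* a file or a network endpoint */

/* fork, clone: the child starts with its parent's provenance. */
static void on_fork(const process *parent, process *child) {
    child->prov = parent->prov;
}

/* read, recv: the object's provenance flows into the process. */
static void on_read(process *p, const object *o) {
    p->prov |= o->prov;
}

/* write, send: the process's provenance flows into the object. */
static void on_write(const process *p, object *o) {
    o->prov |= p->prov;
}
```

Each operation costs a single OR, which is why this analysis can keep up with AHG construction; the price is that every byte a process writes inherits everything the process has ever read.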
SLIDE 53 Fine-grained Taint Analysis
– Accurately capture provenance of objects in the AHG
– Decoupled from program execution
– Instruction-level propagation
– Taint tags at byte-level granularity
– Trace-based dynamic taint analysis
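To illustrate byte-level tags, here is a toy C model with one shadow byte for every memory byte; each tag is a bitmask of up to eight sources, so combining operands is a bitwise OR. This is a hypothetical simplification: PANDA's fine-grained taint operates on the LLVM translation of guest code rather than on a fixed array, and `taint_input`, `copy_bytes`, and `add_bytes` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 256
typedef uint8_t tag_t; /* bitmask of up to 8 taint sources; 0 = untainted */

static uint8_t mem[MEM_SIZE];  /* toy guest memory */
static tag_t shadow[MEM_SIZE]; /* shadow[i] holds the taint tag of mem[i] */

/* Taint source: n bytes of input are written at addr and labeled with `label`. */
static void taint_input(size_t addr, const uint8_t *data, size_t n, tag_t label) {
    assert(addr + n <= MEM_SIZE);
    memcpy(&mem[addr], data, n);
    memset(&shadow[addr], label, n);
}

/* Data movement (e.g. mov): each byte's tag travels with the byte. */
static void copy_bytes(size_t dst, size_t src, size_t n) {
    assert(dst + n <= MEM_SIZE && src + n <= MEM_SIZE);
    memmove(&mem[dst], &mem[src], n);
    memmove(&shadow[dst], &shadow[src], n);
}

/* Arithmetic (e.g. a one-byte add): the result depends on both operands,
   so its tag is the union of the operand tags. */
static void add_bytes(size_t dst, size_t a, size_t b) {
    assert(dst < MEM_SIZE && a < MEM_SIZE && b < MEM_SIZE);
    mem[dst] = (uint8_t)(mem[a] + mem[b]);
    shadow[dst] = (tag_t)(shadow[a] | shadow[b]);
}
```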
SLIDE 56
Fine-grained Taint Analysis Implementation
[Figure: translation chain: Guest Basic Block -> TCG Basic Block -> LLVM Basic Block.]
SLIDE 60 Trace-based Taint Analysis
– Improve performance of fine-grained taint analysis
– Within a trace, the same instruction sequences are executed many times
– Based on the execution trace of the system/program
– Computes taint summaries for sequences of instructions
– Reuses taint summaries within the trace and, possibly, across traces
– Sequitur algorithm: identifies repeated structure in an execution trace and generates a grammar whose terminals are instructions
– Analyzes the grammar and reuses taint results when possible
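The core of summary reuse can be shown without Sequitur itself. Assume the grammar has already identified a repeated instruction sequence (a rule); a summary records, for each register, which input registers its taint depends on after the sequence, so applying the summary once is equivalent to re-running taint propagation over every instruction in it. The three-opcode, register-only instruction set and all names in this C sketch are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NREGS 4
typedef uint8_t tag_t; /* taint bitmask for one register */
typedef enum { OP_MOV, OP_ADD, OP_CLR } opcode;
typedef struct { opcode op; int dst, src; } insn; /* src is ignored by OP_CLR */

/* dep[d] = set of input registers (bit r = register r) whose taint reaches d */
typedef struct { uint8_t dep[NREGS]; } taint_summary;

/* Build the summary once by propagating register dependencies symbolically. */
static taint_summary summarize(const insn *seq, int n) {
    taint_summary s;
    for (int r = 0; r < NREGS; r++) s.dep[r] = (uint8_t)(1u << r);
    for (int i = 0; i < n; i++) {
        const insn *in = &seq[i];
        assert(in->dst >= 0 && in->dst < NREGS && in->src >= 0 && in->src < NREGS);
        switch (in->op) {
        case OP_MOV: s.dep[in->dst] = s.dep[in->src]; break;
        case OP_ADD: s.dep[in->dst] |= s.dep[in->src]; break;
        case OP_CLR: s.dep[in->dst] = 0; break;
        }
    }
    return s;
}

/* Reuse: one pass over NREGS x NREGS bits replaces the whole sequence. */
static void apply_summary(const taint_summary *s, tag_t regs[NREGS]) {
    tag_t out[NREGS];
    for (int d = 0; d < NREGS; d++) {
        out[d] = 0;
        for (int r = 0; r < NREGS; r++)
            if (s->dep[d] & (1u << r)) out[d] |= regs[r];
    }
    for (int d = 0; d < NREGS; d++) regs[d] = out[d];
}

/* Baseline: instruction-by-instruction propagation, for comparison. */
static void propagate(const insn *seq, int n, tag_t regs[NREGS]) {
    for (int i = 0; i < n; i++) {
        switch (seq[i].op) {
        case OP_MOV: regs[seq[i].dst] = regs[seq[i].src]; break;
        case OP_ADD: regs[seq[i].dst] |= regs[seq[i].src]; break;
        case OP_CLR: regs[seq[i].dst] = 0; break;
        }
    }
}
```

Because each opcode here only copies, unions, or clears tags, the summary is exact; instructions with memory operands or address-dependent effects would need richer summaries.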
SLIDE 63 Trace-based Taint Analysis Example
Execution trace (excerpt):
    …
    mov qword ptr [r12+rax*8], rdx
    jmp 0x7f8c47a21b13
    add rdx, 0x10
    mov rax, qword ptr [rdx]
    test rax, rax
    jz 0x7f8c47a21b52
    cmp rax, 0x21
    jbe 0x7f8c47a21b08
    lea rcx, ptr [rip+0x21ef29]
    …
[Figure: the grammar derived from this trace; its rules are built from the instructions mov qword ptr [r12+rax*8], rdx; jmp 0x7f8c47a21b13; add rdx, 0x10; mov rax, qword ptr [rdx]; test rax, rax; and jz 0x7f8c47a21b52.]
SLIDE 65
Fine-grained Taint Analysis
SLIDE 67
Case Study Overview
SLIDE 72 Case Study and AHG
[Figure: AHG for the case study. Edges: bash execute firefox; firefox recv from 143.215.130.204; firefox execute sh; sh execute wget; wget recv from 143.215.130.204; wget write screengrab; sh execute screengrab; screengrab recv msg X0; screengrab write s.png; sh execute nc; nc read s.png; nc write 143.215.130.204. Legend: process, event, file, network, tag, causality.]
SLIDE 74 Case Study and AHG Step 1
1) Victim starts Firefox
[Figure: AHG after step 1: bash execute firefox.]
SLIDE 75 Case Study and AHG Step 2
2) Victim visits malicious.com (143.215.130.204), which causes Firefox to run a shell process
[Figure: AHG after step 2: firefox recv from 143.215.130.204; firefox execute sh.]
SLIDE 76 Case Study and AHG Step 3
3) Attacker downloads and executes screengrab
[Figure: AHG after step 3: sh execute wget; wget recv from 143.215.130.204; wget write screengrab; sh execute screengrab; screengrab recv msg X0; screengrab write s.png.]
SLIDE 77 Case Study and AHG Step 4
4) Screenshot is sent to attacker’s server
[Figure: AHG after step 4: sh execute nc; nc read s.png; nc write 143.215.130.204.]
SLIDE 87 Case Study and Coarse-grained Taint Analysis
[Figure: coarse-grained taint on the wget portion of the AHG (sh execute wget; wget recv from 143.215.130.204; wget read libssl.so, libc.so, and wgetrc; wget write screengrab). Coarse taint tags CT1, CT2, and CT3 appear on the reads of libssl.so, libc.so, and wgetrc, CT4 and CT5 annotate further events, and the coarse taint set shown lists CT1, CT2, CT3, and CT4.]
SLIDE 93 Case Study and Fine-grained Taint Analysis
[Figure: fine-grained taint on the same wget portion of the AHG. Fine taint tags FT1, FT2, and FT3 appear on the reads of libssl.so, libc.so, and wgetrc, and FT4 and FT5 annotate further events.]
SLIDE 94 THEIA-Panda Overheads
Execution time: slowdown of the row configuration relative to the column configuration

| | Bare | KVM | QEMU | Record |
|---|---|---|---|---|
| KVM | 2.09x | | | |
| QEMU | 6.19x | 2.96x | | |
| Record | 7.75x | 3.71x | 1.25x | |
| Replay | 13.82x | 6.62x | 2.23x | 1.78x |
- Fine-grained taint analysis:
– ~40x to ~300x slowdown compared to bare execution
– ~86 GB/day of non-deterministic input log data + ~1.3 GB/day of graph data
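Every pairwise entry in the slowdown table is the ratio of two bare-relative slowdowns; for example, Replay relative to Record is 13.82 / 7.75 ≈ 1.78. The small C helper below (names are illustrative) reproduces any entry from the Bare column.

```c
#include <assert.h>

enum { BARE, KVM, QEMU, RECORD, REPLAY, NCONFIGS };

/* Slowdowns relative to bare execution, from the Bare column of the table. */
static const double vs_bare[NCONFIGS] = { 1.00, 2.09, 6.19, 7.75, 13.82 };

/* Slowdown of configuration i relative to configuration j. */
static double relative_slowdown(int i, int j) {
    assert(i >= 0 && i < NCONFIGS && j >= 0 && j < NCONFIGS);
    return vs_bare[i] / vs_bare[j];
}
```

For instance, `relative_slowdown(QEMU, KVM)` gives 6.19 / 2.09 ≈ 2.96, matching the table.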
SLIDE 99 THEIA-Panda Observations
SLIDE 100 THEIA-KI Overview
[Figure: THEIA-KI architecture. THEIA-KI + OS: Record, Replay, System Call Information, Process Information. THEIA-KI-Analysis: Fine-grained Taint Analysis, Action History Graph, Real-time, On-demand, Storage, Query Interface. FA.]
SLIDE 107 THEIA-KI
– Record/replay
- Kernel-based instrumentation
– Instruction-level replay of user space
– Coarse-grained causality
- From system instrumentation and logging
– Fine-grained causality
- From dynamic taint tracking
- Threat model:
– Kernel is trusted
SLIDE 114 Record and Replay
– Kernel instrumentation
- Order and return values of system calls, and the memory addresses they modify
- Timing and values of received signals
- Sources of randomness
– Libc instrumentation
- pthread synchronization
- Implementation:
– Arnold* with 32-bit Linux kernel
[Figure: a process group with Thread 1 and Thread 2; the recorded sources of non-determinism are external inputs (file, socket, randomness) and thread synchronization.]
*D. Devecsery, M. Chow, X. Dou, P. M. Chen, J. Flinn. Eidetic Systems. 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI), October 2014.
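Recording pthread synchronization typically means logging the order in which threads acquire each lock and forcing that same order during replay. The C sketch below shows the idea for a single lock; it is not Arnold's implementation, and `sync_log`, `record_acquire`, `replay_may_acquire`, and `replay_acquired` are hypothetical names. A real system would block a thread that is not next in line rather than just returning 0.

```c
#include <assert.h>
#include <stddef.h>

#define SYNC_LOG_MAX 256

/* Hypothetical per-lock log of the order in which threads acquired it. */
typedef struct {
    int tid[SYNC_LOG_MAX];
    size_t len;  /* acquisitions recorded */
    size_t next; /* next acquisition expected during replay */
} sync_log;

/* Record: called right after a thread acquires the lock. */
static void record_acquire(sync_log *log, int tid) {
    assert(log->len < SYNC_LOG_MAX);
    log->tid[log->len++] = tid;
}

/* Replay: a thread may take the lock only when it is next in the recorded order. */
static int replay_may_acquire(const sync_log *log, int tid) {
    return log->next < log->len && log->tid[log->next] == tid;
}

/* Replay: advance the expected order once the permitted thread holds the lock. */
static void replay_acquired(sync_log *log) {
    assert(log->next < log->len);
    log->next++;
}
```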
SLIDE 115
Kernel Instrumentation
Implementation Example
    unsigned long arch_align_stack(unsigned long sp)
    {
        /* Begin REPLAY */
        if (!(current->personality & ADDR_NO_RANDOMIZE) && randomize_va_space) {
            unsigned int rand = get_random_int();
            if (current->record_thrd) {
                /* Recording: log the random value so replay can reuse it */
                record_randomness(rand);
            } else if (current->replay_thrd) {
                /* Replaying: substitute the value logged during recording */
                rand = replay_randomness();
            }
            sp -= rand % 8192;
        }
        /* End REPLAY */
        return sp & ~0xf;
    }
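The excerpt calls `record_randomness` and `replay_randomness` without showing them. A plausible minimal shape is a per-thread FIFO of random values; the C sketch below assumes that, and uses hypothetical `sketch_*` names so as not to imply these are the kernel's functions. It also shows why this suffices for determinism: the stack adjustment is a pure function of `sp` and the random value.

```c
#include <assert.h>
#include <stddef.h>

#define RAND_LOG_MAX 64

/* Hypothetical per-thread log of random values consumed during recording. */
typedef struct {
    unsigned int vals[RAND_LOG_MAX];
    size_t len, pos; /* values recorded / values already replayed */
} rand_log;

static void sketch_record_randomness(rand_log *log, unsigned int r) {
    assert(log->len < RAND_LOG_MAX);
    log->vals[log->len++] = r;
}

static unsigned int sketch_replay_randomness(rand_log *log) {
    assert(log->pos < log->len); /* replay must not consume more than was recorded */
    return log->vals[log->pos++];
}

/* Same arithmetic as arch_align_stack: offset by rand % 8192, then 16-byte align. */
static unsigned long align_stack(unsigned long sp, unsigned int rand) {
    sp -= rand % 8192;
    return sp & ~0xfUL;
}
```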
SLIDE 118
Query System Workflow
[Figure: query system workflow. Components: Triggering Points, Queries, AHG, Reachability & Pruning, Coarse-grained Subgraph, Fine-grained analysis, Fine-grained Tags.]
SLIDE 123 Triggering Points and Queries
– Pre-defined policies
- Process writes to /etc/passwd
- Queries:
– From automated forensic analysis systems
– From human analysts
– Backward:
- Where does this object come from?
– Forward:
- What is the impact of this object on the system?
– Point-to-point:
- Are these two objects related?
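The three query types map onto graph reachability. In the C sketch below, backward and forward queries are breadth-first searches over the AHG (forward follows edges, backward follows them in reverse), and a point-to-point query intersects the forward set of one object with the backward set of the other, which also prunes the graph to the nodes on some connecting path. The adjacency-matrix representation and all names are hypothetical, and the sketch ignores event timestamps, so it can over-approximate causality.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_NODES 16

/* edge[u][v] is true when the AHG has a causal edge u -> v */
typedef struct {
    int n;
    bool edge[MAX_NODES][MAX_NODES];
} ahg;

/* BFS from `start`: forward follows edges, backward follows them in reverse. */
static void reach(const ahg *g, int start, bool forward, bool seen[MAX_NODES]) {
    int queue[MAX_NODES], head = 0, tail = 0; /* each node is enqueued at most once */
    assert(start >= 0 && start < g->n);
    memset(seen, 0, sizeof(bool) * MAX_NODES);
    seen[start] = true;
    queue[tail++] = start;
    while (head < tail) {
        int u = queue[head++];
        for (int v = 0; v < g->n; v++) {
            bool e = forward ? g->edge[u][v] : g->edge[v][u];
            if (e && !seen[v]) {
                seen[v] = true;
                queue[tail++] = v;
            }
        }
    }
}

/* Point-to-point: is b reachable from a? on_path[] keeps only nodes lying on
   some a -> b path (forward set of a intersected with backward set of b). */
static bool related(const ahg *g, int a, int b, bool on_path[MAX_NODES]) {
    bool fwd[MAX_NODES], bwd[MAX_NODES];
    reach(g, a, true, fwd);
    reach(g, b, false, bwd);
    for (int v = 0; v < MAX_NODES; v++)
        on_path[v] = fwd[v] && bwd[v];
    return fwd[b];
}
```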
SLIDE 126 Point-to-point Query Example
- 1. Attacker tampers with contract file ctct.csv
- 2. Employee creates seasonal report s1.csv using a spreadsheet editor
- 3. Auto-report program sends seasonal report s1.csv to the archive server
- 4. Employee creates seasonal report s2.csv using a spreadsheet editor
- 5. Template generator creates template t.doc
- 6. Employee creates half-year report h2.pdf using a document editor
[Figure: provenance graph for the example. Nodes: ctct.csv, Spreadsheet Editor (two instances), s1.csv, s2.csv, Auto Report, archive server, Template Generator, t.doc, Document Editor, h2.pdf; edges labeled read, write, and send]
SLIDE 129 Forward Reachability
[Figure: the example provenance graph with forward reachability highlighted]
SLIDE 130 Backward Reachability
[Figure: the example provenance graph with backward reachability highlighted]
SLIDE 131 Reachability Result
[Figure: the example provenance graph with the reachability result highlighted]
SLIDE 132
Runtime Overhead: SPEC CPU2006
3.22%
SLIDE 133
Runtime Overhead: I/O Operations
<50%
SLIDE 134
Pruning Efficiency
~94.2% reduction
[Chart: None (no pruning) vs. RAIN]
SLIDE 135 Information Flow Tracking Accuracy
~94.2% reduction
[Chart: coarse-level vs. fine-level tracking]
SLIDE 136
Storage Cost
~4 GB per day
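To put the storage figure in perspective, assuming decimal gigabytes ($1\,\text{GB} = 10^9$ bytes) and a uniform logging rate over the day (both assumptions, not stated on the slide), 4 GB per day works out as:

```latex
\[
\frac{4 \times 10^{9}\ \text{B/day}}{86\,400\ \text{s/day}}
  \approx 4.63 \times 10^{4}\ \text{B/s} \approx 46\ \text{kB/s},
\qquad
30\ \text{days} \times 4\ \text{GB/day} = 120\ \text{GB}.
\]
```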
SLIDE 137 Future Work
- Hypervisor-based (non-emulation) record and replay (R/R)
- Differential Taint Analysis
- Running memory sanitizers during replay
- Multi-host support
- Porting from 32-bit to 64-bit
SLIDE 140
Conclusion
SLIDE 144
APT Demo
SLIDE 147
THEIA-Panda Demo