DARPA/I2O Transparent Computing Program THEIA: Tagging and Tracking

SLIDE 1

DARPA/I2O Transparent Computing Program

THEIA: Tagging and Tracking of Multi-Level Host Events for Transparent Computing and Information Assurance

Mattia Fazzini Georgia Institute of Technology

Nov 3rd, 2017

SLIDE 2

Agenda

  • Project overview
  • Technical discussion
    – THEIA-Panda
    – THEIA-KI
  • Future work
SLIDE 8

Project Team

  • PI: Wenke Lee
  • Co-PIs: Simon Chung, Taesoo Kim, Alessandro Orso
  • Postdoc: Sangho Lee
  • GTRI: Trent Brunson
  • Ph.D. students: Evan Downing, Mattia Fazzini, Yang Ji, Weiren Wang, Carter Yagemann, Joey Allen

SLIDE 9

Data Breaches

SLIDE 11

Data Breaches Trend

SLIDE 12

THEIA

  • Objective:
    – Tagging and tracking of multi-level host events for detection of advanced persistent threats (APTs)
  • Efficiency:
    – Decouple analyses from runtime through record and replay
  • Transparency:
    – OS level
      • Establish causal relationships between system operations
    – Program level
      • Identify relations between program instructions
    – UI level
      • Capture the user's intent to provide ground truth of intended behavior
SLIDE 15

Advanced Persistent Threats (APTs)

  • Definition:

– Advanced persistent threats (APTs) take place over a long period of time and can blend in with normal user and program activities

SLIDE 20

DARPA Transparent Computing

[Diagram: DARPA Transparent Computing technical areas TA1 to TA5. Labels: Tagging and Tracking, Storage, Forensics, Adversarial Scenario, Malware. THEIA is shown among the TA1 performers (TA1 THEIA, TA1…).]

SLIDE 26

THEIA-Panda Overview

[Architecture diagram: Host, THEIA-Panda, Guest. Components: Record, Replay, System Call Information, Process Information, Coarse-grained Taint Analysis, Fine-grained Taint Analysis (FA), Action History Graph, Storage. Modes: Real-time, On-demand.]

SLIDE 34

Record and Replay

  • Record:
    – Take a snapshot of the machine state
    – Log non-deterministic inputs
      • Data entering CPU on port input
      • Hardware interrupts and their parameters
      • Data written to RAM during direct memory operation from peripheral
  • Replay:
    – Replay activity (data) starting from snapshot of machine state
  • Implementation:
    – QEMU/PANDA* and 64-bit Linux Guest

*B. Dolan-Gavitt, J. Hodosh, P. Hulin, T. Leek, R. Whelan. Repeatable Reverse Engineering with PANDA. 5th Program Protection and Reverse Engineering Workshop, Los Angeles, California, December 2015
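
The record and replay steps above can be illustrated with a minimal sketch of a non-deterministic input log. This is not PANDA's log format. The names (`nd_log`, `nd_record`, `nd_replay_next`) are hypothetical, and keying each entry on a guest instruction count is an assumption of the sketch, made so that replay knows when to re-inject an input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event kinds, mirroring the three input classes on the slide. */
typedef enum { ND_PORT_INPUT, ND_INTERRUPT, ND_DMA_WRITE } nd_kind;

typedef struct {
    nd_kind  kind;
    uint64_t instr_count; /* assumed timing key: guest instructions executed so far */
    uint64_t value;       /* port data, interrupt number, or one DMA payload word */
} nd_entry;

#define ND_LOG_CAPACITY 1024

typedef struct {
    nd_entry entries[ND_LOG_CAPACITY];
    size_t   length; /* entries appended while recording */
    size_t   cursor; /* next entry to hand out while replaying */
} nd_log;

void nd_log_init(nd_log *log)
{
    log->length = 0;
    log->cursor = 0;
}

/* Record: append one non-deterministic input. Returns 0, or -1 if the log is full. */
int nd_record(nd_log *log, nd_kind kind, uint64_t instr_count, uint64_t value)
{
    assert(log != NULL);
    if (log->length == ND_LOG_CAPACITY) {
        return -1;
    }
    log->entries[log->length].kind = kind;
    log->entries[log->length].instr_count = instr_count;
    log->entries[log->length].value = value;
    log->length++;
    return 0;
}

/* Replay: if the next logged input is due at instr_count, copy it to *out,
 * advance the cursor, and return 1. Otherwise return 0 and leave *out alone. */
int nd_replay_next(nd_log *log, uint64_t instr_count, nd_entry *out)
{
    assert(log != NULL && out != NULL);
    if (log->cursor == log->length) {
        return 0;
    }
    if (log->entries[log->cursor].instr_count != instr_count) {
        return 0;
    }
    *out = log->entries[log->cursor];
    log->cursor++;
    return 1;
}
```

During replay the emulator would call `nd_replay_next` at each point where the recording run consulted a device, so the guest sees the same values at the same points and the execution repeats.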

SLIDE 37

Record and Replay Implementation Example

static ssize_t e1000_receive(VLANClientState *nc, const uint8_t *buf, size_t size)
{
    …
    do {
        …
        /* Copy the packet payload into guest RAM via DMA. */
        pci_dma_write(&s->dev, le64_to_cpu(desc.buffer_addr),
                      (void *)(buf + desc_offset + vlan_offset), copy_size);
        /* Log the same bytes so that replay can reproduce the transfer. */
        rr_record_handle_packet_call(RR_CALLSITE_E1000_RECEIVE_2,
                      (void *)(buf + desc_offset + vlan_offset), copy_size,
                      NET_TRANSFER_IOB_TO_RAM);
        …
    } while (desc_offset < total_size);
    …
}

SLIDE 40

OS-level Transparency

  • Goal:
    – Capture OS-level events and their dependencies
  • Approach:
    – Based on VM introspection
  • Events analyzed:
    – Process operations:
      • clone, fork, execve, exit, etc.
    – File operations:
      • open, read, write, unlink, etc.
    – Network operations:
      • socket, connect, recvmsg, etc.
    – Memory operations:
      • mmap, mprotect, shmget, etc.
SLIDE 43

OS-level Transparency Implementation Example

#ifdef TARGET_X86_64
void helper_syscall(int next_eip_addend)
{
    panda_cb_list *plist;
    /* Invoke every plugin callback registered for the before-syscall event. */
    for (plist = panda_cbs[PANDA_CB_BEFORE_SYSCALL]; plist != NULL;
         plist = panda_cb_list_next(plist)) {
        plist->entry.before_syscall(env);
    }
    …
}
…
#endif

SLIDE 46

Action History Graph (AHG)

  • Goal:
    – Represent causality across events
  • Causality:
    – Process->Process (e.g., fork)
    – Process->File (e.g., write)
    – File->Process (e.g., read)
    – Process->Host (e.g., send)
    – Host->Process (e.g., recv)
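
The five causality directions listed above can be written down as a small lookup from system call name to edge direction. This is only a sketch under stated assumptions: the type and function names (`node_kind`, `ahg_edge_kind`, `ahg_edge_for_syscall`) are hypothetical, and the table covers just the example calls named on the slide.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Node categories taken from the slide: process, file, remote host. */
typedef enum { NODE_PROCESS, NODE_FILE, NODE_HOST } node_kind;

/* Direction of the causal edge that an event adds to the graph. */
typedef struct {
    node_kind src;
    node_kind dst;
} ahg_edge_kind;

/* Looks up the edge direction for one of the example system calls.
 * Returns 1 and fills *out on a match, 0 for any other name. */
int ahg_edge_for_syscall(const char *name, ahg_edge_kind *out)
{
    static const struct {
        const char   *name;
        ahg_edge_kind edge;
    } table[] = {
        { "fork",  { NODE_PROCESS, NODE_PROCESS } },
        { "write", { NODE_PROCESS, NODE_FILE } },
        { "read",  { NODE_FILE,    NODE_PROCESS } },
        { "send",  { NODE_PROCESS, NODE_HOST } },
        { "recv",  { NODE_HOST,    NODE_PROCESS } },
    };
    size_t i;

    assert(name != NULL && out != NULL);
    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(name, table[i].name) == 0) {
            *out = table[i].edge;
            return 1;
        }
    }
    return 0;
}
```

The point of the direction is that data flows from source to destination: a read makes the process depend on the file, while a write makes the file depend on the process.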

SLIDE 49

Action History Graph Example

SLIDE 50

Coarse-grained Taint Analysis

  • Goal:
    – Quickly capture the provenance of objects in the AHG
  • Working mechanism:
    – Runs while building AHG
    – Processes have a provenance set
    – Process operations:
      • fork, clone: copy provenance of parent to child process
    – File and network operations:
      • read, recv: associate provenance of object to process
      • write, send: associate provenance of process to object
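
The propagation rules above can be sketched with provenance sets stored as bitmasks. The names (`prov_set`, `on_fork`, `on_read`, `on_write`) are hypothetical, and two choices are assumptions of the sketch: "associate" is modeled as set union, and a set holds at most 64 tags.

```c
#include <assert.h>
#include <stdint.h>

/* Bit i set means the entity depends on tag i (assumption: at most 64 tags). */
typedef uint64_t prov_set;

typedef struct { prov_set prov; } ahg_process;
typedef struct { prov_set prov; } ahg_object; /* a file or a network endpoint */

/* Builds a set holding a single tag. */
prov_set prov_tag(unsigned tag_id)
{
    assert(tag_id < 64);
    return (prov_set)1 << tag_id;
}

/* fork, clone: the child starts with a copy of the parent's provenance. */
void on_fork(const ahg_process *parent, ahg_process *child)
{
    child->prov = parent->prov;
}

/* read, recv: the object's provenance is added to the process. */
void on_read(ahg_process *proc, const ahg_object *obj)
{
    proc->prov |= obj->prov;
}

/* write, send: the process's provenance is added to the object. */
void on_write(const ahg_process *proc, ahg_object *obj)
{
    obj->prov |= proc->prov;
}
```

Because a process carries a single set, anything it writes inherits everything it has read so far. That is what makes the analysis fast and also what makes it coarse.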
SLIDE 53

Fine-grained Taint Analysis

  • Goal:
    – Accurately capture provenance of objects in the AHG
  • Working mechanism:
    – Decoupled from program execution
    – Instruction-level propagation
    – Taint tags at byte-level granularity
  • Optimizations:
    – Trace-based dynamic taint analysis
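
Instruction-level propagation with byte-level tags can be sketched with a shadow array that stores one tag bitmask per byte of memory. The layout and names (`tainted_mem`, `taint_copy`, `taint_add8`) are hypothetical and far simpler than a real taint engine: a copy moves tags with the data, and an arithmetic result takes the union of its operands' tags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 256

typedef struct {
    uint8_t  data[MEM_SIZE];
    uint32_t shadow[MEM_SIZE]; /* one tag bitmask per byte of data */
} tainted_mem;

void taint_init(tainted_mem *m)
{
    memset(m, 0, sizeof *m);
}

/* Taint source: attach tag to len bytes starting at addr. */
void taint_label(tainted_mem *m, size_t addr, size_t len, uint32_t tag)
{
    size_t i;
    assert(addr <= MEM_SIZE && len <= MEM_SIZE - addr);
    for (i = 0; i < len; i++) {
        m->shadow[addr + i] |= tag;
    }
}

/* mov-like copy: each destination byte takes the source byte and its tags.
 * The two regions must not overlap. */
void taint_copy(tainted_mem *m, size_t dst, size_t src, size_t len)
{
    size_t i;
    assert(dst <= MEM_SIZE && len <= MEM_SIZE - dst);
    assert(src <= MEM_SIZE && len <= MEM_SIZE - src);
    assert(dst + len <= src || src + len <= dst);
    for (i = 0; i < len; i++) {
        m->data[dst + i] = m->data[src + i];
        m->shadow[dst + i] = m->shadow[src + i];
    }
}

/* add-like byte operation: the result depends on both operands,
 * so its tag set is the union of theirs. */
void taint_add8(tainted_mem *m, size_t dst, size_t a, size_t b)
{
    uint8_t  sum;
    uint32_t tags;
    assert(dst < MEM_SIZE && a < MEM_SIZE && b < MEM_SIZE);
    sum = (uint8_t)(m->data[a] + m->data[b]);
    tags = m->shadow[a] | m->shadow[b];
    m->data[dst] = sum;
    m->shadow[dst] = tags;
}
```

Unlike a per-process provenance set, bytes that were never touched by tagged data stay untagged, which is where the added accuracy comes from.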

SLIDE 56

Fine-grained Taint Analysis Implementation

Guest Basic Block -> TCG Basic Block -> LLVM Basic Block

SLIDE 60

Trace-based Taint Analysis

  • Objective:
    – Improve performance of fine-grained taint analysis
  • Key intuition:
    – Within a trace, instruction sequences are executed multiple times
  • Working mechanism:
    – Based on the execution trace of the system/program
    – Computes taint summaries for sequences of instructions
    – Reuses taint summaries within the trace and possibly across traces
  • Implementation:
    – Sequitur algorithm: recognizes a lexical structure in an execution trace and generates a grammar where terminals are instructions
    – Analyze grammar and reuse taint results when possible
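
The idea of a reusable taint summary can be sketched for a tiny register machine. This does not implement Sequitur or the grammar analysis; it only shows how the taint effect of an instruction sequence can be computed once and then applied to different input states. All names (`taint_summary`, `summary_binop`, `summary_compose`, `summary_apply`) and the four-register model are assumptions of the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_REGS 4

/* deps[r] is a bitmask of the input registers whose taint flows into
 * register r after the summarized instruction sequence has run. */
typedef struct {
    uint8_t deps[NUM_REGS];
} taint_summary;

/* Summary of the empty sequence: every register depends only on itself. */
taint_summary summary_identity(void)
{
    taint_summary s;
    unsigned r;
    for (r = 0; r < NUM_REGS; r++) {
        s.deps[r] = (uint8_t)(1u << r);
    }
    return s;
}

/* Summary of a single instruction of the form dst = op(src_a, src_b). */
taint_summary summary_binop(unsigned dst, unsigned src_a, unsigned src_b)
{
    taint_summary s = summary_identity();
    assert(dst < NUM_REGS && src_a < NUM_REGS && src_b < NUM_REGS);
    s.deps[dst] = (uint8_t)((1u << src_a) | (1u << src_b));
    return s;
}

/* Summary of running `first` and then `second`. */
taint_summary summary_compose(taint_summary first, taint_summary second)
{
    taint_summary out;
    unsigned r, s;
    for (r = 0; r < NUM_REGS; r++) {
        uint8_t acc = 0;
        for (s = 0; s < NUM_REGS; s++) {
            if (second.deps[r] & (1u << s)) {
                acc |= first.deps[s];
            }
        }
        out.deps[r] = acc;
    }
    return out;
}

/* Applies a summary to a taint state (one tag bitmask per register). */
void summary_apply(const taint_summary *sum, const uint32_t in[NUM_REGS],
                   uint32_t out[NUM_REGS])
{
    unsigned r, s;
    for (r = 0; r < NUM_REGS; r++) {
        uint32_t acc = 0;
        for (s = 0; s < NUM_REGS; s++) {
            if (sum->deps[r] & (1u << s)) {
                acc |= in[s];
            }
        }
        out[r] = acc;
    }
}
```

In the grammar setting, each rule that expands to a repeated instruction sequence would get one such summary, and every occurrence of the rule would apply it instead of propagating taint through the instructions again.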

SLIDE 63

Trace-based Taint Analysis Example

Execution Trace

    …
    mov qword ptr [r12+rax*8], rdx
    jmp 0x7f8c47a21b13
    add rdx, 0x10
    mov rax, qword ptr [rdx]
    test rax, rax
    jz 0x7f8c47a21b52
    cmp rax, 0x21
    jbe 0x7f8c47a21b08
    lea rcx, ptr [rip+0x21ef29]
    …

Grammar

[Figure: grammar derived from the trace. Numbered nodes (9, 10, 476, 8, 43, 11, 11) appear alongside the terminal instructions mov qword ptr [r12+rax*8], rdx; jmp 0x7f8c47a21b13; jz 0x7f8c47a21b52; add rdx, 0x10; mov rax, qword ptr [rdx]; test rax, rax.]

SLIDE 65

Fine-grained Taint Analysis

SLIDE 67

Case Study Overview

SLIDE 72

Case Study and AHG

[Figure: AHG of the attack. Node labels: bash, firefox, sh, wget, screengrab, nc, s.png, X0, 143.215.130.204. Edge labels: execute, recv from, recv msg, read, write. Legend: Process, Event, File, Network, Tag, Causality.]

SLIDE 74

Case Study and AHG Step 1

1) Victim starts Firefox

[Figure: AHG fragment in which bash executes firefox. Legend: Process, Event, File, Network, Tag.]
SLIDE 75

Case Study and AHG Step 2

2) Victim visits malicious.com (143.215.130.204), which causes a shell process to run

[Figure: AHG fragment in which firefox receives from 143.215.130.204 and executes sh. Legend: Process, Event, File, Network, Tag.]
SLIDE 76

Case Study and AHG Step 3

3) Attacker downloads and executes screengrab

[Figure: AHG fragment in which sh executes wget, wget receives from 143.215.130.204 and writes screengrab, and screengrab is executed, receives a message from X0, and writes s.png. Legend: Process, Event, File, Network, Tag.]
SLIDE 77

Case Study and AHG Step 4

4) Screenshot is sent to attacker’s server

[Figure: AHG fragment in which sh executes nc, and nc reads s.png and writes to 143.215.130.204. Legend: Process, Event, File, Network, Tag.]
SLIDE 87

Case Study and Coarse-grained Taint Analysis

[Figure: AHG fragment in which sh executes wget, and wget receives from 143.215.130.204, reads libssl.so, libc.so, and wgetrc, and writes screengrab. Coarse taint tags CT1 to CT5 annotate the graph. Legend: Process, Event, File, Network, Tag.]

Coarse Taint Set: CT1, CT2, CT3, CT4

SLIDE 93

Case Study and Fine-grained Taint Analysis

[Figure: AHG fragment in which sh executes wget, and wget receives from 143.215.130.204, reads libssl.so, libc.so, and wgetrc, and writes screengrab. Fine-grained taint tags FT1 to FT5 annotate the graph. Legend: Process, Event, File, Network, Tag.]
SLIDE 94

THEIA-Panda Overheads

Relative execution time (row divided by column):

TIME               Bare Exec Time  KVM Exec Time  QEMU Exec Time  Record Exec Time  Replay Exec Time
Bare Exec Time
KVM Exec Time      2.09x
QEMU Exec Time     6.19x           2.96x
Record Exec Time   7.75x           3.71x          1.25x
Replay Exec Time   13.82x          6.62x          2.23x           1.78x

  • Fine-grained taint analysis:
    – ~40x to ~300x compared to bare execution
  • Space overhead:
    – ~86 GB/day of non-deterministic log data + ~1.3 GB/day of graph data

SLIDE 99

THEIA-Panda Observations

  • Panda
SLIDE 100

THEIA-KI Overview

[Architecture diagram. THEIA-KI + OS: Record, Replay, System Call Information, Process Information. THEIA-KI-Analysis: Action History Graph, Fine-grained Taint Analysis (FA), Storage, Query Interface. Modes: Real-time, On-demand.]

SLIDE 107

THEIA-KI

  • Key features:
    – Record/replay
      • Kernel-based instrumentation
    – Instruction-level replay of the user space
      • On top of Intel Pin
    – Coarse-grained causality
      • From system instrumentation and logging
    – Fine-grained causality
      • From dynamic taint tracking
  • Threat model:
    – Kernel is trusted

SLIDE 114

Record and Replay

  • Record:

– Kernel instrumentation

  • Order, return values and memory addresses modified by a system call
  • Timing and values of received signals
  • Sources of randomness

– Libc instrumentation

  • synchronization of pthread
  • Implementation:

– Arnold* with 32-bit Linux kernel

File Socket Randomness

External Inputs

Process group

Thread 1 Thread 2

Thread Synchronization

*David Devecsery, Michael Chow, Xianzheng Dou, Peter M Chen, Jason Flinn. Eidetic Systems. Proceedings of the 11th USENIX Symposium on Operating System Design and Implementation (OSDI), October 2014.

slide-115
SLIDE 115

Kernel Instrumentation
 Implementation Example

unsigned long arch_align_stack(unsigned long sp)
{
    /* Begin REPLAY */
    if (!(current->personality & ADDR_NO_RANDOMIZE) && randomize_va_space) {
        unsigned int rand = get_random_int();

        if (current->record_thrd) {
            /* Recording: log the random value so replay can reuse it */
            record_randomness(rand);
        } else if (current->replay_thrd) {
            /* Replaying: substitute the value logged during recording */
            rand = replay_randomness();
        }
        sp -= rand % 8192;
    }
    /* End REPLAY */
    return sp & ~0xf;
}
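The record-or-replay branch in `arch_align_stack` can be illustrated outside the kernel. The following is a minimal user-space sketch in Python, not Arnold code: `ReplayLog`, `align_stack`, and the `mode` argument are hypothetical names, and `random.getrandbits(32)` merely stands in for the kernel's `get_random_int()`.

```python
import random
from collections import deque


class ReplayLog:
    """Hypothetical log of nondeterministic values (not Arnold's log format)."""

    def __init__(self):
        self._values = deque()

    def record(self, value):
        # Recording run: append each value in the order it was observed.
        self._values.append(value)

    def replay(self):
        # Replay run: return values in exactly the recorded order.
        return self._values.popleft()


def align_stack(sp, log, mode):
    """Follow the slide's logic: randomize the stack pointer, then align it."""
    rand = random.getrandbits(32)  # stand-in for get_random_int()
    if mode == "record":
        log.record(rand)
    elif mode == "replay":
        rand = log.replay()  # drop the fresh value and reuse the logged one
    sp -= rand % 8192
    return sp & ~0xF  # clear the low four bits (16-byte alignment)
```

Calling `align_stack` several times in `"record"` mode and then the same number of times in `"replay"` mode with the same log should yield the same sequence of stack pointers, which is the property replay depends on.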

slide-118
SLIDE 118

Query System Workflow

[Figure: workflow diagram with the components Triggering Points, Queries, AHG, Reachability & Pruning, Coarse-grained Subgraph, Fine-grained analysis, and Fine-grained Tags]

slide-123
SLIDE 123

Triggering Points and Queries

  • Triggering points:

– Pre-defined policies

  • Process writes to /etc/passwd

  • Queries:

– From automated forensic analysis systems
– Human-based analysis

  • Analysis types:

– Backward:

  • Where does this object come from?

– Forward:

  • What is the impact of this object on the system?

– Point-to-point:

  • Are these two objects related?
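
A pre-defined policy such as "process writes to /etc/passwd" can be thought of as a predicate over logged events. The sketch below is only an illustration under that assumption: the `Event` record, `writes_passwd`, and `triggering_points` are hypothetical names, not part of THEIA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical audit record: a process performs an operation on an object."""
    process: str
    op: str   # e.g. "read", "write", "send"
    obj: str  # file path or network endpoint


def writes_passwd(event):
    # Encodes the slide's example policy: any write to /etc/passwd.
    return event.op == "write" and event.obj == "/etc/passwd"


def triggering_points(events, policies):
    """Return the events that match at least one pre-defined policy."""
    return [e for e in events if any(policy(e) for policy in policies)]
```

Each event returned by `triggering_points` could then serve as the starting object for a backward or forward analysis.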
slide-126
SLIDE 126

Point-to-point Query Example

  • 1. Attacker tampers with contract file ctct.csv
  • 2. Employee creates seasonal report s1.csv using spreadsheet editor
  • 3. Auto report program sends seasonal report s1.csv to archive server
  • 4. Employee creates seasonal report s2.csv using spreadsheet editor
  • 5. Template generator creates template t.doc
  • 6. Employee creates half-year report h2.pdf using document editor
slide-128
SLIDE 128

Point-to-point Query Example

  • 1. Attacker tampers with contract file ctct.csv
  • 2. Employee creates seasonal report s1.csv using spreadsheet editor
  • 3. Auto report program sends seasonal report s1.csv to archive server
  • 4. Employee creates seasonal report s2.csv using spreadsheet editor
  • 5. Template generator creates template t.doc
  • 6. Employee creates half-year report h2.pdf using document editor

[Figure: provenance graph. Nodes: ctct.csv, Spreadsheet Editor (two instances), s1.csv, s2.csv, Template Generator, t.doc, Document Editor, h2.pdf, Auto Report, archive server. Edge labels: read, write, send.]

slide-129
SLIDE 129

Forward Reachability

  • 1. Attacker tampers with contract file ctct.csv
  • 2. Employee creates seasonal report s1.csv using spreadsheet editor
  • 3. Auto report program sends seasonal report s1.csv to archive server
  • 4. Employee creates seasonal report s2.csv using spreadsheet editor
  • 5. Template generator creates template t.doc
  • 6. Employee creates half-year report h2.pdf using document editor

[Figure: provenance graph. Nodes: ctct.csv, Spreadsheet Editor (two instances), s1.csv, s2.csv, Template Generator, t.doc, Document Editor, h2.pdf, Auto Report, archive server. Edge labels: read, write, send.]

slide-130
SLIDE 130

Backward Reachability

  • 1. Attacker tampers with contract file ctct.csv
  • 2. Employee creates seasonal report s1.csv using spreadsheet editor
  • 3. Auto report program sends seasonal report s1.csv to archive server
  • 4. Employee creates seasonal report s2.csv using spreadsheet editor
  • 5. Template generator creates template t.doc
  • 6. Employee creates half-year report h2.pdf using document editor

[Figure: provenance graph. Nodes: ctct.csv, Spreadsheet Editor (two instances), s1.csv, s2.csv, Template Generator, t.doc, Document Editor, h2.pdf, Auto Report, archive server. Edge labels: read, write, send.]

slide-131
SLIDE 131

Reachability Result

  • 1. Attacker tampers with contract file ctct.csv
  • 2. Employee creates seasonal report s1.csv using spreadsheet editor
  • 3. Auto report program sends seasonal report s1.csv to archive server
  • 4. Employee creates seasonal report s2.csv using spreadsheet editor
  • 5. Template generator creates template t.doc
  • 6. Employee creates half-year report h2.pdf using document editor

[Figure: provenance graph. Nodes: ctct.csv, Spreadsheet Editor (two instances), s1.csv, s2.csv, Template Generator, t.doc, Document Editor, h2.pdf, Auto Report, archive server. Edge labels: read, write, send.]
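
One way to read the forward, backward, and result slides is that a point-to-point query keeps the nodes that are both forward-reachable from the source and backward-reachable from the target. The Python sketch below illustrates that idea on the six-step scenario. The edge list is an assumption reconstructed from the steps (in particular, which files the document editor reads is a guess), the node names are hypothetical, and event timestamps are ignored for simplicity.

```python
from collections import deque

# Assumed data-flow edges (source -> destination), reconstructed from steps 1-6.
# A "read" is modeled as file -> process; a "write" or "send" as process -> object.
EDGES = [
    ("ctct.csv", "spreadsheet_editor_1"),
    ("spreadsheet_editor_1", "s1.csv"),
    ("s1.csv", "auto_report"),
    ("auto_report", "archive_server"),
    ("ctct.csv", "spreadsheet_editor_2"),
    ("spreadsheet_editor_2", "s2.csv"),
    ("template_generator", "t.doc"),
    ("s1.csv", "document_editor"),   # assumption
    ("s2.csv", "document_editor"),   # assumption
    ("t.doc", "document_editor"),    # assumption
    ("document_editor", "h2.pdf"),
]


def reachable(start, edges, reverse=False):
    """Breadth-first search; reverse=True follows edges backward."""
    adjacency = {}
    for src, dst in edges:
        if reverse:
            src, dst = dst, src
        adjacency.setdefault(src, []).append(dst)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def point_to_point(src, dst, edges):
    """Nodes lying on some path from src to dst."""
    return reachable(src, edges) & reachable(dst, edges, reverse=True)
```

Under these assumed edges, `point_to_point("ctct.csv", "h2.pdf", EDGES)` keeps both spreadsheet editor instances, s1.csv, s2.csv, and the document editor, while the archive server branch (forward-only) and the template branch (backward-only) are pruned.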

slide-132
SLIDE 132

Runtime Overhead: SPEC CPU2006

3.22%

slide-133
SLIDE 133

Runtime Overhead: I/O Operations

<50%

slide-134
SLIDE 134

Pruning Efficiency

~94.2% reduction

[Chart series: None, RAIN]

slide-135
SLIDE 135

Information Flow Tracking Accuracy

~94.2% reduction

[Chart series: Coarse-level, Fine-level]

slide-136
SLIDE 136

Storage Cost

~4GB per day

slide-137
SLIDE 137

Future Work

  • Hypervisor-based non-emulation R/R
  • Differential Taint Analysis
  • Running memory sanitizers on replay
  • Multi-host support
  • Porting from 32-bit to 64-bit
slide-140
SLIDE 140

Conclusion

slide-144
SLIDE 144

APT Demo

slide-147
SLIDE 147

THEIA-Panda Demo
